Everything you need for production-grade distributed storage, built from the ground up with erasure coding at its core.
Built on Reed-Solomon erasure coding, KDSS splits data into configurable data and parity shards. With 4 preset configurations ranging from EC 5+2 to EC 29+3, you can achieve up to 90.6% space efficiency while tolerating two or three simultaneous disk failures -- far surpassing the 33.3% efficiency of traditional 3-way replication.
# Erasure Coding Configuration
[ec]
data_shards = 5     # Number of data shards
parity_shards = 2   # Number of parity shards
# Space Efficiency Comparison:
# EC 5+2 = 71.4% (tolerates 2 failures)
# EC 9+2 = 81.8% (tolerates 2 failures)
# EC 18+3 = 85.7% (tolerates 3 failures)
# EC 29+3 = 90.6% (tolerates 3 failures)
# 3-replica = 33.3% (tolerates 2 failures)
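The figures above fall straight out of the shard counts: the usable fraction of raw capacity is data_shards / (data_shards + parity_shards), while tolerance equals the parity count. A minimal Go sketch (the `efficiency` helper is ours, not part of KDSS):

```go
package main

import "fmt"

// efficiency returns the usable fraction of raw capacity for an
// EC scheme with k data shards and m parity shards: k / (k + m).
func efficiency(k, m int) float64 {
	return float64(k) / float64(k+m)
}

func main() {
	presets := []struct{ k, m int }{{5, 2}, {9, 2}, {18, 3}, {29, 3}}
	for _, p := range presets {
		fmt.Printf("EC %d+%d = %.1f%% (tolerates %d failures)\n",
			p.k, p.m, 100*efficiency(p.k, p.m), p.m)
	}
	// 3-way replication stores every byte three times: 1/3 usable.
	fmt.Printf("3-replica = %.1f%%\n", 100.0/3)
}
```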
KDSS bypasses the filesystem entirely, performing direct writes on raw block devices. An append-only write pattern maximizes sequential throughput, while a dual SuperBlock design and five-level recovery mechanism ensure data integrity even after unexpected power loss or hardware failure.
[SuperBlock 4KB: primary + backup]
[Record 1: Header(64B) + Shard Data + Padding]
[Record 2: Header(64B) + Shard Data + Padding]
[...]
Recovery Levels:
L0: Clean shutdown marker (skip recovery)
L1: SuperBlock validation & failover
L2: BadgerDB index recovery (recoverIndex)
L3: Single-shard EC rebuild
L4: Full stripe EC reconstruction
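The exact fields inside the 64-byte record header are not documented above, so the layout below (magic, stripe/shard IDs, length, CRC) is purely illustrative. What the sketch does show is the append-only framing from the diagram: a fixed-size header, the shard data, then zero padding up to an assumed 4 KiB block boundary so every record starts block-aligned on the raw device:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

const (
	headerSize = 64   // fixed-size record header, as in the layout above
	blockAlign = 4096 // pad records to the device block size (assumed 4 KiB)
)

// encodeRecord frames one shard as [64B header | shard data | zero padding].
// The header fields chosen here are illustrative guesses, not KDSS's
// actual on-disk format.
func encodeRecord(stripeID, shardID uint64, shard []byte) []byte {
	total := headerSize + len(shard)
	if r := total % blockAlign; r != 0 {
		total += blockAlign - r // zero-pad to the next block boundary
	}
	rec := make([]byte, total)
	binary.LittleEndian.PutUint32(rec[0:4], 0x4B445353) // hypothetical magic
	binary.LittleEndian.PutUint64(rec[4:12], stripeID)
	binary.LittleEndian.PutUint64(rec[12:20], shardID)
	binary.LittleEndian.PutUint32(rec[20:24], uint32(len(shard)))
	binary.LittleEndian.PutUint32(rec[24:28], crc32.ChecksumIEEE(shard))
	copy(rec[headerSize:], shard)
	return rec
}

func main() {
	rec := encodeRecord(1, 3, make([]byte, 5000))
	fmt.Println(len(rec), len(rec)%blockAlign) // 64+5000 padded up to 8192
}
```

Block alignment is what makes the append-only pattern and L1/L2 recovery scans cheap: after a crash, a scanner can walk the device in fixed strides and validate each header's CRC.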
KDSS provides a fully S3-compatible HTTP API, allowing seamless integration with existing tools, SDKs, and applications. It supports bucket operations, multipart uploads for large files, presigned URLs for secure temporary access, and streaming transfers for efficient data movement.
# Upload a file via S3 API
curl -X PUT "http://s3.example.com/my-bucket/photo.jpg" \
-H "Authorization: AWS4-HMAC-SHA256 ..." \
-T ./photo.jpg
# Multipart upload (large files)
curl -X POST \
"http://s3.example.com/my-bucket/bigfile?uploads"
# Generate presigned download URL
kdss-cli presign get \
--bucket my-bucket \
--key photo.jpg \
--expires 3600
# List objects with prefix
curl "http://s3.example.com/my-bucket?prefix=photos/&max-keys=100"
Mount your KDSS cluster as a local directory via FUSE. Applications can use standard POSIX file operations -- read, write, stat, readdir -- while KDSS transparently handles erasure coding, shard distribution, and fault recovery behind the scenes. A configurable chunk size, LRU read caching, and sequential prefetch optimize throughput for different workload patterns.
# Mount KDSS as a local filesystem
ksfs -c /etc/ksfs/mount.toml
# Configuration file (mount.toml)
[cluster]
masters = ["master-1:6700", "master-2:6700", "master-3:6700"]
[auth]
access_key = "your-access-key"
secret_key = "your-secret-key"
bucket = "my-bucket"
[mount]
mountpoint = "/mnt/kdss"
# Now use it like any local directory
ls /mnt/kdss/
cp /var/log/app.log /mnt/kdss/logs/
df -h /mnt/kdss
KDSS continuously monitors disk health via S.M.A.R.T. metrics and heartbeat signals. When a fault is detected -- whether a slow disk, read error, or complete disk failure -- the system automatically isolates the affected disk, reconstructs missing shards using EC parity data, and redistributes them to healthy nodes, all without manual intervention.
Disk Health Monitor
         |
         v
+------------------+
| S.M.A.R.T. Check |---> Normal ---> Continue
| Heartbeat Check  |                 Monitoring
+------------------+
         |
   Fault Detected
         |
         v
+------------------+
| Isolate Disk     |  Mark disk as "offline"
| Stop I/O         |  Redirect traffic
+------------------+
         |
         v
+------------------+
| Identify Missing |  Scan stripe metadata
| Shards           |  for affected data
+------------------+
         |
         v
+------------------+
| EC Reconstruct   |  Rebuild from parity
| Missing Shards   |  shards (k of n)
+------------------+
         |
         v
+------------------+
| Place on Healthy |  Rebalance across
| Nodes            |  available disks
+------------------+
         |
         v
Repair Complete ---> Resume Normal Operation
As your cluster grows and workloads shift, disk utilization can become uneven. KDSS includes a utilization-aware data balancing engine that non-disruptively migrates shards from over-utilized to under-utilized disks, with real-time progress monitoring, keeping storage evenly distributed and the cluster healthy.
# Check cluster disk utilization
kdss-cli balance status
Disk Utilization:
node-01/sda ████████████░░░ 82%
node-01/sdb █████████░░░░░░ 61%
node-02/sda ██████████████░ 93%
node-02/sdb ███████░░░░░░░░ 47%
node-03/sda ██████████░░░░░ 68%
Imbalance: 46% (threshold: 20%)
# Start balancing
kdss-cli balance start --max-bandwidth 100MB/s
Balancing in progress...
Migrated: 128 shards (12.4 GB)
Remaining: 64 shards (~6.2 GB)
ETA: 3m 22s
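The 46% imbalance reported above is simply the spread between the most- and least-utilized disks (93% - 47%). A sketch of how a balancer might pick a migration pair -- the selection heuristic here is our assumption, not KDSS's documented algorithm:

```go
package main

import "fmt"

type disk struct {
	name string
	util int // percent of capacity used
}

// plan returns the most- and least-utilized disks and the spread between
// them; migration is only worthwhile once the spread exceeds threshold.
func plan(disks []disk, threshold int) (src, dst disk, spread int, ok bool) {
	src, dst = disks[0], disks[0]
	for _, d := range disks {
		if d.util > src.util {
			src = d // migration source: hottest disk
		}
		if d.util < dst.util {
			dst = d // migration target: coldest disk
		}
	}
	spread = src.util - dst.util
	return src, dst, spread, spread > threshold
}

func main() {
	disks := []disk{ // utilizations from the example above
		{"node-01/sda", 82}, {"node-01/sdb", 61},
		{"node-02/sda", 93}, {"node-02/sdb", 47},
		{"node-03/sda", 68},
	}
	src, dst, spread, ok := plan(disks, 20)
	fmt.Printf("Imbalance: %d%% (move %s -> %s, needed=%v)\n",
		spread, src.name, dst.name, ok)
}
```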
A full-featured web-based management console gives you complete visibility and control over your KDSS cluster. Monitor cluster health at a glance, manage nodes and disks, configure EC policies, and control user access -- all through an intuitive dashboard with role-based access control (RBAC).
+-----------------------------------------------+
| KDSS Console                 [admin] [Logout] |
+-----------------------------------------------+
|         |                                     |
| Dash    | Cluster Health: Healthy             |
| Nodes   |                                     |
| Disks   | Nodes: 5/5 online                   |
| Buckets | Disks: 20/20 active                 |
| Users   | Capacity: 42.8 TB / 60 TB (71%)     |
| Alerts  |                                     |
| Config  | Throughput        IOPS              |
|         | ~~~^~~~           ~~^~~~            |
|         | 348 MB/s          12.4K             |
|         |                                     |
|         | Recent Alerts (3)                   |
|         | ! Disk node-02/sdc S.M.A.R.T. warn  |
|         | ! Repair job #47 completed          |
|         | i Balance job started               |
+-----------------------------------------------+
KDSS exposes comprehensive Prometheus metrics out of the box, covering cluster health, disk I/O, EC operations, and API latencies. With 32 built-in alert rules, pre-configured Grafana dashboards, and Lark (Feishu) webhook integration, you get full observability without building monitoring infrastructure from scratch.
# Prometheus scrape config for KDSS
scrape_configs:
  - job_name: 'kdss-master'
    static_configs:
      - targets: ['master-01:6701']
    metrics_path: /metrics
  - job_name: 'kdss-storage'
    static_configs:
      - targets:
          - 'storage-01:6801'
          - 'storage-02:6801'
          - 'storage-03:6801'
# Sample alert rules (32 built-in)
# - DiskSpaceCritical (>90%)
# - DiskSmartWarning
# - NodeHeartbeatLost (>30s)
# - ECRepairQueueHigh (>100)
# - APILatencyP99High (>500ms)
# - ReplicationLagHigh
# ...
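The rule names above map naturally onto standard Prometheus alerting rules. A representative sketch of what DiskSpaceCritical could look like -- the metric name `kdss_disk_used_ratio` is a guess, so substitute whatever metric KDSS actually exports:

```yaml
groups:
  - name: kdss
    rules:
      - alert: DiskSpaceCritical
        # Hypothetical metric name; KDSS's real exported name may differ.
        expr: kdss_disk_used_ratio > 0.90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk {{ $labels.disk }} on {{ $labels.node }} is over 90% full"
```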
KDSS implements a safe two-phase delete process with a configurable recycle bin retention period. Automatic garbage collection runs in the background to reclaim space from deleted objects and stale temporary data, while ensuring no data is permanently removed before the retention window expires.
# Garbage Collection Configuration
[gc]
interval_sec = 3600 # Housekeeping scan interval (seconds)
auto_gc_pending_hours = 48 # Auto-delete gc_pending after N hours (0=disable)
stale_writing_minutes = 120 # Clean up stale writing stripes after N minutes
# Capacity Alerts
[capacity]
alert_pct = 95 # Alert when cluster usage >= 95%
reject_pct = 99 # Reject writes when >= 99%
# Background Compaction (storage.toml)
[compactor]
enabled = true
threshold = 0.2 # Soft threshold: compact when idle (20%)
force_threshold = 0.5 # Force compact regardless of load (50%)