Core Engine

Erasure Coding Engine

Built on Reed-Solomon erasure coding, KDSS splits data into configurable data and parity shards. With 4 preset configurations ranging from EC 5+2 to EC 29+3, you can achieve up to 90.6% space efficiency while tolerating up to three simultaneous disk failures per stripe -- far surpassing the 33.3% efficiency of traditional 3-way replication.

  • Reed-Solomon algorithm with hardware-accelerated Galois field operations
  • 4 preset configurations: EC 5+2, 9+2, 18+3, 29+3
  • Up to 90.6% space efficiency vs 33.3% for 3-replica
  • Tolerates up to 3 simultaneous disk failures per stripe
  • Automatic shard distribution across nodes for fault isolation
master.toml
# Erasure Coding Configuration
[ec]
data_shards   = 5      # Number of data shards
parity_shards = 2      # Number of parity shards

# Space Efficiency Comparison:
# EC 5+2   = 71.4%  (tolerates 2 failures)
# EC 9+2   = 81.8%  (tolerates 2 failures)
# EC 18+3  = 85.7%  (tolerates 3 failures)
# EC 29+3  = 90.6%  (tolerates 3 failures)
# 3-replica = 33.3%  (tolerates 2 failures)
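The efficiency figures above fall out of the shard counts: an EC k+m stripe stores k data shards plus m parity shards, so the usable fraction is k/(k+m). A quick illustrative calculation (not KDSS code):

```python
# Space efficiency of an EC k+m stripe: k usable shards out of k+m written.
def ec_efficiency(data_shards: int, parity_shards: int) -> float:
    return data_shards / (data_shards + parity_shards)

for k, m in [(5, 2), (9, 2), (18, 3), (29, 3)]:
    print(f"EC {k}+{m}: {ec_efficiency(k, m):.1%}, tolerates {m} failures")

# 3-way replication is the degenerate k=1 case: 1 usable copy out of 3 written.
print(f"3-replica: {ec_efficiency(1, 2):.1%}")
```

Seen this way, replication is simply EC 1+2, which is why its efficiency is capped at 33.3% for the same two-failure tolerance.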
Storage Engine

Raw Disk Engine

KDSS bypasses the filesystem entirely, performing direct writes on raw block devices. An append-only write pattern maximizes sequential throughput, while a dual SuperBlock design and five-level recovery mechanism ensure data integrity even after unexpected power loss or hardware failure.

  • Direct raw disk writes bypass filesystem overhead
  • Append-only writes eliminate random I/O and fragmentation, which can extend HDD lifespan and lower failure rates
  • Dual SuperBlock with CRC32 checksum for metadata protection
  • Five-level crash recovery with clean shutdown optimization
  • Space reclamation via compaction without service interruption
Disk Layout
[SuperBlock 4KB: primary + backup]
[Record 1: Header(64B) + Shard Data + Padding]
[Record 2: Header(64B) + Shard Data + Padding]
[...]

Recovery Levels:
  L0: Clean shutdown marker (skip recovery)
  L1: SuperBlock validation & failover
  L2: BadgerDB index recovery (recoverIndex)
  L3: Single-shard EC rebuild
  L4: Full stripe EC reconstruction
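To illustrate how the dual SuperBlock survives a torn write, here is a minimal sketch of L1-style validation -- the 4 KB slot layout and CRC trailer placement are assumptions for illustration, not KDSS's actual on-disk format:

```python
import struct
import zlib

BLOCK_SIZE = 4096  # one 4 KB SuperBlock slot (assumed layout)

def pack_superblock(payload: bytes) -> bytes:
    """Pad the payload to the slot size and append a CRC32 trailer."""
    body = payload.ljust(BLOCK_SIZE - 4, b"\x00")
    return body + struct.pack("<I", zlib.crc32(body))

def load_superblock(primary: bytes, backup: bytes) -> bytes:
    """Validate the primary SuperBlock; fail over to the backup on CRC mismatch."""
    for slot in (primary, backup):
        body, (crc,) = slot[:-4], struct.unpack("<I", slot[-4:])
        if zlib.crc32(body) == crc:
            return body
    raise IOError("both SuperBlocks corrupt")
```

A torn write that corrupts the primary leaves its CRC stale, so the loader falls back to the intact backup copy.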
Access Layer

S3 Compatible API

KDSS provides a fully S3-compatible HTTP API, allowing seamless integration with existing tools, SDKs, and applications. It supports bucket operations, multipart uploads for large files, presigned URLs for secure temporary access, and streaming transfers for efficient data movement.

  • Full bucket CRUD: CreateBucket, ListBuckets, DeleteBucket, HeadBucket
  • Object operations: PutObject, GetObject, DeleteObject, HeadObject, CopyObject
  • Multipart upload: InitiateMultipartUpload, UploadPart, CompleteMultipartUpload
  • Presigned URLs for time-limited authenticated access
  • Compatible with AWS CLI, boto3, MinIO Client, and standard S3 SDKs
bash
# Upload a file via S3 API
curl -X PUT "http://s3.example.com/my-bucket/photo.jpg" \
  -H "Authorization: AWS4-HMAC-SHA256 ..." \
  -T ./photo.jpg

# Multipart upload (large files)
curl -X POST \
  "http://s3.example.com/my-bucket/bigfile?uploads"

# Generate presigned download URL
kdss-cli presign get \
  --bucket my-bucket \
  --key photo.jpg \
  --expires 3600

# List objects with prefix
curl "http://s3.example.com/my-bucket?prefix=photos/&max-keys=100"
POSIX Interface

FUSE Filesystem Mount

Mount your KDSS cluster as a local directory via FUSE. Applications can use standard POSIX file operations -- read, write, stat, readdir -- while KDSS transparently handles erasure coding, shard distribution, and fault recovery behind the scenes. A configurable chunk size, LRU read caching, and sequential prefetch optimize throughput for different workload patterns.

  • Standard POSIX interface: open, read, write, stat, readdir, rename
  • Transparent EC encoding on write, decoding on read
  • Configurable chunk size (default 256 MB) with LRU read caching and sequential prefetch
  • Metadata caching for reduced master node round-trips
  • Compatible with standard Linux tools: cp, rsync, tar, etc.
bash
# Mount KDSS as a local filesystem
ksfs -c /etc/ksfs/mount.toml

# Configuration file (mount.toml)
[cluster]
masters = ["master-1:6700", "master-2:6700", "master-3:6700"]

[auth]
access_key = "your-access-key"
secret_key = "your-secret-key"
bucket = "my-bucket"

[mount]
mountpoint = "/mnt/kdss"

# Now use it like any local directory
ls /mnt/kdss/
cp /var/log/app.log /mnt/kdss/logs/
df -h /mnt/kdss
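Behind the mount, a POSIX read spans one or more fixed-size chunks. A minimal sketch of the offset math, assuming the default 256 MB chunk size:

```python
CHUNK_SIZE = 256 * 1024 * 1024  # default chunk size (256 MB)

def chunks_for_read(offset: int, length: int, chunk_size: int = CHUNK_SIZE):
    """Yield (chunk_index, offset_in_chunk, bytes_from_chunk) for a read."""
    end = offset + length
    while offset < end:
        idx, off = divmod(offset, chunk_size)
        n = min(chunk_size - off, end - offset)
        yield idx, off, n
        offset += n
```

A read that straddles a chunk boundary touches two EC stripes; sequential prefetch can fetch chunk idx+1 while idx is still being consumed.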
Reliability

Auto Repair & Recovery

KDSS continuously monitors disk health via S.M.A.R.T. metrics and heartbeat signals. When a fault is detected -- whether a slow disk, read error, or complete disk failure -- the system automatically isolates the affected disk, reconstructs missing shards using EC parity data, and redistributes them to healthy nodes, all without manual intervention.

  • Three-layer disk health monitoring: S.M.A.R.T. metrics for early failure prediction, dmesg error scanning, and an I/O error sliding window
  • EC parity-based shard reconstruction without full data copies
  • Distributed repair execution across all Master nodes for parallel recovery
  • Configurable repair concurrency and bandwidth throttling
  • Repair progress tracking via Web Console and Prometheus metrics
Auto Repair Flow
  Disk Health Monitor
        |
        v
  +------------------+
  | S.M.A.R.T. Check |---> Normal ---> Continue
  | Heartbeat Check  |                 Monitoring
  +------------------+
        |
      Fault Detected
        |
        v
  +------------------+
  | Isolate Disk     |  Mark disk as "offline"
  | Stop I/O         |  Redirect traffic
  +------------------+
        |
        v
  +------------------+
  | Identify Missing |  Scan stripe metadata
  | Shards           |  for affected data
  +------------------+
        |
        v
  +------------------+
  | EC Reconstruct   |  Rebuild from parity
  | Missing Shards   |  shards (k of n)
  +------------------+
        |
        v
  +------------------+
  | Place on Healthy |  Rebalance across
  | Nodes            |  available disks
  +------------------+
        |
        v
  Repair Complete ---> Resume Normal Operation
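The I/O error sliding window used for fault detection can be sketched as follows; the window length and threshold here are illustrative defaults, not KDSS's actual values:

```python
from collections import deque

class ErrorWindow:
    """Flag a disk once too many I/O errors land inside a sliding time window."""

    def __init__(self, window_sec: float = 60.0, threshold: int = 5):
        self.window_sec = window_sec
        self.threshold = threshold
        self.errors: deque[float] = deque()  # timestamps of recent errors

    def record(self, now: float) -> bool:
        """Record one I/O error; return True if the disk should be isolated."""
        self.errors.append(now)
        while self.errors and now - self.errors[0] > self.window_sec:
            self.errors.popleft()  # drop errors that aged out of the window
        return len(self.errors) >= self.threshold
```

Unlike a simple counter, the window forgets old errors, so a disk with a brief transient burst long ago is not condemned by its history.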
Operations

Data Balancing

As your cluster grows and workloads shift, disk utilization can become uneven. KDSS includes a utilization-aware data balancing engine that non-disruptively migrates shards from over-utilized to under-utilized disks, keeping your cluster healthy and storage evenly distributed, and reports progress in real time.

  • Utilization-aware scheduling: triggers when imbalance exceeds configurable threshold
  • Non-disruptive migration: live data movement without service interruption
  • Bandwidth throttling to limit impact on production workloads
  • Progress monitoring via CLI and Web Console dashboard
  • Automatic rebalancing when new disks or nodes are added to the cluster
bash
# Check cluster disk utilization
kdss-cli balance status

Disk Utilization:
  node-01/sda   ████████████░░░  82%
  node-01/sdb   █████████░░░░░░  61%
  node-02/sda   ██████████████░  93%
  node-02/sdb   ███████░░░░░░░░  47%
  node-03/sda   ██████████░░░░░  68%

Imbalance: 46%  (threshold: 20%)

# Start balancing
kdss-cli balance start --max-bandwidth 100MB/s

Balancing in progress...
  Migrated: 128 shards (12.4 GB)
  Remaining: 64 shards (~6.2 GB)
  ETA: 3m 22s
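The imbalance figure in the example is simply the spread between the most- and least-utilized disks. A sketch of that check (the trigger rule is an assumption based on the threshold shown):

```python
def imbalance_pct(utilizations: list[float]) -> float:
    """Spread between the most- and least-utilized disks, in percentage points."""
    return max(utilizations) - min(utilizations)

# Utilization figures from the CLI output above
disks = [82, 61, 93, 47, 68]
spread = imbalance_pct(disks)
print(f"Imbalance: {spread}%  (threshold: 20%)")
if spread > 20:
    print("-> imbalance exceeds threshold, balancing would trigger")
```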
Management

Web Management Console

A full-featured web-based management console gives you complete visibility and control over your KDSS cluster. Monitor cluster health at a glance, manage nodes and disks, configure EC policies, and control user access -- all through an intuitive dashboard with role-based access control (RBAC).

  • Real-time dashboard: cluster health, capacity, throughput, and IOPS
  • Node and disk management: add, remove, decommission with guided workflows
  • Bucket and object browser with search, upload, and download
  • RBAC with predefined roles: Admin, Storage Admin, Read-Only
  • Alert configuration and notification history
Dashboard Overview
+-----------------------------------------------+
|  KDSS Console            [admin] [Logout]     |
+-----------------------------------------------+
|         |                                     |
| Dash    |  Cluster Health: Healthy            |
| Nodes   |                                     |
| Disks   |  Nodes: 5/5 online                  |
| Buckets |  Disks: 20/20 active                |
| Users   |  Capacity: 42.8 TB / 60 TB (71%)    |
| Alerts  |                                     |
| Config  |  Throughput   IOPS                  |
|         |  ~~~^~~~      ~~^~~~                |
|         |  348 MB/s     12.4K                 |
|         |                                     |
|         |  Recent Alerts (3)                  |
|         |  ! Disk node-02/sdc S.M.A.R.T. warn |
|         |  ! Repair job #47 completed         |
|         |  i Balance job started              |
+-----------------------------------------------+
Observability

Monitoring & Alerting

KDSS exposes comprehensive Prometheus metrics out of the box, covering cluster health, disk I/O, EC operations, and API latencies. With 32 built-in alert rules, pre-configured Grafana dashboards, and Lark (Feishu) webhook integration, you get full observability without building monitoring infrastructure from scratch.

  • Prometheus-native /metrics endpoint on every component
  • 32 built-in alert rules covering disk, node, capacity, and performance
  • Pre-built Grafana dashboards for cluster, node, and disk-level views
  • Lark (Feishu) webhook for real-time alert notifications
  • Configurable alert thresholds and notification routing
prometheus.yml
# Prometheus scrape config for KDSS
scrape_configs:
  - job_name: 'kdss-master'
    static_configs:
      - targets: ['master-01:6701']
    metrics_path: /metrics

  - job_name: 'kdss-storage'
    static_configs:
      - targets:
          - 'storage-01:6801'
          - 'storage-02:6801'
          - 'storage-03:6801'

# Sample alert rules (32 built-in)
# - DiskSpaceCritical (>90%)
# - DiskSmartWarning
# - NodeHeartbeatLost (>30s)
# - ECRepairQueueHigh (>100)
# - APILatencyP99High (>500ms)
# - ReplicationLagHigh
# ...
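Every component's /metrics endpoint returns the standard Prometheus text exposition format. A minimal sketch of what that output looks like -- the metric names here are hypothetical, not KDSS's actual metric set:

```python
def render_metrics(metrics):
    """Render (name, labels, value) triples in Prometheus text exposition format."""
    lines = []
    for name, labels, value in metrics:
        if labels:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

print(render_metrics([
    ("kdss_disk_used_bytes", {"node": "storage-01", "disk": "sda"}, 1.2e12),
    ("kdss_repair_queue_length", {}, 3),
]))
```

Because the format is plain text, any Prometheus-compatible scraper can consume it with no KDSS-specific integration.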
Lifecycle

GC & Reclamation

KDSS implements a safe two-phase delete process with a configurable recycle bin retention period. Automatic garbage collection runs in the background to reclaim space from deleted objects and stale temporary data, while ensuring no data is permanently removed before the retention window expires.

  • Two-phase delete: soft delete to recycle bin, then gc_pending with configurable auto-purge
  • Automatic housekeeping with configurable scan interval and auto-gc timeout
  • Stale multipart upload cleanup after configurable timeout
  • Background compaction with soft and force thresholds
  • GC progress and space reclaimed metrics exported to Prometheus
master.toml
# Garbage Collection Configuration
[gc]
interval_sec = 3600              # Housekeeping scan interval (seconds)
auto_gc_pending_hours = 48       # Auto-delete gc_pending after N hours (0=disable)
stale_writing_minutes = 120      # Clean up stale writing stripes after N minutes

# Capacity Alerts
[capacity]
alert_pct  = 95                  # Alert when cluster usage >= 95%
reject_pct = 99                  # Reject writes when >= 99%

# Background Compaction (storage.toml)
[compactor]
enabled            = true
threshold          = 0.2         # Soft threshold: compact when idle (20%)
force_threshold    = 0.5         # Force compact regardless of load (50%)
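The two-phase lifecycle described above can be sketched as a small state machine; the state names below are illustrative, not KDSS's internal identifiers:

```python
AUTO_GC_PENDING_HOURS = 48  # mirrors auto_gc_pending_hours in [gc]

def next_state(state: str, hours_in_state: float,
               auto_gc_hours: float = AUTO_GC_PENDING_HOURS) -> str:
    """Advance one object through the two-phase delete lifecycle."""
    if state == "live":
        return "recycle_bin"   # phase 1: soft delete, still recoverable
    if state == "recycle_bin":
        return "gc_pending"    # phase 2: bin emptied or retention expired
    if state == "gc_pending" and hours_in_state >= auto_gc_hours:
        return "purged"        # auto-purge: space reclaimed by compaction
    return state               # still inside the auto-gc window
```

Nothing reaches "purged" before the configured window elapses, which is what makes an accidental delete recoverable.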