## Requirements
Before deploying KDSS, ensure your hardware and software meet the following specifications. Requirements vary depending on workload and data volume.
### Minimum Hardware
| Resource | Minimum | Notes |
|---|---|---|
| CPU | 8 cores | Per node minimum for Storage; 16 cores recommended |
| RAM | 32 GB | 64-128 GB for Mixed (Master+Storage) nodes |
| System Disk | 200 GB SSD | 1 TB NVMe for Mixed nodes (MongoDB + BadgerDB) |
| HDD | 1+ data disks | Up to 36 × 16 TB HDD per node |
| Network | 10 Gbps | 20 Gbps bonded recommended |
### Recommended Hardware
| Resource | Recommended | Notes |
|---|---|---|
| CPU | 16 cores | More cores accelerate EC encoding |
| RAM | 128 GB | Mixed nodes: 64 GB for MongoDB + 32 GB for services |
| System Disk | 2 TB NVMe | NVMe for OS, MongoDB, and BadgerDB indexes |
| HDD | 36 × 16 TB | High-capacity mechanical drives for storage nodes |
| Network | 20 Gbps bonded | Bonded NICs for high availability |
### Software Requirements
- Ubuntu 24.04 Server (recommended OS)
- MongoDB 8.0+ (replica set for high availability metadata)
- Go 1.24+ (only required for building from source)
- FUSE 3.x (required for FUSE mount client)
- smartmontools (required for disk health monitoring)
## Capacity Planning
Understanding how erasure coding affects usable capacity is essential for planning your cluster size. KDSS uses Reed-Solomon EC, which splits data into data shards and parity shards.
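To make the shard-splitting step concrete, here is a simplified sketch of how a blob is striped into equal-size data shards. This is illustrative only: it omits the parity computation entirely (real Reed-Solomon parity requires Galois-field arithmetic, typically from a dedicated library), and the function name and padding scheme are assumptions, not KDSS internals.

```python
def split_into_shards(data: bytes, data_shards: int) -> list[bytes]:
    """Split data into `data_shards` equal-size pieces, zero-padding the tail.

    Parity shards (omitted here) would be computed over these pieces.
    """
    shard_size = -(-len(data) // data_shards)  # ceiling division
    padded = data.ljust(shard_size * data_shards, b"\0")
    return [padded[i * shard_size:(i + 1) * shard_size] for i in range(data_shards)]

# An 18-byte blob striped 5 ways yields five 4-byte shards (2 bytes of padding).
shards = split_into_shards(b"hello world, kdss!", 5)
assert len(shards) == 5
assert all(len(s) == 4 for s in shards)
```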
### Usable Capacity Formula
The usable storage capacity is determined by the EC configuration:
```text
Usable Capacity = Raw Capacity × (data_shards / (data_shards + parity_shards))
```
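The formula is straightforward to apply in code; a minimal helper (function name is illustrative):

```python
def usable_capacity(raw_tb: float, data_shards: int, parity_shards: int) -> float:
    """Usable capacity under k+m Reed-Solomon erasure coding."""
    return raw_tb * data_shards / (data_shards + parity_shards)

print(round(usable_capacity(100, 9, 2), 1))   # 81.8
print(round(usable_capacity(100, 18, 3), 1))  # 85.7
```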
### Example Calculations
The following table shows usable capacity for a cluster with 100 TB of raw disk space under different EC configurations:
| EC Config | Data Shards | Parity Shards | Efficiency | Usable (100 TB raw) | Fault Tolerance |
|---|---|---|---|---|---|
| EC 5+2 | 5 | 2 | 71.4% | 71.4 TB | 2 disk failures |
| EC 9+2 | 9 | 2 | 81.8% | 81.8 TB | 2 disk failures |
| EC 18+3 | 18 | 3 | 85.7% | 85.7 TB | 3 disk failures |
| EC 29+3 | 29 | 3 | 90.6% | 90.6 TB | 3 disk failures |
| EC 31+2 | 31 | 2 | 93.9% | 93.9 TB | 2 disk failures |
**Tip:** For most production workloads, EC 9+2 offers a good balance between space efficiency (81.8%) and fault tolerance (2 simultaneous failures). Use EC 18+3 or EC 29+3 for higher parity protection if your environment has a higher risk of correlated disk failures.

**Warning:** EC 31+2 Ultra-Density Mode is designed exclusively for brand-new enterprise-grade servers and drives, deployed in Tier 3+ data centers with 24/7 on-site staff for immediate hardware fault response. With only 2 parity shards, the fault tolerance margin is minimal: any delay in disk replacement increases the risk of data loss.
## S3 API
KDSS provides a fully S3-compatible API gateway, supporting standard S3 tools and SDKs. The default endpoint is `http://<master-host>:9000`.
### AWS CLI
Use the standard AWS CLI to interact with KDSS. Configure a custom endpoint to point to your KDSS S3 gateway.
```bash
# Configure credentials (use KDSS access key / secret key)
aws configure
# AWS Access Key ID: your-access-key
# AWS Secret Access Key: your-secret-key
# Default region name: us-east-1
# Default output format: json

# Set aliases for convenience
alias s3='aws --endpoint-url http://master-host:9000 s3'
alias s3api='aws --endpoint-url http://master-host:9000 s3api'

# Bucket operations
s3 mb s3://my-bucket                             # Create bucket
s3 ls                                            # List buckets
s3 ls s3://my-bucket/                            # List objects
s3 rb s3://my-bucket --force                     # Delete bucket

# Upload / download
s3 cp local-file.dat s3://my-bucket/             # Upload file
s3 cp s3://my-bucket/file.dat ./                 # Download file
s3 cp ./data/ s3://my-bucket/data/ --recursive   # Upload directory
s3 sync ./backup/ s3://my-bucket/backup/         # Sync directory

# Delete
s3 rm s3://my-bucket/file.dat                    # Delete object
s3 rm s3://my-bucket/ --recursive                # Delete all objects
```
### s3cmd

s3cmd is a lightweight command-line tool for S3-compatible storage. Point it at KDSS via its configuration file (typically `~/.s3cfg`):

```ini
[default]
access_key = your-access-key
secret_key = your-secret-key
host_base = master-host:9000
host_bucket = master-host:9000/%(bucket)
use_https = False
signature_v2 = False
```
```bash
# Bucket operations
s3cmd mb s3://my-bucket
s3cmd ls

# Upload / download
s3cmd put file.dat s3://my-bucket/
s3cmd get s3://my-bucket/file.dat ./
s3cmd sync ./data/ s3://my-bucket/data/

# Object info
s3cmd info s3://my-bucket/file.dat
```
### Python (boto3)
Use the boto3 SDK to access KDSS programmatically from Python.
```python
import boto3

# Create S3 client pointing to KDSS
s3 = boto3.client(
    "s3",
    endpoint_url="http://master-host:9000",
    aws_access_key_id="your-access-key",
    aws_secret_access_key="your-secret-key",
    region_name="us-east-1",
)

# Create bucket
s3.create_bucket(Bucket="my-bucket")

# Upload file
s3.upload_file("local-file.dat", "my-bucket", "path/to/file.dat")

# Download file
s3.download_file("my-bucket", "path/to/file.dat", "downloaded.dat")

# List objects
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="path/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Generate presigned URL (valid for 1 hour)
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "path/to/file.dat"},
    ExpiresIn=3600,
)
print(url)

# Delete object
s3.delete_object(Bucket="my-bucket", Key="path/to/file.dat")
```
### Go
Use the AWS SDK for Go (v2) to interact with KDSS.
```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	// Create S3 client pointing to KDSS
	// (error handling elided for brevity)
	cfg, _ := config.LoadDefaultConfig(context.TODO(),
		config.WithRegion("us-east-1"),
		config.WithCredentialsProvider(
			credentials.NewStaticCredentialsProvider("your-access-key", "your-secret-key", ""),
		),
	)
	client := s3.NewFromConfig(cfg, func(o *s3.Options) {
		o.BaseEndpoint = aws.String("http://master-host:9000")
		o.UsePathStyle = true
	})

	// Upload file
	file, _ := os.Open("local-file.dat")
	defer file.Close()
	client.PutObject(context.TODO(), &s3.PutObjectInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("path/to/file.dat"),
		Body:   file,
	})

	// List objects
	resp, _ := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{
		Bucket: aws.String("my-bucket"),
	})
	for _, obj := range resp.Contents {
		fmt.Printf("%s %d bytes\n", *obj.Key, *obj.Size)
	}
}
```
### Supported S3 Operations

The KDSS S3 gateway supports the following S3 API operations:
| Category | Operations |
|---|---|
| Bucket | CreateBucket, DeleteBucket, ListBuckets, HeadBucket |
| Object | PutObject, GetObject, DeleteObject, HeadObject, CopyObject |
| List | ListObjectsV2, ListObjectVersions |
| Multipart Upload | CreateMultipartUpload, UploadPart, CompleteMultipartUpload, AbortMultipartUpload |
| Presigned URL | GET, PUT |
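The multipart operations follow the standard S3 flow (CreateMultipartUpload → UploadPart × N → CompleteMultipartUpload). The client-side splitting step can be sketched as pure arithmetic; the function name and the 64 MiB part size below are illustrative choices, not KDSS defaults:

```python
def part_ranges(object_size: int, part_size: int = 64 * 1024 * 1024):
    """Yield (part_number, start_offset, length) tuples for a multipart upload."""
    part_number = 1
    for start in range(0, object_size, part_size):
        yield part_number, start, min(part_size, object_size - start)
        part_number += 1

# A 300 MiB object with 64 MiB parts needs 5 parts; the last part is 44 MiB.
parts = list(part_ranges(300 * 1024 * 1024))
assert len(parts) == 5
assert parts[-1][2] == 44 * 1024 * 1024
```

Each `(part_number, start, length)` tuple maps to one UploadPart call; the part numbers are then passed to CompleteMultipartUpload.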
## Configuration
KDSS uses TOML configuration files for all components. This section provides a full reference for every configuration option.
### master.toml Reference

Configuration for the KDSS master node.

| Key | Default | Description |
|---|---|---|
| `listen` | `:6700` | gRPC listen address |
| `web_listen` | `:8081` | Web console listen address |
| `s3_listen` | `:9000` | S3 gateway listen address |
| `mongo_uri` | (required) | MongoDB connection URI |
| `mongo_db` | `kdss` | MongoDB database name |
| `ec.data_shards` | (required) | Number of data shards per stripe |
| `ec.parity_shards` | (required) | Number of parity shards per stripe |
| `leader.lock_ttl_sec` | `10` | Leader lock TTL (seconds) |
| `leader.renew_interval_sec` | `3` | Leader lock renewal interval (seconds) |
| `gc.interval_sec` | `3600` | GC housekeeping scan interval (seconds) |
| `gc.auto_gc_pending_hours` | `48` | Auto-delete gc_pending stripes after N hours |
| `capacity.alert_pct` | `95` | Cluster usage alert threshold (%) |
| `capacity.reject_pct` | `99` | Cluster usage reject threshold (%) |
| `repair.concurrency` | `4` | Repair tasks per Master node |
| `log.level` | `info` | Log level (debug, info, warn, error) |
| `log.file` | `/var/log/kdss/master.log` | Log file path |
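Assuming the dotted keys above map to TOML tables in the usual way, a minimal master.toml might look like this (hostnames and EC values are placeholders, not recommendations):

```toml
listen = ":6700"
web_listen = ":8081"
s3_listen = ":9000"
mongo_uri = "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0"
mongo_db = "kdss"

[ec]
data_shards = 9
parity_shards = 2

[log]
level = "info"
file = "/var/log/kdss/master.log"
```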
### storage.toml Reference

Configuration for KDSS storage nodes.

| Key | Default | Description |
|---|---|---|
| `node_id` | (required) | Unique node identifier |
| `listen` | `:6800` | gRPC listen address |
| `master_addrs` | (required) | Master node addresses |
| `index_dir` | `/opt/kdss/index` | BadgerDB index directory (SSD recommended) |
| `[[disks]].disk_id` | (required) | Disk identifier |
| `[[disks]].device` | (required) | Disk device path (use `/dev/disk/by-id/`) |
| `heartbeat.interval_sec` | `60` | Heartbeat interval (seconds) |
| `heartbeat.timeout_sec` | `180` | Heartbeat timeout (seconds) |
| `checker.enabled` | `true` | Enable CRC32 integrity scanning |
| `checker.rate_mb_per_sec` | `50` | Scan rate limit (MB/s per disk) |
| `compactor.enabled` | `true` | Enable background compaction |
| `compactor.threshold` | `0.2` | Soft compaction threshold (20%) |
| `compactor.force_threshold` | `0.5` | Force compaction threshold (50%) |
| `sync.mode` | `immediate` | Fsync mode: immediate, batch, or deferred |
| `log.level` | `info` | Log level |
| `log.file` | `/var/log/kdss/storage.log` | Log file path |
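An illustrative storage.toml sketch, assuming `master_addrs` takes a TOML array and `[[disks]]` is an array of tables as the key names suggest (node IDs, hostnames, and device serials are placeholders):

```toml
node_id = "storage-01"
listen = ":6800"
master_addrs = ["master-1:6700", "master-2:6700"]
index_dir = "/opt/kdss/index"

[[disks]]
disk_id = "disk-01"
device = "/dev/disk/by-id/ata-EXAMPLE-SERIAL-1"

[[disks]]
disk_id = "disk-02"
device = "/dev/disk/by-id/ata-EXAMPLE-SERIAL-2"

[heartbeat]
interval_sec = 60
timeout_sec = 180
```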
### mount.toml Reference

Configuration for the KDSS FUSE mount client.

| Key | Default | Description |
|---|---|---|
| `cluster.masters` | (required) | Master node addresses |
| `auth.access_key` | (required) | S3 access key |
| `auth.secret_key` | (required) | S3 secret key |
| `auth.bucket` | (required) | Target bucket name |
| `mount.mountpoint` | (required) | Local mount point path |
| `mount.entry_timeout_s` | `3600` | Kernel dentry cache TTL (seconds) |
| `mount.attr_timeout_s` | `3600` | Kernel attr cache TTL (seconds) |
| `performance.chunk_size_mb` | `256` | Chunk size in MB (1 chunk = 1 stripe) |
| `performance.read_rate_limit_mb` | `0` | Read rate limit (MB/s, 0 = unlimited) |
| `performance.write_rate_limit_mb` | `0` | Write rate limit (MB/s, 0 = unlimited) |
| `timeout.master_timeout_ms` | `5000` | Master RPC timeout (ms) |
| `timeout.storage_timeout_ms` | `10000` | Storage RPC timeout (ms) |
| `log.level` | `info` | Log level |
| `log.file` | (stdout) | Log file path |
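Assuming the same dotted-key-to-TOML-table mapping, a minimal mount.toml covering only the required keys might look like this (all values are placeholders):

```toml
[cluster]
masters = ["master-1:6700"]

[auth]
access_key = "your-access-key"
secret_key = "your-secret-key"
bucket = "my-bucket"

[mount]
mountpoint = "/mnt/kdss"
```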