
Requirements

Before deploying KDSS, ensure your hardware and software meet the following specifications. Requirements vary depending on workload and data volume.

Minimum Hardware

| Resource | Minimum | Notes |
|---|---|---|
| CPU | 8 cores | Per-node minimum for Storage; 16 cores recommended |
| RAM | 32 GB | 64-128 GB for Mixed (Master+Storage) nodes |
| System Disk | 200 GB SSD | 1 TB NVMe for Mixed nodes (MongoDB + BadgerDB) |
| HDD | 1+ data disks | Up to 36 × 16 TB HDD per node |
| Network | 10 Gbps | 20 Gbps bonded recommended |

Recommended Hardware

| Resource | Recommended | Notes |
|---|---|---|
| CPU | 16 cores | More cores accelerate EC encoding |
| RAM | 128 GB | Mixed nodes: 64 GB for MongoDB + 32 GB for services |
| System Disk | 2 TB NVMe | NVMe for OS, MongoDB, and BadgerDB indexes |
| HDD | 36 × 16 TB | High-capacity mechanical drives for storage nodes |
| Network | 20 Gbps bonded | Bonded NICs for high availability |

Software Requirements

  • Ubuntu 24.04 Server (recommended OS)
  • MongoDB 8.0+ (replica set for high availability metadata)
  • Go 1.24+ (only required for building from source)
  • FUSE 3.x (required for FUSE mount client)
  • smartmontools (required for disk health monitoring)

Capacity Planning

Understanding how erasure coding affects usable capacity is essential for planning your cluster size. KDSS uses Reed-Solomon EC, which splits data into data shards and parity shards.

Usable Capacity Formula

The usable storage capacity is determined by the EC configuration:

formula
Usable Capacity = Raw Capacity × (data_shards / (data_shards + parity_shards))
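As a sanity check, the formula is easy to evaluate in a few lines of Python (the function name and shard counts here are illustrative):

```python
def usable_capacity(raw, data_shards, parity_shards):
    """Usable capacity under Reed-Solomon EC, in the same unit as `raw`."""
    return raw * data_shards / (data_shards + parity_shards)

# EC 9+2 on 100 TB raw
print(round(usable_capacity(100, 9, 2), 1))  # → 81.8
```

The same call with (5, 2) or (31, 2) reproduces the other efficiency figures in the table below.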

Example Calculations

The following table shows usable capacity for a cluster with 100 TB of raw disk space under different EC configurations:

| EC Config | Data Shards | Parity Shards | Efficiency | Usable (100 TB raw) | Fault Tolerance |
|---|---|---|---|---|---|
| EC 5+2 | 5 | 2 | 71.4% | 71.4 TB | 2 disk failures |
| EC 9+2 | 9 | 2 | 81.8% | 81.8 TB | 2 disk failures |
| EC 18+3 | 18 | 3 | 85.7% | 85.7 TB | 3 disk failures |
| EC 29+3 | 29 | 3 | 90.6% | 90.6 TB | 3 disk failures |
| EC 31+2 | 31 | 2 | 93.9% | 93.9 TB | 2 disk failures |

Tip: For most production workloads, EC 9+2 offers a good balance between space efficiency (81.8%) and fault tolerance (2 simultaneous failures). Use EC 18+3 or EC 29+3 for higher parity protection if your environment has a higher risk of correlated disk failures.

Warning: EC 31+2 Ultra-Density Mode is designed exclusively for brand-new enterprise-grade servers and drives, deployed in Tier 3+ data centers with 24/7 on-site staff for immediate hardware fault response. With only 2 parity shards, the fault tolerance margin is minimal — any delay in disk replacement increases the risk of data loss.

S3 API

KDSS provides an S3-compatible API gateway that works with standard S3 tools and SDKs. The default endpoint is http://<master-host>:9000.

AWS CLI

Use the standard AWS CLI to interact with KDSS. Configure a custom endpoint to point to your KDSS S3 gateway.

bash
# Configure credentials (use KDSS access key / secret key)
aws configure
# AWS Access Key ID: your-access-key
# AWS Secret Access Key: your-secret-key
# Default region name: us-east-1
# Default output format: json

# Set alias for convenience
alias s3='aws --endpoint-url http://master-host:9000 s3'
alias s3api='aws --endpoint-url http://master-host:9000 s3api'

# Bucket operations
s3 mb s3://my-bucket                          # Create bucket
s3 ls                                         # List buckets
s3 ls s3://my-bucket/                         # List objects
s3 rb s3://my-bucket --force                  # Delete bucket

# Upload / Download
s3 cp local-file.dat s3://my-bucket/          # Upload file
s3 cp s3://my-bucket/file.dat ./              # Download file
s3 cp ./data/ s3://my-bucket/data/ --recursive  # Upload directory
s3 sync ./backup/ s3://my-bucket/backup/      # Sync directory

# Delete
s3 rm s3://my-bucket/file.dat                 # Delete object
s3 rm s3://my-bucket/ --recursive             # Delete all objects

s3cmd

s3cmd is a lightweight command-line tool for S3-compatible storage.

~/.s3cfg
[default]
access_key = your-access-key
secret_key = your-secret-key
host_base = master-host:9000
host_bucket = master-host:9000/%(bucket)
use_https = False
signature_v2 = False

bash
# Bucket operations
s3cmd mb s3://my-bucket
s3cmd ls

# Upload / Download
s3cmd put file.dat s3://my-bucket/
s3cmd get s3://my-bucket/file.dat ./
s3cmd sync ./data/ s3://my-bucket/data/

# File info
s3cmd info s3://my-bucket/file.dat

Python (boto3)

Use the boto3 SDK to access KDSS programmatically from Python.

python
import boto3

# Create S3 client pointing to KDSS
s3 = boto3.client(
    "s3",
    endpoint_url="http://master-host:9000",
    aws_access_key_id="your-access-key",
    aws_secret_access_key="your-secret-key",
    region_name="us-east-1",
)

# Create bucket
s3.create_bucket(Bucket="my-bucket")

# Upload file
s3.upload_file("local-file.dat", "my-bucket", "path/to/file.dat")

# Download file
s3.download_file("my-bucket", "path/to/file.dat", "downloaded.dat")

# List objects
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="path/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Generate presigned URL (valid for 1 hour)
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "path/to/file.dat"},
    ExpiresIn=3600,
)
print(url)

# Delete object
s3.delete_object(Bucket="my-bucket", Key="path/to/file.dat")

Go

Use the AWS SDK for Go (v2) to interact with KDSS.

go
package main

import (
    "context"
    "fmt"
    "os"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/credentials"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    // Create S3 client pointing to KDSS
    cfg, err := config.LoadDefaultConfig(context.TODO(),
        config.WithRegion("us-east-1"),
        config.WithCredentialsProvider(
            credentials.NewStaticCredentialsProvider("your-access-key", "your-secret-key", ""),
        ),
    )
    if err != nil {
        panic(err)
    }
    client := s3.NewFromConfig(cfg, func(o *s3.Options) {
        o.BaseEndpoint = aws.String("http://master-host:9000")
        o.UsePathStyle = true
    })

    // Upload file
    file, err := os.Open("local-file.dat")
    if err != nil {
        panic(err)
    }
    defer file.Close()
    if _, err := client.PutObject(context.TODO(), &s3.PutObjectInput{
        Bucket: aws.String("my-bucket"),
        Key:    aws.String("path/to/file.dat"),
        Body:   file,
    }); err != nil {
        panic(err)
    }

    // List objects
    resp, err := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{
        Bucket: aws.String("my-bucket"),
    })
    if err != nil {
        panic(err)
    }
    for _, obj := range resp.Contents {
        fmt.Printf("%s  %d bytes\n", *obj.Key, *obj.Size)
    }
}

Supported S3 Operations

The KDSS S3 gateway supports the following S3 API operations:

| Category | Operations |
|---|---|
| Bucket | CreateBucket, DeleteBucket, ListBuckets, HeadBucket |
| Object | PutObject, GetObject, DeleteObject, HeadObject, CopyObject |
| List | ListObjectsV2, ListObjectVersions |
| Multipart Upload | CreateMultipartUpload, UploadPart, CompleteMultipartUpload, AbortMultipartUpload |
| Presigned URL | GET, PUT |
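For large objects, the multipart operations above can be combined into a single helper. The following is a sketch using boto3; the helper names and the 64 MB part size are illustrative, not part of KDSS:

```python
def part_ranges(total_size, part_size):
    """Split a byte count into (offset, length) pairs, one per upload part."""
    return [(off, min(part_size, total_size - off))
            for off in range(0, total_size, part_size)]

def multipart_upload(s3, bucket, key, path, part_size=64 * 1024 * 1024):
    """Upload a local file in parts via an S3 client.

    `s3` is a boto3 S3 client created as in the Python section, e.g.
    boto3.client("s3", endpoint_url="http://master-host:9000", ...).
    """
    import os
    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    try:
        with open(path, "rb") as f:
            size = os.path.getsize(path)
            for num, (off, length) in enumerate(part_ranges(size, part_size), 1):
                f.seek(off)
                resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=num,
                                      UploadId=upload["UploadId"],
                                      Body=f.read(length))
                parts.append({"PartNumber": num, "ETag": resp["ETag"]})
        s3.complete_multipart_upload(Bucket=bucket, Key=key,
                                     UploadId=upload["UploadId"],
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # On any failure, abort so KDSS can reclaim the orphaned parts
        s3.abort_multipart_upload(Bucket=bucket, Key=key,
                                  UploadId=upload["UploadId"])
        raise
```

Note that boto3's higher-level `upload_file` (used earlier) performs this same create/upload/complete sequence automatically above its multipart threshold; the explicit form is mainly useful when you need control over part sizing or retries.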

Configuration

KDSS uses TOML configuration files for all components. This section provides a full reference for every configuration option.

master.toml Reference

Configuration for the KDSS master node.

| Key | Default | Description |
|---|---|---|
| listen | :6700 | gRPC listen address |
| web_listen | :8081 | Web console listen address |
| s3_listen | :9000 | S3 gateway listen address |
| mongo_uri | (required) | MongoDB connection URI |
| mongo_db | kdss | MongoDB database name |
| ec.data_shards | (required) | Number of data shards per stripe |
| ec.parity_shards | (required) | Number of parity shards per stripe |
| leader.lock_ttl_sec | 10 | Leader lock TTL (seconds) |
| leader.renew_interval_sec | 3 | Leader lock renewal interval (seconds) |
| gc.interval_sec | 3600 | GC housekeeping scan interval (seconds) |
| gc.auto_gc_pending_hours | 48 | Auto-delete gc_pending stripes after N hours |
| capacity.alert_pct | 95 | Cluster usage alert threshold (%) |
| capacity.reject_pct | 99 | Cluster usage reject threshold (%) |
| repair.concurrency | 4 | Repair tasks per Master node |
| log.level | info | Log level (debug, info, warn, error) |
| log.file | /var/log/kdss/master.log | Log file path |
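Putting the required keys together, a minimal master.toml might look like the following. All values here are illustrative; the section layout is inferred from the dotted key names above:

```toml
# master.toml — minimal example (values illustrative)
listen     = ":6700"
web_listen = ":8081"
s3_listen  = ":9000"

mongo_uri = "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0"
mongo_db  = "kdss"

[ec]
data_shards   = 9
parity_shards = 2

[log]
level = "info"
file  = "/var/log/kdss/master.log"
```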

storage.toml Reference

Configuration for KDSS storage nodes.

| Key | Default | Description |
|---|---|---|
| node_id | (required) | Unique node identifier |
| listen | :6800 | gRPC listen address |
| master_addrs | (required) | Master node addresses |
| index_dir | /opt/kdss/index | BadgerDB index directory (SSD recommended) |
| [[disks]].disk_id | (required) | Disk identifier |
| [[disks]].device | (required) | Disk device path (use /dev/disk/by-id/) |
| heartbeat.interval_sec | 60 | Heartbeat interval (seconds) |
| heartbeat.timeout_sec | 180 | Heartbeat timeout (seconds) |
| checker.enabled | true | Enable CRC32 integrity scanning |
| checker.rate_mb_per_sec | 50 | Scan rate limit (MB/s per disk) |
| compactor.enabled | true | Enable background compaction |
| compactor.threshold | 0.2 | Soft compaction threshold (20%) |
| compactor.force_threshold | 0.5 | Force compaction threshold (50%) |
| sync.mode | immediate | Fsync mode: immediate, batch, or deferred |
| log.level | info | Log level |
| log.file | /var/log/kdss/storage.log | Log file path |
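A minimal storage.toml sketch follows. Values are illustrative, and rendering master_addrs as a TOML array (rather than a comma-separated string) is an assumption; the disk serial names are placeholders:

```toml
# storage.toml — minimal example (values illustrative)
node_id      = "storage-01"
listen       = ":6800"
master_addrs = ["master-1:6700", "master-2:6700", "master-3:6700"]
index_dir    = "/opt/kdss/index"

# One [[disks]] entry per data disk; stable by-id paths survive reboots
[[disks]]
disk_id = "disk-01"
device  = "/dev/disk/by-id/ata-EXAMPLE_SERIAL_1"

[[disks]]
disk_id = "disk-02"
device  = "/dev/disk/by-id/ata-EXAMPLE_SERIAL_2"

[sync]
mode = "immediate"
```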

mount.toml Reference

Configuration for the KDSS FUSE mount client.

| Key | Default | Description |
|---|---|---|
| cluster.masters | (required) | Master node addresses |
| auth.access_key | (required) | S3 access key |
| auth.secret_key | (required) | S3 secret key |
| auth.bucket | (required) | Target bucket name |
| mount.mountpoint | (required) | Local mount point path |
| mount.entry_timeout_s | 3600 | Kernel dentry cache TTL (seconds) |
| mount.attr_timeout_s | 3600 | Kernel attr cache TTL (seconds) |
| performance.chunk_size_mb | 256 | Chunk size in MB (1 chunk = 1 stripe) |
| performance.read_rate_limit_mb | 0 | Read rate limit (MB/s, 0 = unlimited) |
| performance.write_rate_limit_mb | 0 | Write rate limit (MB/s, 0 = unlimited) |
| timeout.master_timeout_ms | 5000 | Master RPC timeout (ms) |
| timeout.storage_timeout_ms | 10000 | Storage RPC timeout (ms) |
| log.level | info | Log level |
| log.file | (stdout) | Log file path |
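Filling in only the required keys gives a minimal mount.toml like the sketch below. Values are illustrative, and the array form of cluster.masters is an assumption:

```toml
# mount.toml — minimal example (values illustrative)
[cluster]
masters = ["master-1:6700", "master-2:6700", "master-3:6700"]

[auth]
access_key = "your-access-key"
secret_key = "your-secret-key"
bucket     = "my-bucket"

[mount]
mountpoint = "/mnt/kdss"

[performance]
chunk_size_mb = 256   # 1 chunk = 1 stripe
```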