
Requirements

Before deploying KDSS, ensure your hardware and software meet the following specifications. Requirements vary depending on workload and data volume.

Minimum Hardware

| Resource | Minimum | Notes |
|---|---|---|
| CPU | 8 cores | Per-node minimum for Storage; 16 cores recommended |
| RAM | 32 GB | 64-128 GB for Mixed (Master+Storage) nodes |
| System Disk | 200 GB SSD | 1 TB NVMe for Mixed nodes (MongoDB + BadgerDB) |
| HDD | 1+ data disks | Up to 36 × 16 TB HDD per node |
| Network | 10 Gbps | 20 Gbps bonded recommended |

Recommended Hardware

| Resource | Recommended | Notes |
|---|---|---|
| CPU | 16 cores | More cores accelerate EC encoding |
| RAM | 128 GB | Mixed nodes: 64 GB for MongoDB + 32 GB for services |
| System Disk | 2 TB NVMe | NVMe for OS, MongoDB, and BadgerDB indexes |
| HDD | 36 × 16 TB | High-capacity mechanical drives for storage nodes |
| Network | 20 Gbps bonded | Bonded NICs for high availability |

Software Requirements

  • Ubuntu 24.04 Server (recommended OS)
  • MongoDB 8.0+ (replica set for high availability metadata)
  • Go 1.24+ (only required for building from source)
  • FUSE 3.x (required for FUSE mount client)
  • smartmontools (required for disk health monitoring)

Capacity Planning

Understanding how erasure coding affects usable capacity is essential for planning your cluster size. KDSS uses Reed-Solomon EC, which splits data into data shards and parity shards.

Usable Capacity Formula

The usable storage capacity is determined by the EC configuration:

formula
Usable Capacity = Raw Capacity × (data_shards / (data_shards + parity_shards))
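As a sanity check, the formula is easy to evaluate in a few lines of Python (the function name and shard counts here are illustrative):

```python
def usable_capacity(raw, data_shards, parity_shards):
    """Usable capacity under Reed-Solomon EC, in the same unit as `raw`."""
    return raw * data_shards / (data_shards + parity_shards)

# EC 9+2 on 100 TB raw
print(round(usable_capacity(100, 9, 2), 1))  # → 81.8
```

The same call with (5, 2) or (31, 2) reproduces the other efficiency figures in the table below.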

Example Calculations

The following table shows usable capacity for a cluster with 100 TB of raw disk space under different EC configurations:

| EC Config | Data Shards | Parity Shards | Efficiency | Usable (100 TB raw) | Fault Tolerance |
|---|---|---|---|---|---|
| EC 5+2 | 5 | 2 | 71.4% | 71.4 TB | 2 disk failures |
| EC 9+2 | 9 | 2 | 81.8% | 81.8 TB | 2 disk failures |
| EC 18+3 | 18 | 3 | 85.7% | 85.7 TB | 3 disk failures |
| EC 29+3 | 29 | 3 | 90.6% | 90.6 TB | 3 disk failures |
| EC 31+2 | 31 | 2 | 93.9% | 93.9 TB | 2 disk failures |

Tip: For most production workloads, EC 9+2 offers a good balance between space efficiency (81.8%) and fault tolerance (2 simultaneous failures). Use EC 18+3 or EC 29+3 for higher parity protection if your environment has a higher risk of correlated disk failures.

Warning: EC 31+2 Ultra-Density Mode is designed exclusively for brand-new enterprise-grade servers and drives, deployed in Tier 3+ data centers with 24/7 on-site staff for immediate hardware fault response. With only 2 parity shards, the fault tolerance margin is minimal — any delay in disk replacement increases the risk of data loss.

S3 API

KDSS provides an S3-compatible API gateway that works with standard S3 tools and SDKs. The default endpoint is http://<master-host>:9000.

AWS CLI

Use the standard AWS CLI to interact with KDSS. Configure a custom endpoint to point to your KDSS S3 gateway.

bash
# Configure credentials (use KDSS access key / secret key)
aws configure
# AWS Access Key ID: your-access-key
# AWS Secret Access Key: your-secret-key
# Default region name: us-east-1
# Default output format: json

# Set alias for convenience
alias s3='aws --endpoint-url http://master-host:9000 s3'
alias s3api='aws --endpoint-url http://master-host:9000 s3api'

# Bucket operations
s3 mb s3://my-bucket                          # Create bucket
s3 ls                                         # List buckets
s3 ls s3://my-bucket/                         # List objects
s3 rb s3://my-bucket --force                  # Delete bucket

# Upload / Download
s3 cp local-file.dat s3://my-bucket/          # Upload file
s3 cp s3://my-bucket/file.dat ./              # Download file
s3 cp ./data/ s3://my-bucket/data/ --recursive  # Upload directory
s3 sync ./backup/ s3://my-bucket/backup/      # Sync directory

# Delete
s3 rm s3://my-bucket/file.dat                 # Delete object
s3 rm s3://my-bucket/ --recursive             # Delete all objects

s3cmd

s3cmd is a lightweight command-line tool for S3-compatible storage.

~/.s3cfg
[default]
access_key = your-access-key
secret_key = your-secret-key
host_base = master-host:9000
host_bucket = master-host:9000/%(bucket)
use_https = False
signature_v2 = False

bash
# Bucket operations
s3cmd mb s3://my-bucket
s3cmd ls

# Upload / Download
s3cmd put file.dat s3://my-bucket/
s3cmd get s3://my-bucket/file.dat ./
s3cmd sync ./data/ s3://my-bucket/data/

# File info
s3cmd info s3://my-bucket/file.dat

Python (boto3)

Use the boto3 SDK to access KDSS programmatically from Python.

python
import boto3

# Create S3 client pointing to KDSS
s3 = boto3.client(
    "s3",
    endpoint_url="http://master-host:9000",
    aws_access_key_id="your-access-key",
    aws_secret_access_key="your-secret-key",
    region_name="us-east-1",
)

# Create bucket
s3.create_bucket(Bucket="my-bucket")

# Upload file
s3.upload_file("local-file.dat", "my-bucket", "path/to/file.dat")

# Download file
s3.download_file("my-bucket", "path/to/file.dat", "downloaded.dat")

# List objects
resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="path/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Generate presigned URL (valid for 1 hour)
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "path/to/file.dat"},
    ExpiresIn=3600,
)
print(url)

# Delete object
s3.delete_object(Bucket="my-bucket", Key="path/to/file.dat")

Go

Use the AWS SDK for Go (v2) to interact with KDSS.

go
package main

import (
    "context"
    "fmt"
    "os"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/credentials"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    // Create S3 client pointing to KDSS
    cfg, err := config.LoadDefaultConfig(context.TODO(),
        config.WithRegion("us-east-1"),
        config.WithCredentialsProvider(
            credentials.NewStaticCredentialsProvider("your-access-key", "your-secret-key", ""),
        ),
    )
    if err != nil {
        panic(err)
    }
    client := s3.NewFromConfig(cfg, func(o *s3.Options) {
        o.BaseEndpoint = aws.String("http://master-host:9000")
        o.UsePathStyle = true
    })

    // Upload file
    file, err := os.Open("local-file.dat")
    if err != nil {
        panic(err)
    }
    defer file.Close()
    if _, err := client.PutObject(context.TODO(), &s3.PutObjectInput{
        Bucket: aws.String("my-bucket"),
        Key:    aws.String("path/to/file.dat"),
        Body:   file,
    }); err != nil {
        panic(err)
    }

    // List objects
    resp, err := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{
        Bucket: aws.String("my-bucket"),
    })
    if err != nil {
        panic(err)
    }
    for _, obj := range resp.Contents {
        fmt.Printf("%s  %d bytes\n", *obj.Key, *obj.Size)
    }
}

Supported S3 Operations

The KDSS S3 gateway supports the following S3 API operations:

| Category | Operations |
|---|---|
| Bucket | CreateBucket, DeleteBucket, ListBuckets, HeadBucket |
| Object | PutObject, GetObject, DeleteObject, HeadObject, CopyObject |
| List | ListObjectsV2, ListObjectVersions |
| Multipart Upload | CreateMultipartUpload, UploadPart, CompleteMultipartUpload, AbortMultipartUpload |
| Presigned URL | GET, PUT |
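For large objects, the multipart operations above can be combined into a single helper. The following is a sketch using boto3; the helper names and the 64 MB part size are illustrative, not part of KDSS:

```python
def part_ranges(total_size, part_size):
    """Split a byte count into (offset, length) pairs, one per upload part."""
    return [(off, min(part_size, total_size - off))
            for off in range(0, total_size, part_size)]

def multipart_upload(s3, bucket, key, path, part_size=64 * 1024 * 1024):
    """Upload a local file in parts via an S3 client.

    `s3` is a boto3 S3 client created as in the Python section, e.g.
    boto3.client("s3", endpoint_url="http://master-host:9000", ...).
    """
    import os
    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    parts = []
    try:
        with open(path, "rb") as f:
            size = os.path.getsize(path)
            for num, (off, length) in enumerate(part_ranges(size, part_size), 1):
                f.seek(off)
                resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=num,
                                      UploadId=upload["UploadId"],
                                      Body=f.read(length))
                parts.append({"PartNumber": num, "ETag": resp["ETag"]})
        s3.complete_multipart_upload(Bucket=bucket, Key=key,
                                     UploadId=upload["UploadId"],
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # On any failure, abort so KDSS can reclaim the orphaned parts
        s3.abort_multipart_upload(Bucket=bucket, Key=key,
                                  UploadId=upload["UploadId"])
        raise
```

Note that boto3's higher-level `upload_file` (used earlier) performs this same create/upload/complete sequence automatically above its multipart threshold; the explicit form is mainly useful when you need control over part sizing or retries.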

Configuration

KDSS uses TOML configuration files for all components. This section provides a full reference for every configuration option.

master.toml Reference

Configuration for the KDSS master node.

| Key | Default | Description |
|---|---|---|
| listen | :6700 | gRPC listen address |
| web_listen | :8081 | Web console listen address |
| s3_listen | :9000 | S3 gateway listen address |
| mongo_uri | (required) | MongoDB connection URI |
| mongo_db | kdss | MongoDB database name |
| ec.data_shards | (required) | Number of data shards per stripe |
| ec.parity_shards | (required) | Number of parity shards per stripe |
| leader.lock_ttl_sec | 10 | Leader lock TTL (seconds) |
| leader.renew_interval_sec | 3 | Leader lock renewal interval (seconds) |
| gc.interval_sec | 3600 | GC housekeeping scan interval (seconds) |
| gc.auto_gc_pending_hours | 48 | Auto-delete gc_pending stripes after N hours |
| capacity.alert_pct | 95 | Cluster usage alert threshold (%) |
| capacity.reject_pct | 99 | Cluster usage reject threshold (%) |
| repair.concurrency | 4 | Repair tasks per Master node |
| log.level | info | Log level (debug, info, warn, error) |
| log.file | /var/log/kdss/master.log | Log file path |
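Putting the required keys together, a minimal master.toml might look like the following. All values here are illustrative; the section layout is inferred from the dotted key names above:

```toml
# master.toml — minimal example (values illustrative)
listen     = ":6700"
web_listen = ":8081"
s3_listen  = ":9000"

mongo_uri = "mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0"
mongo_db  = "kdss"

[ec]
data_shards   = 9
parity_shards = 2

[log]
level = "info"
file  = "/var/log/kdss/master.log"
```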

storage.toml Reference

Configuration for KDSS storage nodes.

| Key | Default | Description |
|---|---|---|
| node_id | (required) | Unique node identifier |
| listen | :6800 | gRPC listen address |
| master_addrs | (required) | Master node addresses |
| index_dir | /opt/kdss/index | BadgerDB index directory (SSD recommended) |
| [[disks]].disk_id | (required) | Disk identifier |
| [[disks]].device | (required) | Disk device path (use /dev/disk/by-id/) |
| heartbeat.interval_sec | 60 | Heartbeat interval (seconds) |
| heartbeat.timeout_sec | 180 | Heartbeat timeout (seconds) |
| checker.enabled | true | Enable CRC32 integrity scanning |
| checker.rate_mb_per_sec | 50 | Scan rate limit (MB/s per disk) |
| compactor.enabled | true | Enable background compaction |
| compactor.threshold | 0.2 | Soft compaction threshold (20%) |
| compactor.force_threshold | 0.5 | Force compaction threshold (50%) |
| sync.mode | immediate | Fsync mode: immediate, batch, or deferred |
| log.level | info | Log level |
| log.file | /var/log/kdss/storage.log | Log file path |
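A minimal storage.toml sketch follows. Values are illustrative, and rendering master_addrs as a TOML array (rather than a comma-separated string) is an assumption; the disk serial names are placeholders:

```toml
# storage.toml — minimal example (values illustrative)
node_id      = "storage-01"
listen       = ":6800"
master_addrs = ["master-1:6700", "master-2:6700", "master-3:6700"]
index_dir    = "/opt/kdss/index"

# One [[disks]] entry per data disk; stable by-id paths survive reboots
[[disks]]
disk_id = "disk-01"
device  = "/dev/disk/by-id/ata-EXAMPLE_SERIAL_1"

[[disks]]
disk_id = "disk-02"
device  = "/dev/disk/by-id/ata-EXAMPLE_SERIAL_2"

[sync]
mode = "immediate"
```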

mount.toml Reference

Configuration for the KDSS FUSE mount client.

| Key | Default | Description |
|---|---|---|
| cluster.masters | (required) | Master node addresses |
| auth.access_key | (required) | S3 access key |
| auth.secret_key | (required) | S3 secret key |
| auth.bucket | (required) | Target bucket name |
| mount.mountpoint | (required) | Local mount point path |
| mount.entry_timeout_s | 3600 | Kernel dentry cache TTL (seconds) |
| mount.attr_timeout_s | 3600 | Kernel attr cache TTL (seconds) |
| performance.chunk_size_mb | 256 | Chunk size in MB (1 chunk = 1 stripe) |
| performance.read_rate_limit_mb | 0 | Read rate limit (MB/s, 0 = unlimited) |
| performance.write_rate_limit_mb | 0 | Write rate limit (MB/s, 0 = unlimited) |
| timeout.master_timeout_ms | 5000 | Master RPC timeout (ms) |
| timeout.storage_timeout_ms | 10000 | Storage RPC timeout (ms) |
| log.level | info | Log level |
| log.file | (stdout) | Log file path |
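Filling in only the required keys gives a minimal mount.toml like the sketch below. Values are illustrative, and the array form of cluster.masters is an assumption:

```toml
# mount.toml — minimal example (values illustrative)
[cluster]
masters = ["master-1:6700", "master-2:6700", "master-3:6700"]

[auth]
access_key = "your-access-key"
secret_key = "your-secret-key"
bucket     = "my-bucket"

[mount]
mountpoint = "/mnt/kdss"

[performance]
chunk_size_mb = 256   # 1 chunk = 1 stripe
```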