
Architecture Overview


Component Details

Master Server :6700

Central metadata manager responsible for namespace management, EC group allocation, volume placement, and cluster health monitoring. Coordinates all storage nodes via gRPC heartbeats.

Storage Server :6800

Handles raw disk I/O with append-only writes. Manages shard storage on local disks with CRC32 integrity verification, three-layer health monitoring, and background compaction. Reports disk health to the Master via heartbeat.

S3 Gateway :9000

Provides S3-compatible API access including multipart upload, presigned URLs, bucket policies, and streaming for large objects. Translates S3 requests into internal storage operations.
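
The presigned-URL idea can be illustrated with a stripped-down HMAC scheme. This is a deliberate simplification: the real gateway implements AWS Signature Version 4, which is considerably more involved, and the URL shape and parameter names below are assumptions for illustration only:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign computes an HMAC-SHA256 over the object path and expiry time.
// A simplified stand-in for SigV4 presigning, not the gateway's actual scheme.
func sign(secret, objectPath string, expiresUnix int64) string {
	mac := hmac.New(sha256.New, []byte(secret))
	fmt.Fprintf(mac, "%s?expires=%d", objectPath, expiresUnix)
	return hex.EncodeToString(mac.Sum(nil))
}

// presignURL builds a shareable URL embedding the expiry and signature.
func presignURL(secret, objectPath string, expiresUnix int64) string {
	return fmt.Sprintf("%s?expires=%d&signature=%s",
		objectPath, expiresUnix, sign(secret, objectPath, expiresUnix))
}

// validate recomputes the signature server-side and compares in constant time.
func validate(secret, objectPath string, expiresUnix int64, sig string) bool {
	expected, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	fmt.Fprintf(mac, "%s?expires=%d", objectPath, expiresUnix)
	return hmac.Equal(expected, mac.Sum(nil))
}

func main() {
	fmt.Println(presignURL("demo-secret", "/bucket/object.bin", 1700000000))
}
```

A holder of the URL can access the object until the expiry passes; the gateway rejects any request whose recomputed signature does not match.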

FUSE Client mount

Mounts KDSS as a local POSIX filesystem. Performs EC encoding on write and decoding on read, with LRU chunk caching and sequential prefetch for optimal throughput.
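
The LRU chunk cache can be sketched with the standard library alone. Capacity here is counted in chunks for brevity (the real cache presumably budgets bytes), and all names are illustrative:

```go
package main

import (
	"container/list"
	"fmt"
)

// chunkCache is a minimal LRU cache for decoded chunks, sketching the
// FUSE client's read cache.
type chunkCache struct {
	cap   int
	order *list.List               // front = most recently used
	items map[uint64]*list.Element // chunk ID -> list element
}

type entry struct {
	id   uint64
	data []byte
}

func newChunkCache(capacity int) *chunkCache {
	return &chunkCache{cap: capacity, order: list.New(), items: make(map[uint64]*list.Element)}
}

// Get returns a cached chunk and marks it most recently used.
func (c *chunkCache) Get(id uint64) ([]byte, bool) {
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).data, true
	}
	return nil, false
}

// Put inserts or refreshes a chunk, evicting the least recently used
// entry once capacity is exceeded.
func (c *chunkCache) Put(id uint64, data []byte) {
	if el, ok := c.items[id]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).data = data
		return
	}
	c.items[id] = c.order.PushFront(&entry{id, data})
	if c.order.Len() > c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).id)
	}
}

func main() {
	c := newChunkCache(2)
	c.Put(1, []byte("a"))
	c.Put(2, []byte("b"))
	c.Get(1)              // touch 1 so it becomes most recent
	c.Put(3, []byte("c")) // evicts chunk 2
	_, ok := c.Get(2)
	fmt.Println(ok) // false
}
```

Sequential prefetch would sit on top of this: on a read of chunk N, the client schedules a fetch of chunk N+1 into the same cache.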

Web Console :8081

Full-featured dashboard for cluster management, real-time monitoring, alerting with 32 built-in rules, RBAC access control, and visual topology overview.

MongoDB :27017

Stores all metadata including file entries, EC group mappings, volume information, user accounts, and access policies. Supports replica set deployment for high availability.

Data Flow

Write Path

  1. Client sends data via S3 API or FUSE mount
  2. Client splits data into fixed-size chunks (default 256 MB)
  3. Client EC-encodes each chunk into data + parity shards
  4. Shards are distributed to storage nodes in parallel via gRPC
  5. Storage nodes append shards to raw disk with CRC32 checksum
  6. Master records shard locations in metadata store
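
Steps 2 and 5 can be sketched with the standard library: cut the stream into fixed-size chunks and checksum each piece. KDSS defaults to 256 MB chunks; the 4-byte size here is purely for illustration:

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// splitChunks cuts a byte stream into fixed-size chunks, mirroring step 2
// of the write path (the final chunk may be shorter).
func splitChunks(data []byte, chunkSize int) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := chunkSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	data := []byte("hello kdss")
	for i, c := range splitChunks(data, 4) {
		// Each shard later carries a CRC32 checksum, as in step 5.
		fmt.Printf("chunk %d: %q crc32=%08x\n", i, c, crc32.ChecksumIEEE(c))
	}
}
```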

Read Path

  1. Client requests data via S3 API or FUSE mount
  2. Master returns shard locations from metadata store
  3. Client fetches data shards in parallel from storage nodes
  4. If a data shard is unavailable, the missing shard is reconstructed from the surviving data and parity shards
  5. Client EC-decodes and reassembles the original data
  6. Data is returned to the application

Storage Layout

KDSS bypasses the filesystem and writes directly to raw disks. Each disk begins with dual SuperBlocks for crash recovery, followed by an append-only sequence of data records.

┌───────────────┬───────────────┬──────────────────────────┐
│ SuperBlock    │ SuperBlock    │ Data Records (append)    │
│ Primary (2KB) │ Backup (2KB)  │ Record = Header + Shard  │
└───────────────┴───────────────┴──────────────────────────┘
A shard index, maintained by an embedded BadgerDB instance on SSD, maps each shard to its on-disk location.

Deployment Topology

Mixed Deployment (Recommended)

Mixed nodes run both the Master and Storage services, maximizing disk utilization; additional pure Storage nodes handle data I/O. Recommended for clusters of all sizes (9 to 72+ nodes).

┌─────────────────────────────────────┐
│   Mixed Node 1                      │
│   Master + Storage + MongoDB        │
└──────────────────┬──────────────────┘
                   │
┌──────────────────┴──────────────────┐
│   Mixed Node 2                      │
│   Master + Storage + MongoDB        │
└──────────────────┬──────────────────┘
                   │
┌──────────────────┴──────────────────┐
│   Pure Storage Node (×N)            │
│   Storage only                      │
└─────────────────────────────────────┘

Scale-Out Deployment

For 70+ node clusters, additional Master nodes (up to 7) handle metadata, while the majority of nodes run Storage only.

┌──────────┐  ┌──────────┐  ┌──────────┐
│ Master 1 │  │ Master 2 │  │  ... ×7  │
│+Storage  │  │+Storage  │  │+Storage  │
│+MongoDB  │  │+MongoDB  │  │+MongoDB  │
└────┬─────┘  └────┬─────┘  └────┬─────┘
     │             │             │
─────┴─────────────┴─────────────┴─────
     │             │             │
┌────┴─────┐  ┌────┴─────┐  ┌────┴─────┐
│Storage 1 │  │Storage 2 │  │  ... ×65 │
└──────────┘  └──────────┘  └──────────┘