Sharding

TensorDB distributes data across multiple shards for write parallelism.

Shard Routing

Keys are routed deterministically:

shard_id = hash(key) % shard_count
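
As a rough illustration of the routing rule above, the sketch below maps a key to a shard index. The source does not say which hash function TensorDB uses, so CRC32 is only a stand-in here, and the helper name shard_for is hypothetical.

```python
import zlib


def shard_for(key: bytes, shard_count: int = 4) -> int:
    """Map a key to a shard index in the range [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Assumption: CRC32 stands in for the real hash. It is stable across
    # processes, unlike Python's built-in hash() for bytes and str.
    return zlib.crc32(key) % shard_count


if __name__ == "__main__":
    for k in (b"user:1", b"user:2", b"order:42"):
        print(k, "->", shard_for(k))
```

Because the mapping depends only on the key and shard_count, the same key always lands on the same shard as long as shard_count does not change.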

Each shard is an independent actor with its own:

  • WAL file
  • Memtable
  • SSTable files and manifest
  • Commit counter (monotonically increasing)

Single-Writer Actors

Each shard uses a single-writer actor model:

  • One writer thread per shard processes writes sequentially
  • No fine-grained locks needed, because the actor serializes all writes
  • Multiple readers via ShardReadHandle (lock-free)

This design eliminates write contention within a shard while still allowing concurrent reads.
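
The single-writer idea can be sketched with one thread draining a queue. This is a toy model under stated assumptions, not TensorDB's implementation: the class Shard, its methods, and the dict standing in for the memtable are all hypothetical, and the read path here is not how ShardReadHandle works internally.

```python
import queue
import threading


class Shard:
    """Toy shard: one writer thread applies queued writes in arrival order."""

    def __init__(self) -> None:
        self._inbox: queue.Queue = queue.Queue()
        self._data: dict = {}        # stands in for the memtable
        self.commit_counter = 0      # mutated only by the writer thread
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def _run(self) -> None:
        while True:
            item = self._inbox.get()
            if item is None:         # shutdown sentinel
                break
            key, value, done = item
            self._data[key] = value
            self.commit_counter += 1
            done.put(self.commit_counter)  # reply with the commit number

    def put(self, key, value) -> int:
        """Enqueue a write and block until the writer has applied it."""
        done: queue.Queue = queue.Queue(maxsize=1)
        self._inbox.put((key, value, done))
        return done.get()

    def get(self, key):
        """Read without an explicit lock (relies on CPython's atomic dict reads)."""
        return self._data.get(key)

    def close(self) -> None:
        """Stop the writer after it has drained everything queued before this call."""
        self._inbox.put(None)
        self._writer.join()
```

Any number of threads may call put concurrently. Only the writer thread touches the commit counter, so commit numbers come out strictly increasing without a lock around the write path.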

Parallelism

With shard_count = 4 (default):

  • 4 independent write pipelines
  • Writes to different shards execute concurrently
  • Cross-shard queries (e.g., full table scans) merge results from all shards
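
A cross-shard scan of the kind mentioned in the last bullet can be sketched as a k-way merge. The sketch assumes each shard returns its rows already sorted by key, which the source does not state. The name scan_all and the Row type are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (key, value)


def scan_all(shards: List[Iterable[Row]]) -> Iterator[Row]:
    """Merge per-shard results, each sorted by key, into one sorted stream."""
    # heapq.merge is lazy: it pulls one row at a time from each shard.
    return heapq.merge(*shards, key=lambda row: row[0])


if __name__ == "__main__":
    per_shard = [
        [("a", "1"), ("d", "4")],
        [("b", "2")],
        [],
        [("c", "3")],
    ]
    for row in scan_all(per_shard):
        print(row)
```

Since routing places each key on exactly one shard, the merged stream needs no de-duplication across shards.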

Configuration

Parameter     Default   Description
shard_count   4         Number of independent shards

Choosing Shard Count

Workload          Recommendation
Single-threaded   1–2 shards
General purpose   4 shards (default)
Write-heavy       8–16 shards
Many cores        Match to core count
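
The recommendations above could be encoded as a small helper for picking a starting value. This is only a sketch: the function name and workload labels are hypothetical, and where the table gives a range, the value chosen within it is an arbitrary starting point rather than a tuned setting.

```python
import os


def suggest_shard_count(workload: str) -> int:
    """Return a starting shard_count for a workload label (labels are hypothetical)."""
    if workload == "single-threaded":
        return 2                      # table suggests 1-2 shards
    if workload == "general":
        return 4                      # the documented default
    if workload == "write-heavy":
        return 8                      # low end of the 8-16 range
    if workload == "many-cores":
        return os.cpu_count() or 1    # cpu_count() may return None
    raise ValueError(f"unknown workload: {workload!r}")
```

Treat the result as a first guess and adjust it after measuring write throughput on the target machine.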