SSTables

SSTables (Sorted String Tables) are the immutable on-disk format for TensorDB’s data.

Structure

Each SSTable contains:

┌─────────────────────┐
│ Data Block 0        │ ← Key-value pairs, prefix-compressed
│ Data Block 1        │
│ ...                 │
│ Data Block N        │
├─────────────────────┤
│ Block Index         │ ← Maps key ranges to block offsets
├─────────────────────┤
│ Bloom Filter        │ ← Probabilistic key membership test
├─────────────────────┤
│ Footer              │ ← Offsets to index and bloom filter
└─────────────────────┘
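
The Bloom filter in this layout lets a reader skip a file that cannot contain a key. The following is a minimal sketch of the idea, not TensorDB's implementation: the class name, the bit-array size, the number of hashes, and the use of SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; sizes and hash choice are assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key is definitely absent; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A key that was added always tests positive. A key that was never added usually tests negative but can occasionally test positive, so a positive answer still has to be confirmed by reading the data block.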

Data Blocks

  • Fixed-size blocks (default: 16KB, configurable via sstable_block_bytes)
  • Prefix compression: Keys sharing a common prefix store only the differing suffix
  • Sorted by key within each block
  • Independently addressable for random access
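
Prefix compression can be sketched as follows. This version encodes each key relative to the key immediately before it in the block, which is one common scheme; whether TensorDB uses exactly this encoding is an assumption, and the function names are hypothetical.

```python
def shared_prefix_len(a, b):
    """Return the number of leading bytes that a and b have in common."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def compress_block(sorted_keys):
    """Encode each key as (shared_len, suffix) relative to the previous key."""
    entries = []
    prev = b""
    for key in sorted_keys:
        shared = shared_prefix_len(prev, key)
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def decompress_block(entries):
    """Rebuild the full keys from (shared_len, suffix) entries."""
    keys = []
    prev = b""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys
```

Because keys within a block are sorted, neighbouring keys tend to share long prefixes, which is where the space saving comes from.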

Block Index

The block index maps the first key of each block to its file offset. A point lookup binary-searches these first keys to find the single block that could contain the key, then reads only that block.
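
A minimal sketch of that index search, assuming the index is held in memory as a sorted list of (first_key, offset) pairs. The function name and the in-memory representation are hypothetical.

```python
import bisect


def find_block(index, key):
    """Return the offset of the block that could hold key, or None.

    index is a list of (first_key, offset) pairs sorted by first_key.
    """
    first_keys = [first_key for first_key, _ in index]
    # The candidate block is the last one whose first key is <= key.
    pos = bisect.bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first block of this SSTable
    return index[pos][1]
```

For example, with blocks starting at b"apple", b"mango", and b"peach", a lookup for b"banana" is directed to the first block, since b"banana" sorts between b"apple" and b"mango".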

Immutability

SSTables are never modified after creation. This provides:

  • Safe concurrent reads without locks
  • Crash safety (no partial updates)
  • Simple backup and replication (just copy files)

The manifest file tracks which SSTables are active; it’s atomically replaced during compaction.
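
One common way to replace a file atomically is to write the new contents to a temporary file in the same directory and then rename it over the old one. The sketch below shows that pattern; the JSON layout, the file names, and the function names are assumptions for illustration and do not describe TensorDB's actual manifest format.

```python
import json
import os
import tempfile


def write_manifest(path, active_sstables):
    """Atomically replace the manifest at path with a new SSTable list."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory so the rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".manifest-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump({"sstables": sorted(active_sstables)}, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make the contents durable before renaming
        os.replace(tmp_path, path)  # readers see the old or the new file, never a mix
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_manifest(path):
    """Return the list of active SSTable file names."""
    with open(path) as f:
        return json.load(f)["sstables"]
```

If the process crashes before the rename, the previous manifest is still in place and still lists a consistent set of SSTables.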

Configuration

Parameter               Default  Description
sstable_block_bytes     16KB     Data block size
sstable_max_file_bytes  64MB     Maximum SSTable file size