Compression

TensorDB uses LZ4 block compression to reduce SSTable disk usage while maintaining fast read performance.

Performance impact:

Writes: Small overhead (~2% CPU) for compression during flush
Reads: Negligible overhead; LZ4 decompresses at ~4.4 GB/s
Disk I/O: Significant reduction, since fewer bytes are read from disk
Cache efficiency: Compressed blocks are smaller, so more data fits in the same amount of cache
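As a rough illustration of the disk I/O and cache effects listed above, both can be modeled as dividing or multiplying by the compression ratio. This is a back-of-the-envelope sketch, not TensorDB code: it assumes every block compresses at the same ratio and that the cache stores blocks in compressed form, and the function names are hypothetical.

```python
# Back-of-the-envelope model of the read-side effects of compression.
# Assumptions (illustrative only): every block compresses at the same ratio,
# and the cache stores blocks in their compressed form.

GIB = 1024 ** 3

def disk_bytes_read(logical_bytes: int, ratio: float) -> float:
    """Bytes read from disk to serve `logical_bytes` of uncompressed data."""
    return logical_bytes / ratio

def logical_bytes_cached(cache_bytes: int, ratio: float) -> float:
    """Uncompressed data that fits in a cache holding compressed blocks."""
    return cache_bytes * ratio

ratio = 2.0  # the ~2x "Mixed workloads" figure from the typical-ratio table

print(disk_bytes_read(10 * GIB, ratio) / GIB)      # 5.0 (GiB read instead of 10)
print(logical_bytes_cached(1 * GIB, ratio) / GIB)  # 2.0 (GiB of data in a 1 GiB cache)
```

Real workloads mix data types with different ratios, so treat these numbers as an upper-level estimate rather than a prediction.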

LZ4

LZ4 is chosen primarily for its decompression speed, the fastest of the three codecs compared:

Metric	LZ4	zstd	gzip
Compression speed	~780 MB/s	~500 MB/s	~90 MB/s
Decompression speed	~4400 MB/s	~1600 MB/s	~400 MB/s
Compression ratio	~2.1×	~2.9×	~3.1×

LZ4 trades some compression ratio for speed. That suits a database, where decompression sits on the read path and adds directly to query latency.
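To see what the decompression speeds in the codec table mean for a single read, the sketch below estimates the time to decompress one 16 KB block with each codec. It assumes 1 MB = 10^6 bytes, that the table's speeds are measured against uncompressed output, and that one thread decompresses one block; the names are illustrative, not part of TensorDB.

```python
# Approximate decompression throughput from the codec table, in MB/s.
# Assumptions: 1 MB = 10**6 bytes, speeds are measured on uncompressed output,
# and a single thread decompresses one block at a time.
DECOMPRESS_MB_PER_S = {"lz4": 4400, "zstd": 1600, "gzip": 400}

BLOCK_BYTES = 16 * 1024  # default SSTable block size (16 KB, uncompressed)

def block_decompress_us(codec: str, block_bytes: int = BLOCK_BYTES) -> float:
    """Estimated microseconds to decompress one block with `codec`."""
    bytes_per_second = DECOMPRESS_MB_PER_S[codec] * 10**6
    return block_bytes / bytes_per_second * 10**6

for codec in DECOMPRESS_MB_PER_S:
    print(f"{codec}: {block_decompress_us(codec):.1f} µs per block")
# lz4: 3.7 µs per block
# zstd: 10.2 µs per block
# gzip: 41.0 µs per block
```

A few microseconds per block is small next to a disk read, which is why LZ4's read overhead is described as negligible; gzip's cost per block is roughly eleven times higher.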

Compression operates at the SSTable block level (default block size: 16 KB).

Block-level compression preserves random access: each block is compressed independently, so a read decompresses only the block it needs instead of the entire file.
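The random-access property can be sketched in a few lines: records are packed into ~16 KB blocks, each block is compressed on its own, and a small index records where each block starts. A lookup binary-searches the index and decompresses exactly one block. This is a minimal sketch under stated assumptions, not TensorDB's on-disk format: Python's standard library has no LZ4 module, so `zlib` stands in for LZ4, and the length-prefixed record layout and the `build_table`/`get` helpers are hypothetical.

```python
import bisect
import struct
import zlib

BLOCK_SIZE = 16 * 1024  # target uncompressed bytes per block (16 KB default)

def _encode(key: bytes, value: bytes) -> bytes:
    # Hypothetical record layout: 4-byte key length, 4-byte value length, payload.
    return struct.pack("<II", len(key), len(value)) + key + value

def _decode(raw: bytes):
    # Yield (key, value) pairs from one decompressed block.
    pos = 0
    while pos < len(raw):
        klen, vlen = struct.unpack_from("<II", raw, pos)
        pos += 8
        key = raw[pos:pos + klen]
        pos += klen
        yield key, raw[pos:pos + vlen]
        pos += vlen

def build_table(records):
    """Pack sorted (key, value) pairs into independently compressed blocks.

    Returns (file_bytes, index); index holds (first_key, offset, length) per block.
    """
    file_bytes = bytearray()
    index = []
    pending = bytearray()
    first_key = None

    def flush():
        nonlocal first_key
        if pending:
            compressed = zlib.compress(bytes(pending))  # stand-in for LZ4
            index.append((first_key, len(file_bytes), len(compressed)))
            file_bytes.extend(compressed)
            pending.clear()
            first_key = None

    for key, value in records:
        if first_key is None:
            first_key = key
        pending.extend(_encode(key, value))
        if len(pending) >= BLOCK_SIZE:
            flush()
    flush()
    return bytes(file_bytes), index

def get(file_bytes, index, key):
    """Point lookup that decompresses only the one block that may hold `key`."""
    first_keys = [entry[0] for entry in index]
    i = bisect.bisect_right(first_keys, key) - 1
    if i < 0:
        return None  # key sorts before the first block
    _, offset, length = index[i]
    raw = zlib.decompress(file_bytes[offset:offset + length])
    for k, v in _decode(raw):
        if k == key:
            return v
    return None

records = [(f"user:{i:06d}".encode(), f'{{"id": {i}, "active": true}}'.encode())
           for i in range(5000)]
file_bytes, index = build_table(records)
print(get(file_bytes, index, b"user:004321"))  # b'{"id": 4321, "active": true}'
```

The cost of a lookup depends on the block size, not the file size: a larger block improves the compression ratio but means more bytes to decompress per read.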

Typical compression ratios for TensorDB data:

Data Type	Ratio	Explanation
JSON/text	2–4×	High redundancy in JSON keys and text
Numeric data	1.5–2×	Fewer repeated byte patterns than text
Binary/blobs	1.1–1.5×	Already compact
Mixed workloads	~2×	Typical overall ratio (about half the uncompressed size on disk)
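The gap between data types can be reproduced with a quick experiment that compares a block of JSON records with a block of random bytes, which behave like already-compressed blobs. Because Python's standard library has no LZ4 module, this sketch uses `zlib` as a stand-in. zlib generally compresses tighter than LZ4, so its absolute ratios will overstate what LZ4 achieves; only the contrast between the two inputs is meant to carry over. The helper names are illustrative.

```python
import json
import os
import zlib

def compression_ratio(raw: bytes) -> float:
    """Uncompressed size divided by compressed size (higher is better)."""
    return len(raw) / len(zlib.compress(raw))  # zlib as a stand-in for LZ4

def space_saved(ratio: float) -> float:
    """Fraction of disk space saved at a given compression ratio."""
    return 1 - 1 / ratio

# JSON records repeat the same keys, so the block has a lot of redundancy.
json_block = b"\n".join(
    json.dumps({"user_id": i, "status": "active", "country": "US"}).encode()
    for i in range(500)
)
# Random bytes approximate already-compressed blobs such as images or archives.
blob_block = os.urandom(len(json_block))

print(f"JSON ratio: {compression_ratio(json_block):.1f}x")
print(f"blob ratio: {compression_ratio(blob_block):.2f}x")
print(f"{space_saved(2.0):.0%}")  # 50%
```

The last line shows the arithmetic behind the "Mixed workloads" row: a 2× ratio stores each block in half its uncompressed size, saving about 50% of disk space.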