Bloom Filters
Bloom filters are probabilistic data structures that test whether a key might be in an SSTable, enabling fast negative lookups.
How It Works
- When an SSTable is created, every key is hashed into the bloom filter
- On a point lookup, the bloom filter is checked before reading any data blocks
- Negative result: The key is definitely NOT in this SSTable → skip it
- Positive result: The key MIGHT be in this SSTable → read the data blocks
False Positive Rate
With the default 10 bits per key:
- False positive rate: ~1%
- Memory cost: ~1.25 bytes per key
| Bits per Key | False Positive Rate | Memory per 1M Keys |
|---|---|---|
| 4 | ~5.6% | 0.5 MB |
| 8 | ~2.2% | 1.0 MB |
| 10 | ~1.0% | 1.25 MB |
| 14 | ~0.2% | 1.75 MB |
| 20 | ~0.01% | 2.5 MB |
Impact on Read Performance
Bloom filters are critical for read performance:
- Without bloom filters: Every SSTable must be checked for every read
- With bloom filters: ~99% of SSTables are skipped per read
Configuration
| Parameter | Default | Description |
|---|---|---|
bloom_bits_per_key | 10 | Bits per key (minimum: 4) |