What Does a Database for SSDs Look Like?
Sequential vs random I/O on SSDs
- Several comments stress that SSDs still have a big performance gap between sequential and random I/O, just much smaller than on HDDs.
- Benchmarks show multi-GB/s sequential reads vs tens of MB/s for 4K random reads at queue depth 1; latency is microseconds instead of milliseconds, but locality still matters (a rough probe of this gap is sketched after this list).
- Controllers, filesystems and OS readahead are all tuned to reward large, aligned, predictable access patterns.
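A rough way to see the gap on a Linux box is to read the same file sequentially in large requests and randomly in 4K requests with the page cache bypassed. This is a sketch, not a benchmark: the path, request sizes, and single-threaded loop are assumptions, and a real tool such as fio with deep queue depths will report much higher sequential figures.

```python
import mmap
import os
import random
import time

PATH = "/tmp/ssd_test.bin"   # hypothetical test file, pre-filled and at least TOTAL bytes
SEQ_BLOCK = 1 << 20          # 1 MiB sequential requests
RND_BLOCK = 4096             # 4 KiB random requests
TOTAL = 256 << 20            # read 256 MiB in each pass

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)   # Linux-only: keep the page cache out of the measurement

def throughput_mb_s(block: int, offsets) -> float:
    buf = mmap.mmap(-1, block)                  # page-aligned buffer, as O_DIRECT requires
    start = time.perf_counter()
    for off in offsets:
        os.preadv(fd, [buf], off)
    return block * len(offsets) / (time.perf_counter() - start) / 1e6

seq_offsets = [i * SEQ_BLOCK for i in range(TOTAL // SEQ_BLOCK)]
rnd_offsets = [random.randrange(size // RND_BLOCK) * RND_BLOCK
               for _ in range(TOTAL // RND_BLOCK)]

print(f"sequential 1 MiB reads: {throughput_mb_s(SEQ_BLOCK, seq_offsets):8.1f} MB/s")
print(f"random     4 KiB reads: {throughput_mb_s(RND_BLOCK, rnd_offsets):8.1f} MB/s")
os.close(fd)
```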
SSD internals and their impact
- SSDs expose small logical blocks (512B/4K), but internally have:
  - programming pages (4–64K) and erase blocks (1–128 MiB);
  - a flash translation layer (FTL) that conceptually resembles an LSM with background compaction.
- Misaligned or tiny writes trigger read–modify–write cycles and extra NAND reads later; 128K-aligned writes can make random 128K reads as fast as sequential ones (see the alignment sketch after this list).
- Fragmentation on SSDs mostly hurts via request splitting, not seek costs.
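One practical consequence is that segment- or log-writing engines pad their writes out to a large aligned boundary. A minimal sketch, assuming a 128 KiB boundary (the real effective stripe size is device-specific and not exposed through the block interface); `aligned_pwrite` is a hypothetical helper and buffer alignment for O_DIRECT is ignored here:

```python
import os

# Assumed effective stripe size from the discussion above; the actual value
# is device-specific and not visible through the block interface.
ALIGN = 128 * 1024

def aligned_pwrite(fd: int, payload: bytes, offset: int) -> int:
    """Pad a payload out to a 128K boundary so the device never receives a
    partial stripe; tiny or misaligned writes would instead push the FTL into
    read-modify-write of a whole programming page."""
    assert offset % ALIGN == 0, "segment writes should start on an aligned boundary"
    padded_len = -(-len(payload) // ALIGN) * ALIGN   # round up to a multiple of ALIGN
    return os.pwrite(fd, payload.ljust(padded_len, b"\x00"), offset)
```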
WAL, batching, and durability
- One camp: WALs are still needed because host interfaces are block-based and the median DB transaction modifies only a few bytes; the WAL provides durability and turns random page updates into sequential log writes.
- Another camp: the WAL is primarily about durability; the batching gains really come from log-structured / LSM designs, with checkpointing and group commit as refinements (a group-commit sketch follows this list).
- Some note that you can unify the data store and the WAL via persistent (immutable) data structures, which also yields cheap snapshots.
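To make the group-commit refinement concrete, here is a minimal sketch, assuming a length-prefixed record format and a single background flusher thread; it illustrates the idea rather than any engine's actual design.

```python
import os
import threading

class GroupCommitWAL:
    """Minimal group-commit WAL sketch: writers append length-prefixed records
    under a lock, and a single background fsync covers every record appended
    since the last flush."""

    def __init__(self, path: str):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.cv = threading.Condition()
        self.appended = 0   # bytes handed to the OS
        self.durable = 0    # bytes known to have reached stable storage
        threading.Thread(target=self._flusher, daemon=True).start()

    def commit(self, record: bytes) -> None:
        frame = len(record).to_bytes(4, "little") + record
        with self.cv:
            os.write(self.fd, frame)
            self.appended += len(frame)
            my_lsn = self.appended
            self.cv.notify_all()                 # wake the flusher
            while self.durable < my_lsn:         # block until an fsync covers this record
                self.cv.wait()

    def _flusher(self) -> None:
        while True:
            with self.cv:
                while self.appended == self.durable:
                    self.cv.wait()
                target = self.appended
            os.fsync(self.fd)                    # one flush amortized over the whole group
            with self.cv:
                self.durable = max(self.durable, target)
                self.cv.notify_all()             # release every covered committer
```

With many concurrent writers, the cost of each fsync is paid once per batch rather than once per transaction.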
Distributed logs vs single-node commit
- The article’s stance (“commit-to-disk on one node is unnecessary; durability is via replicated log”) is heavily debated.
- Critics warn about correlated failures, software bugs, and cluster-wide crashes; they argue fsyncing to local SSDs remains valuable and often faster than a network round-trip.
- Defenders point to designs like Aurora’s multi-AZ quorum model and argue that the failure probabilities can be made acceptably low, but others insist on empirical testing over paper guarantees (a quorum-commit sketch follows this list).
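The replicated-log position, reduced to a sketch: a commit returns once a quorum of replicas acknowledges the append, with no local fsync on the coordinating node. `replicated_commit` and the replica callables are hypothetical stand-ins for real RPCs.

```python
import concurrent.futures as cf

def replicated_commit(record: bytes, replicas, quorum: int, timeout: float = 0.05) -> bool:
    """Acknowledge a commit once a quorum of replicas reports the record appended.
    `replicas` is a list of callables (hypothetical network stubs) that append the
    record remotely and return on success or raise on failure."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(replica, record) for replica in replicas]
    acks = 0
    try:
        for fut in cf.as_completed(futures, timeout=timeout):
            if fut.exception() is None:
                acks += 1
                if acks >= quorum:
                    return True        # enough independent copies exist
    except cf.TimeoutError:
        pass                           # slow or failed replicas: no quorum in time
    finally:
        pool.shutdown(wait=False)      # don't block the commit path on stragglers
    return False
```

The critics' objection summarized above is that this only buys durability if replica failures are actually independent, which correlated bugs and cluster-wide outages can violate.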
Data structures and “DBs haven’t changed”
- Some claim DBs are stuck with B-trees/LSMs tuned for spinning disks, masking the inefficiency with faster hardware.
- Others counter that plenty of innovation exists (e.g., LeanStore/Umbra, hybrid compacting B-trees, LSM variants), but the block-device interface constrains designs.
- Debate continues over B-trees vs LSMs vs hybrids: tradeoffs in write amplification, multithreaded updates, compaction overhead, and cache behavior on SSDs (a toy LSM write path is sketched after this list).
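As a reference point for that debate, a toy sketch of the LSM write path only: random-key updates land in a memtable and reach the device as large, sorted, sequential runs, at the price of the compaction work discussed above. The names and the JSONL run format are made up for illustration; reads and compaction are omitted.

```python
import json

class TinyLSM:
    """Toy LSM write path: buffer updates in memory, then write one large
    sorted run sequentially, which is the access pattern SSDs reward."""

    def __init__(self, run_prefix: str = "run", memtable_limit: int = 4096):
        self.memtable = {}            # key -> latest value, newest wins
        self.limit = memtable_limit
        self.prefix = run_prefix
        self.runs = 0

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self.flush()

    def flush(self) -> None:
        path = f"{self.prefix}-{self.runs:06d}.jsonl"
        with open(path, "w") as f:    # one big sequential write per sorted run
            for key in sorted(self.memtable):
                f.write(json.dumps({"k": key, "v": self.memtable[key]}) + "\n")
        self.runs += 1
        self.memtable.clear()         # reads would consult runs newest-first
```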
Wear and endurance
- Write endurance and write amplification remain major SSD concerns; LSM’s lower amplification is highlighted as a key advantage.
- Hyperscalers may care mainly about hitting 5-year lifetimes, while smaller deployments might accept more wear or simply replace instances in the cloud (a back-of-the-envelope lifetime estimate follows).
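A back-of-the-envelope way to reason about lifetime, with entirely illustrative numbers (drive capacity, DWPD rating, write rate, and write amplification are all assumptions):

```python
def ssd_lifetime_years(capacity_tb: float, rated_dwpd: float,
                       host_writes_tb_per_day: float, write_amp: float) -> float:
    """Rated endurance is capacity * DWPD over the warranty period; the workload
    consumes it at (host write rate) * (write amplification)."""
    warranty_days = 5 * 365                                   # typical 5-year rating
    rated_nand_writes_tb = capacity_tb * rated_dwpd * warranty_days
    nand_writes_tb_per_day = host_writes_tb_per_day * write_amp
    return rated_nand_writes_tb / nand_writes_tb_per_day / 365

# A hypothetical 3.84 TB, 1-DWPD drive taking 0.5 TB/day of host writes:
print(ssd_lifetime_years(3.84, 1.0, 0.5, 10.0))   # ~3.8 years at write amplification 10
print(ssd_lifetime_years(3.84, 1.0, 0.5, 5.0))    # ~7.7 years at write amplification 5
```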