What Does a Database for SSDs Look Like?

Sequential vs random I/O on SSDs

  • Several comments stress that SSDs still show a large performance gap between sequential and random I/O, though far smaller than on HDDs.
  • Benchmarks show multi-GB/s sequential reads vs tens of MB/s for 4K random reads at low queue depth; latency is microseconds instead of milliseconds, but locality still matters (a rough micro-benchmark sketch follows this list).
  • Controllers, filesystems and OS readahead are all tuned to reward large, aligned, predictable access patterns.
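
A minimal way to see the gap on a given drive is to time sequential vs 4 KiB random reads at queue depth 1. The sketch below is a rough illustration under stated assumptions, not a rigorous benchmark; the file path is a hypothetical pre-created test file, ideally much larger than RAM so the page cache doesn't dominate.

```python
# Rough sequential-vs-random read comparison (Linux, queue depth 1).
# PATH is a hypothetical pre-created test file at least TOTAL bytes long.
import os, random, time

PATH = "/tmp/ssd_testfile"
BLOCK = 4096                  # 4 KiB reads
TOTAL = 256 * 1024 * 1024     # read 256 MiB in each pattern

def measure(pattern):
    fd = os.open(PATH, os.O_RDONLY)
    size = os.fstat(fd).st_size
    # Ask the kernel to drop this file's cached pages so we hit the device.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    if pattern == "sequential":
        offsets = range(0, TOTAL, BLOCK)
    else:
        blocks = size // BLOCK
        offsets = [random.randrange(blocks) * BLOCK for _ in range(TOTAL // BLOCK)]
    start = time.perf_counter()
    for off in offsets:
        os.pread(fd, BLOCK, off)
    elapsed = time.perf_counter() - start
    os.close(fd)
    print(f"{pattern:10s}: {TOTAL / elapsed / 1e6:8.0f} MB/s")

measure("sequential")
measure("random")
```

The sequential pass benefits from OS readahead, which is exactly the point made above: the whole stack rewards predictable access.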

SSD internals and their impact

  • SSDs expose small logical blocks (512 B or 4 KiB), but internally have:
    • program pages (4–64 KiB) and erase blocks (1–128 MiB);
    • FTLs (flash translation layers) that behave conceptually like an LSM, with garbage collection playing the role of background compaction.
  • Misaligned or tiny writes trigger read–modify–write now and extra NAND reads later; 128 KiB-aligned writes can make random 128 KiB reads roughly as fast as sequential ones (see the alignment sketch after this list).
  • Fragmentation on SSDs mostly hurts via request splitting, not seek costs.
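
As a sketch of the alignment point: a write path can round update extents out to whole 128 KiB units before issuing them, so the device never sees a partial unit. The 128 KiB figure is taken from the discussion above; the right granularity depends on the drive's page/stripe geometry.

```python
# Round a byte-range update out to whole 128 KiB units before writing.
# The 128 KiB granularity is an assumption taken from the discussion above.
ALIGN = 128 * 1024

def aligned_extent(offset: int, length: int) -> tuple[int, int]:
    """Return (aligned_offset, aligned_length) covering [offset, offset + length)."""
    start = offset - (offset % ALIGN)
    end = ((offset + length + ALIGN - 1) // ALIGN) * ALIGN
    return start, end - start

# A 3 KiB logical update at byte 200_000 becomes one full 128 KiB write:
print(aligned_extent(200_000, 3_072))   # -> (131072, 131072)
```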

WAL, batching, and durability

  • One camp: WALs are still needed because host interfaces are block-based and the median DB transaction modifies only a few bytes; the WAL provides durability and turns random page updates into sequential log writes.
  • Another camp: the WAL is primarily about durability; the batching gains really come from log-structured / LSM designs, with checkpointing and group commit as refinements (a toy group-commit sketch follows this list).
  • Some note that you can unify the data store and the WAL via persistent (copy-on-write) data structures, which also yields cheap snapshots.
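
A toy sketch of the group-commit idea: concurrent transactions hand their WAL records to a single flusher, which covers the whole batch with one sequential append and one fsync. The names here (GroupCommitWAL, commit) are illustrative, not any particular engine's API.

```python
# Toy group commit: transactions enqueue WAL records and block until a
# background flusher has written and fsync'd their batch, so one fsync
# covers many transactions. A sketch of the idea, not a real engine's WAL.
import os, queue, threading

class GroupCommitWAL:
    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.queue = queue.Queue()          # (record, done_event) pairs
        threading.Thread(target=self._flusher, daemon=True).start()

    def commit(self, record: bytes):
        done = threading.Event()
        self.queue.put((record, done))
        done.wait()                         # returns once the record is durable

    def _flusher(self):
        while True:
            batch = [self.queue.get()]      # block for at least one record
            while not self.queue.empty():   # drain whatever else arrived meanwhile
                batch.append(self.queue.get_nowait())
            os.write(self.fd, b"".join(rec for rec, _ in batch))
            os.fsync(self.fd)               # one sequential write + one fsync per batch
            for _, done in batch:
                done.set()
```

Callers just construct `GroupCommitWAL("/tmp/wal.log")` and call `commit(record)`; under load, many commits share a single fsync.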

Distributed logs vs single-node commit

  • The article’s stance (“commit-to-disk on one node is unnecessary; durability is via replicated log”) is heavily debated.
  • Critics warn about correlated failures, software bugs, and cluster-wide crashes; they argue that fsyncing to local SSDs remains valuable and is often faster than a network round-trip.
  • Defenders point to designs like Aurora’s multi-AZ quorum model and argue that the failure probabilities can be made acceptable, while others insist on empirical testing rather than paper guarantees (a toy quorum-ack sketch follows this list).
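
To make the debated model concrete, here is a toy sketch of "durable once a quorum acks": the write is acknowledged when a majority of replicas confirm the append, with no local fsync on the critical path. The replica objects and their append() method are hypothetical stand-ins for an RPC layer; real systems (Raft, Aurora's 4-of-6 quorum) are far more involved.

```python
# Toy "commit = quorum of replica acks" sketch. Each replica is a hypothetical
# object exposing append(record) -> bool over some RPC layer.
from concurrent.futures import ThreadPoolExecutor, as_completed

def quorum_commit(record: bytes, replicas, quorum=None) -> bool:
    quorum = quorum or (len(replicas) // 2 + 1)
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(r.append, record) for r in replicas]
    acks = 0
    try:
        for fut in as_completed(futures):
            try:
                ok = fut.result()
            except Exception:
                ok = False               # a failed or unreachable replica doesn't count
            if ok:
                acks += 1
                if acks >= quorum:
                    return True          # acknowledged: durable under the quorum model
        return False                     # quorum not reached: the write is not committed
    finally:
        pool.shutdown(wait=False)        # let stragglers finish in the background
```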

Data structures and “DBs haven’t changed”

  • Some claim DBs are stuck with B-trees/LSMs originally tuned for spinning disks, with faster hardware masking the inefficiency.
  • Others counter that plenty of innovation exists (e.g., LeanStore/Umbra, hybrid compacting B-trees, LSM variants), but the block-device interface constrains designs.
  • Debate continues over B-trees vs LSMs vs hybrids: tradeoffs in write amplification, multithreaded updates, compaction overhead, and cache behavior on SSDs (a back-of-envelope write-amplification comparison follows this list).
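
For a sense of the write-amplification tradeoff, a back-of-envelope comparison under the usual simplified models; every number is an illustrative assumption, and real behavior depends heavily on workload, buffering, and compaction policy.

```python
# Back-of-envelope write amplification for small random updates.
# All parameters are illustrative assumptions, not measurements.
record_bytes = 128        # size of one updated record
page_bytes = 4096         # B-tree page size
lsm_levels = 5            # levels in a leveled LSM
size_ratio = 10           # size ratio between adjacent LSM levels

# B-tree: even a tiny update rewrites the whole page it lives in.
btree_wa = page_bytes / record_bytes

# Leveled LSM: a record is rewritten roughly size_ratio/2 times per level
# as compaction pushes it down (a common rough approximation).
lsm_wa = lsm_levels * size_ratio / 2

print(f"B-tree write amplification ≈ {btree_wa:.0f}x")   # ≈ 32x
print(f"LSM    write amplification ≈ {lsm_wa:.0f}x")     # ≈ 25x
```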

Wear and endurance

  • Write endurance and write amplification remain major SSD concerns; LSM’s lower amplification is highlighted as a key advantage.
  • Hyperscalers may care mainly about hitting 5‑year lifetimes, while smaller deployments might accept more wear or simply replace instances in the cloud (a rough lifetime calculation follows this list).
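
A rough way to reason about the lifetime point: divide the drive's rated endurance by the physical write rate, i.e. logical writes times write amplification. All inputs below are illustrative assumptions.

```python
# Rough endurance estimate; every input is an illustrative assumption.
rated_tbw = 3500              # drive endurance rating, terabytes written
logical_tb_per_day = 0.5      # what the database logically writes per day
write_amplification = 10      # combined engine + FTL amplification

physical_tb_per_day = logical_tb_per_day * write_amplification
lifetime_years = rated_tbw / physical_tb_per_day / 365
print(f"≈ {lifetime_years:.1f} years to exhaust rated endurance")   # ≈ 1.9 years
```

At these assumed rates the drive falls well short of a 5-year target unless write amplification or the write rate comes down, which is the tradeoff the bullets above describe.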