PSA: SQLite WAL checksums fail silently and may lose data
Purpose of SQLite WAL checksums
- Several commenters argue WAL checksums are primarily to detect partial / out‑of‑order writes, not arbitrary corruption or “bit rot.”
- Checksums let SQLite tell whether a transaction was fully written and fsynced before a crash, so it can distinguish complete from incomplete journal entries and safely treat only complete ones as committed.
- SQLite uses checksums on both rollback journal and WAL; database file pages themselves are not checksummed by default (except via optional VFS extensions).
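The chained, per-frame checksum described above can be sketched roughly as follows. This is a simplified Python illustration of a Fletcher-style running checksum over pairs of 32-bit words, in the spirit of the WAL format; the word size, byte order, and function name are assumptions for illustration, not SQLite's actual code.

```python
import struct

def wal_checksum(data: bytes, s1: int = 0, s2: int = 0) -> tuple[int, int]:
    """Fletcher-style running checksum (simplified sketch, not SQLite's).

    Processes the data as consecutive pairs of big-endian 32-bit words,
    carrying the (s1, s2) state forward so that each frame's checksum
    depends on every frame before it ("chained" checksums).
    """
    usable = len(data) - (len(data) % 8)  # whole 8-byte chunks only
    for i in range(0, usable, 8):
        x0, x1 = struct.unpack_from(">II", data, i)
        s1 = (s1 + x0 + s2) & 0xFFFFFFFF
        s2 = (s2 + x1 + s1) & 0xFFFFFFFF
    return s1, s2
```

Because the state chains forward, verifying frame N implicitly re-verifies the ordering and completeness of everything written before it, which is exactly the partial/out-of-order write detection commenters say the checksums are for.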
Why truncating at first bad frame is seen as correct
- Once a checksum fails, all subsequent frames are untrustworthy, because:
  - The WAL describes page‑level B‑tree changes; skipping frames can break structural invariants and cause deeper corruption.
  - Later frames may depend on earlier ones (e.g., page splits, pointer rewrites), so applying them without the missing changes is unsafe.
- Chaining checksums means corruption in the middle is treated like a “rockslide”: replay up to the obstruction, then stop; this leaves the DB in a state that definitely existed.
- Many see this not as “data loss,” but as rolling back untrustworthy transactions to the last known‑good state.
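The "rockslide" rule above can be sketched as a small recovery loop: walk the WAL in order, verify each frame against the running checksum, stop at the first mismatch, and roll back to the last commit boundary before it. The frame tuples and `toy_checksum` here are hypothetical stand-ins, not SQLite's real frame format.

```python
def toy_checksum(prev: int, payload: bytes) -> int:
    # Toy chained checksum: each frame's value depends on all prior frames.
    return (prev * 31 + sum(payload)) & 0xFFFFFFFF

def frames_to_replay(frames) -> int:
    """frames: iterable of (payload, stored_checksum, is_commit) tuples.

    Returns how many leading frames may be replayed: everything up to the
    last commit frame that precedes the first checksum mismatch.
    """
    running = 0
    seen = 0
    last_commit = 0
    for payload, stored, is_commit in frames:
        running = toy_checksum(running, payload)
        if running != stored:
            break                  # later frames may depend on this one: stop
        seen += 1
        if is_commit:
            last_commit = seen     # only commit boundaries are valid states
    return last_commit
```

Truncating at `last_commit` rather than at the bad frame itself is what makes the result a state that "definitely existed": a half-applied transaction is never exposed.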
Disagreement over “silent” behavior and recovery options
- The article’s author and some commenters want:
  - Errors raised on checksum failure.
  - An option to preserve the WAL for manual / tool‑driven recovery or partial salvage in special cases (e.g., corrupted frames that aren’t logically needed).
- Others counter that:
  - Most applications just want a consistent DB that opens; refusing to start on WAL issues would be worse.
  - Attempts at partial application or heuristic recovery would be arbitrary and dangerous for transactional semantics.
Checksums, algorithms, and corruption scope
- Debate over Fletcher vs CRC:
  - CRCs can sometimes correct 1–2 bit errors and have stronger detection; Fletcher is seen as weaker and somewhat outdated.
  - Counter‑argument: for SQLite’s purpose (detecting partial writes), Fletcher is sufficient; full anti‑bit‑rot protection would also require page checksums on the main DB.
- Some note real corruption can occur in transit (controller, bus, memory) despite SATA/NVMe ECC, though rates and seriousness are debated.
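For a concrete (and deliberately toy) sense of the comparison, both checksum families catch a single flipped bit; the debate above is about burst-error guarantees and error correction, which this sketch does not measure. The byte-wise Fletcher-style function below is a textbook illustration, not SQLite's variant; `zlib.crc32` is the Python stdlib CRC-32.

```python
import zlib

def fletcher32(data: bytes) -> int:
    """Byte-wise Fletcher-style checksum (illustrative only)."""
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 65535
        s2 = (s2 + s1) % 65535
    return (s2 << 16) | s1

good = b"hello wal frame"
corrupt = bytearray(good)
corrupt[3] ^= 0x40                 # flip one bit in transit
corrupt = bytes(corrupt)
```

Both `fletcher32(good) != fletcher32(corrupt)` and `zlib.crc32(good) != zlib.crc32(corrupt)` hold for this single-bit flip; the argument for Fletcher is that, for detecting torn writes rather than adversarial or burst corruption, this level of detection is enough and cheaper to compute.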
Filesystem and storage assumptions
- SQLite assumes reasonably correct fsync semantics; if fsync “lies” or the filesystem silently corrupts data, SQLite cannot fully defend against that.
- Discussion of ZFS, ext4, btrfs:
  - ZFS’s strong checksumming is praised, but its interaction with SQLite fsync and txg timeouts can cause hangs unless tuned (e.g., sync=disabled), which then risks data loss.
  - Many databases and common filesystems still use relatively weak 16–32‑bit checksums, often disabled or metadata‑only.
Critiques of the article’s framing
- Several commenters think calling this a “PSA” about data loss is misleading:
  - The behavior is intentional, documented, and primarily about crash safety, not end‑to‑end corruption detection.
  - The simulated WAL‑only corruption scenario is considered contrived compared to broader risks (main DB corruption, full sector loss, missing .db‑shm, etc.).
- Some also object to implying SQLite is hard to contribute to or “limping along,” noting that corruption‑detection features (checksum VFS) and clear design assumptions already exist.