Working with Files Is Hard (2019)

POSIX filesystem APIs and why they’re “hard”

  • Research referenced in the thread shows many prominent systems (DBs, VCSs, distributed systems) misuse file APIs, even with expert developers.
  • Many argue the core problem is the POSIX model: old, entrenched, and underspecified on key semantics (ordering, atomicity, error propagation).
  • Others counter that APIs can’t be “impossible to misuse” and that many apps reasonably assume simpler conditions (e.g., single-writer).
  • Some see this as a “Worse is Better” outcome: cheap-to-implement semantics outcompeted safer designs.
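The misuse the research describes usually comes down to ordering and durability steps that POSIX requires but does not make obvious. As a concrete illustration, here is a minimal sketch of the careful "write temp file, fsync, rename, fsync directory" sequence for crash-safe whole-file replacement; the function name and error policy are illustrative, but the calls (`os.fsync`, `os.replace`) are the standard ones.

```python
import os
import tempfile

def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see either the old or the
    new contents, never a partial write, even across a crash."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory so the rename stays
    # within one filesystem (rename is only atomic within a filesystem).
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # make the new contents durable
        os.replace(tmp, path)      # atomically swap the directory entry
        # fsync the directory so the rename itself is durable; omitting
        # this step is exactly the kind of ordering bug the studies found.
        dfd = os.open(dirname, os.O_RDONLY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
    except BaseException:
        try:
            os.unlink(tmp)  # best-effort cleanup if anything failed
        except OSError:
            pass
        raise
```

Note how many distinct steps are needed for one logical operation; skipping any one of them "works" in testing and fails only under crashes, which is why even expert-written systems get it wrong.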

Alternative abstractions and atomicity models

  • Several proposals: whole-file atomic writes via copy-on-write, atomic appends, treating files as atomic block maps, or transactional/DB-like semantics at the filesystem level.
  • Advocates claim this would remove large bug classes and simplify reasoning about shared files.
  • Critics raise concerns: multi‑GB files, extra space for copy-on-write, SSD wear, multi-process access, and difficulty retrofitting existing software and filesystems.
  • There’s discussion of database-style transactions (and deadlocks), with suggestions that MVCC-like approaches could mitigate some issues.
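POSIX already offers a limited form of the atomic-append model discussed above: with `O_APPEND`, each single `write()` atomically seeks to end-of-file first, so concurrent writers interleave at write granularity instead of clobbering each other's offsets. A minimal sketch (the function name is illustrative; the flags are standard):

```python
import os

def append_record(path: str, record: bytes) -> None:
    """Append one record using O_APPEND so that multiple processes can
    write to the same log without corrupting each other's offsets.
    Local filesystems only: as noted elsewhere in the thread, NFS does
    not honor O_APPEND atomicity."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # A single write() call gets the atomic seek-to-EOF guarantee;
        # splitting one record across several write()s would lose it.
        os.write(fd, record)
    finally:
        os.close(fd)
```

The guarantee is narrow, which is part of the argument above: proposals for first-class atomic appends aim to make this the default semantics rather than a flag-dependent special case.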

Barriers, fsync, and storage hardware behavior

  • Debate over why Linux still lacks a non-flushing barrier syscall to separate ordering from durability; some think it would significantly help databases.
  • Others note a prototype exists in research code but hasn’t been adopted, possibly due to limited benefit, SSD-era tradeoffs, or maintenance burden.
  • NVMe, FUA, and controller caches complicate “flush” guarantees; buggy hardware and lack of proper FUA support are cited.
  • It’s emphasized that some devices can lose or corrupt data even after flush, and that sector-atomic assumptions are not universally valid (e.g., certain non-volatile memories, commodity flash).

Windows, C libraries, and API evolution

  • Windows file APIs are described as somewhat safer/clearer but slower, with features like IOCP and strict locking on executables.
  • Lack of open research on Windows filesystems is attributed to NDAs and corporate control over publication.
  • An analogy is drawn to unsafe C standard functions: attempts to “stage in” safer alternatives are messy, non-portable, and often misunderstood.

Databases, SQLite, and failure handling

  • SQLite is praised as a safer choice when persisting local state, especially in specific modes (e.g., WAL, strict synchronous settings).
  • Later research simulating fsync errors found that major systems (Redis, SQLite, LevelDB, LMDB, PostgreSQL) still mishandle some failure modes.
  • Some systems deliberately rely on de facto hardware guarantees (sector-atomic writes), which may fail on certain devices.
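The "specific modes" mentioned above correspond to two real SQLite pragmas. A minimal sketch of opening a database with them set (the function name is illustrative; `journal_mode` and `synchronous` are documented SQLite settings):

```python
import sqlite3

def open_durable(path: str) -> sqlite3.Connection:
    """Open a SQLite database configured for crash safety."""
    conn = sqlite3.connect(path)
    # WAL mode lets readers proceed alongside the single writer and
    # confines torn-write exposure to the write-ahead log.
    conn.execute("PRAGMA journal_mode=WAL")
    # synchronous=FULL fsyncs on every commit; the WAL default
    # (NORMAL) can lose the most recent commits on power failure.
    conn.execute("PRAGMA synchronous=FULL")
    return conn
```

Even so, per the fsync-error research cited above, configuration alone does not cover every failure mode; it narrows the window rather than closing it.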

NFS and distributed semantics

  • NFS is criticized for breaking important file guarantees (append, exclusive create, sync flags, locks, inotify), especially across UID mappings.
  • This leads to surprising behaviors, such as a successful open() being followed by failing reads once the server re-checks permissions per operation, complicating userland code.

Filesystem-specific behaviors and reliability

  • ext4 has special logic (the auto_da_alloc heuristic) that detects common “rename for atomic replace” patterns and flushes data first, making them safer even without an explicit fsync.
  • ZFS is discussed as robust but with Linux-specific issues under heavy load that may involve IO schedulers and external factors; there’s ongoing debugging.
  • Some report more corruption with modern filesystems than FAT; others stress that power loss and hardware flaws are fundamental and must be engineered around, not merely “fixed” operationally.

Meta observations

  • Many note that filesystems and storage “mostly work” until rare, harsh failure conditions.
  • There’s tension between accepting imperfect semantics for 95% of use cases and demanding stronger guarantees for critical systems.