Why does C have the best file API

What counts as C’s “file API”

  • Several commenters note the article conflates C, POSIX, and the OS:
    • mmap and most low-level calls are POSIX syscalls, not part of ISO C.
    • C’s standard file API is fopen/fread/fwrite/fclose, which many consider mediocre and incomplete (no path handling, no file dialogs, weak string support).
  • Some argue the platform API (POSIX/Unix) is what’s “good,” and it just happens to be exposed most naturally in C due to Unix/C co-evolution.
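
For reference, here is a minimal sketch of reading a whole file with nothing but the ISO C stdio calls named above. The helper name read_all and the 4096-byte starting buffer are illustrative assumptions, not something taken from the thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file using only ISO C stdio.
 * On success returns a malloc'd buffer (caller frees) and stores the
 * byte count in *len. Returns NULL on any failure. */
char *read_all(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    size_t cap = 4096, used = 0;   /* starting capacity is arbitrary */
    char *buf = malloc(cap);
    if (!buf) {
        fclose(f);
        return NULL;
    }

    for (;;) {
        size_t n = fread(buf + used, 1, cap - used, f);
        used += n;
        if (n == 0)
            break;                 /* EOF or error; told apart below */
        if (used == cap) {         /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                fclose(f);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    int failed = ferror(f);
    fclose(f);
    if (failed) {
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```

Everything here is portable ISO C, but the caller still owns buffer growth and the EOF-versus-error distinction, which is the kind of boilerplate the mmap discussion further down is reacting to.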

mmap as an OS feature, not a C feature

  • Strong agreement that mmap is an operating system capability:
    • Available from many languages (Python, Perl, Java, C#, Go via libraries, etc.).
    • Exists on non-C OSes and in systems with “single-level store” designs.
  • Moral: mmap “belongs to the platform”; C is just the first-class interface to it on Unix-like systems.

Benefits of mmap-style file access

  • Treating a file as memory reduces boilerplate vs. manual read/parse/write loops.
  • Handles files larger than RAM efficiently, because the kernel pages data in and out on demand.
  • Avoids duplicating file data into anonymous memory; better under memory pressure.
  • Useful for:
    • Large, mostly immutable local files.
    • Shared memory between processes.
    • Many processes accessing the same data subset.
    • Zero-copy patterns and database-like engines (program and shared-library loaders also use mmap under the hood).
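
A minimal sketch of the "file as memory" style described above, using the POSIX calls open, fstat, mmap, and munmap. The helper name count_newlines is hypothetical, and the example assumes a regular local file on a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file by mapping it
 * read-only instead of looping over read(). Returns -1 on error. */
long count_newlines(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {         /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    /* The file is now just an array of bytes; pages are faulted in
     * by the kernel as the loop touches them. */
    const unsigned char *p = map;
    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            count++;

    munmap(map, len);
    return count;
}
```

There is no user-space buffer to size or refill here; the trade-off is that an I/O failure no longer shows up as a return value, which the next section covers.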

Pitfalls and error handling issues

  • I/O errors on mmapped regions surface as signals (e.g., SIGBUS), which:
    • Arrive outside normal control flow rather than as return codes, are hard to handle safely, and can occur deep in the call stack.
    • Lead many systems to simply crash and restart on such failures.
  • Fragile with:
    • Network/WiFi/USB drives and removable media.
    • Files modified or truncated concurrently, yielding mixed or invalid views.
  • Performance is nuanced:
    • Page faults, TLB pressure, and the lack of huge pages can hurt.
    • Some report mmap faster than buffered reads; others prefer deliberate async I/O (io_uring, event loops).

Struct-as-binary-format: powerful but dangerous

  • C’s ability to reinterpret mmapped bytes as typed structs is praised as very convenient.
  • Many argue it’s a “terrible idea” for general use:
    • Depends on ABI details: padding, alignment, endianness, type sizes.
    • Non-portable across architectures and compiler versions.
    • Hard to evolve formats or enforce invariants; easy to introduce UB.
  • Others counter that, in practice, for single-platform or per-platform-built data, it can be simple and fast and is widely used in games and similar domains.
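
A small sketch of the ABI concern raised above. It assumes a hypothetical 5-byte on-disk record (a 1-byte tag followed by a 32-bit little-endian value); struct record, RECORD_WIRE_SIZE, load_le32, and decode_record are illustrative names, not a format from the thread. Casting the mapped bytes straight to the struct would depend on padding and byte order, while decoding field by field does not.

```c
#include <stddef.h>
#include <stdint.h>

/* In-memory form. Most common ABIs insert 3 padding bytes after 'tag',
 * so sizeof(struct record) is typically 8, not 5 (ABI-dependent). */
struct record {
    uint8_t  tag;
    uint32_t value;
};

/* Assumed on-disk form: tag byte, then value as 4 little-endian bytes. */
enum { RECORD_WIRE_SIZE = 5 };

/* Assemble a 32-bit value from little-endian bytes; gives the same
 * result regardless of host byte order or pointer alignment. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode one record from a byte buffer (for example, mmapped file
 * contents). Returns 0 on success, -1 if the buffer is too short. */
int decode_record(const unsigned char *buf, size_t len, struct record *out)
{
    if (len < RECORD_WIRE_SIZE)
        return -1;
    out->tag = buf[0];
    out->value = load_le32(buf + 1);
    return 0;
}
```

The explicit decoder costs a few lines per field but pins the format to the byte layout rather than to one compiler's struct layout. The direct-cast approach skips this step, which is both its appeal and its portability risk.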

Alternatives and higher-level abstractions

  • High-level serialization and columnar formats (protobuf, Cap’n Proto, FlatBuffers, Parquet, Arrow, SQLite, LMDB) are cited as safer, more portable choices for structured data.
  • Some prefer stream abstractions (e.g., Smalltalk-style streams) or databases instead of rolling custom binary file formats.
  • There is broad skepticism that C (or mmap) universally has the “best” file API; whether it’s “best” depends heavily on safety vs. control trade-offs and specific use cases.