Why does C have the best file API
What counts as C’s “file API”
- Several commenters note the article conflates C, POSIX, and the OS:
- mmap and most low-level calls are POSIX syscalls, not part of ISO C.
- C’s standard file API is fopen/fread/fwrite/fclose, which many consider mediocre and incomplete (no paths, no dialogs, poor strings).
- Some argue the platform API (POSIX/Unix) is what’s “good,” and it just happens to be exposed most naturally in C due to Unix/C co-evolution.
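To make the ISO C side of that distinction concrete, here is a minimal sketch that uses only <stdio.h> and no POSIX calls. The helper name count_bytes_stdio is hypothetical and chosen purely for illustration.

```c
#include <stdio.h>

/* Count the bytes in a file using only ISO C stdio (no POSIX calls).
   Returns -1 if the file cannot be opened or a read error occurs.
   count_bytes_stdio is a hypothetical helper name for illustration. */
long count_bytes_stdio(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    unsigned char buf[4096];
    long total = 0;
    size_t n;

    /* fread returns the number of items read; 0 means EOF or error. */
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += (long)n;

    /* Distinguish a read error from a normal end of file. */
    int failed = ferror(f);
    fclose(f);
    return failed ? -1 : total;
}
```

Everything here is portable ISO C, which is also why it offers nothing like mapping, directory listing, or file metadata.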
mmap as an OS feature, not a C feature
- Strong agreement that mmap is an operating system capability:
- Available from many languages (Python, Perl, Java, C#, Go via libraries, etc.).
- Exists on non-C OSes and in systems with “single-level store” designs.
- Moral: mmap “belongs to the platform”; C is just the first-class interface on Unix-like systems.
Benefits of mmap-style file access
- Treating a file as memory reduces boilerplate vs. manual read/parse/write loops.
- Works efficiently when files are larger than RAM by leveraging paging.
- Avoids duplicating file data into anonymous memory; better under memory pressure.
- Useful for:
- Large, mostly immutable local files.
- Shared memory between processes.
- Many processes accessing the same data subset.
- Zero-copy patterns and database-like engines (also used under the hood by loaders).
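The "file as memory" idea can be sketched as follows, assuming a POSIX system. The function name count_newlines_mmap is hypothetical. The file is mapped read-only and scanned like an ordinary byte array, with no read loop and no separate buffer.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count '\n' bytes in a file by mapping it read-only (POSIX, not ISO C).
   Returns -1 on failure. count_newlines_mmap is a hypothetical name. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    /* The kernel pages file contents in on demand as we touch them. */
    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            count++;

    munmap((void *)data, len);
    return count;
}
```

Note that this sketch does nothing about I/O errors or concurrent truncation while the mapping is in use.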
Pitfalls and error handling issues
- I/O errors on mmapped regions surface as signals (e.g., SIGBUS), which:
- Are asynchronous, hard to handle safely, and can occur deep in the call stack.
- Lead many systems to simply crash and restart on such failures.
- Fragile with:
- Network/WiFi/USB drives and removable media.
- Files modified or truncated concurrently, yielding mixed or invalid views.
- Performance is nuanced:
- Page faults, TLB pressure, and lack of huge pages can hurt.
- Some report mmap faster than buffered reads; others prefer deliberate async I/O (io_uring, event loops).
Struct-as-binary-format: powerful but dangerous
- C’s ability to reinterpret mmapped bytes as typed structs is praised as very convenient.
- Many argue it’s a “terrible idea” for general use:
- Depends on ABI details: padding, alignment, endianness, type sizes.
- Non-portable across architectures and compiler versions.
- Hard to evolve formats or enforce invariants; easy to introduce UB.
- Others counter that, in practice, for single-platform or per-platform-built data, it can be simple and fast and is widely used in games and similar domains.
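The ABI dependence can be made concrete with a small sketch. The record layout below (a 1-byte tag followed by a 4-byte little-endian length, 5 bytes on disk) is a hypothetical format invented for illustration, as are the names record, load_le32, and decode_record. Casting a mapped pointer to struct record would typically read the wrong bytes, because most ABIs insert padding after tag. Decoding field by field avoids depending on padding, alignment, or host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of an on-disk record that is exactly
   5 bytes: a 1-byte tag, then a 4-byte little-endian length.
   On most ABIs the compiler pads this struct (sizeof is commonly 8),
   so its memory layout does not match the file layout. */
struct record {
    uint8_t  tag;
    uint32_t length;
};

/* Read a 32-bit little-endian value from possibly unaligned bytes. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Decode one record from raw bytes (for example, from a mapping).
   Returns 0 on success, -1 if fewer than 5 bytes are available. */
int decode_record(const unsigned char *buf, size_t len, struct record *out)
{
    if (len < 5)
        return -1;
    out->tag = buf[0];
    out->length = load_le32(buf + 1);
    return 0;
}
```

The explicit decoder costs a few extra lines but gives the same result on any architecture, which is the trade-off the two camps above are arguing about.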
Alternatives and higher-level abstractions
- High-level serialization and columnar formats (protobuf, Cap’n Proto, FlatBuffers, Parquet, Arrow, SQLite, LMDB) are cited as safer, more portable choices for structured data.
- Some prefer stream abstractions (e.g., Smalltalk-style streams) or databases instead of rolling custom binary file formats.
- There is broad skepticism that C (or mmap) universally has the “best” file API; whether it’s “best” depends heavily on safety vs. control trade-offs and specific use cases.