How memory maps (mmap) deliver faster file access in Go
When mmap is faster (and when it isn’t)
- mmap can eliminate copies (kernel page cache → user buffer → app buffer), giving big wins for read‑heavy, random‑access workloads (e.g., CDB, custom DBs, large matrices, append‑only logs); a minimal sketch of the two access paths follows this list.
- For already-cached data, mmap can be as fast as a memcpy from RAM, while read/pread adds syscall overhead plus copies.
- However, for typical sequential reads into a buffer, conventional I/O (read/pread, or buffered fread) and modern async mechanisms (io_uring) are often as fast or faster, with better readahead behavior.
- Multiple commenters stress mmap is “sometimes” faster, not a general replacement for read/write.
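To make the two paths concrete, here is a minimal Go sketch (Linux/macOS). The file name data.bin and the offsets are assumptions for illustration; the file must be at least 256 bytes.

```go
// Minimal sketch: the zero-copy mmap path vs. the copying pread path.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.Open("data.bin") // assumed fixture, >= 256 bytes
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// mmap path: one syscall up front, then every "read" is a subslice
	// of the mapping; no per-read syscall and no copy.
	m, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(m)
	record := m[128:256] // zero-copy view into the page cache

	// pread path: one syscall and one copy per read.
	buf := make([]byte, 128)
	if _, err := f.ReadAt(buf, 128); err != nil {
		panic(err)
	}

	fmt.Println(len(record), len(buf))
}
```

The mmap read hands out a view into the page cache with no per-read syscall and no copy; ReadAt pays one syscall plus one copy per call, which is exactly the overhead the commenters describe mmap avoiding.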
Interaction with Go’s runtime and page faults
- With mmap, a page fault blocks the OS thread running that goroutine; the Go scheduler can’t run another goroutine on that M/P while the fault is serviced.
- With explicit file I/O, the goroutine blocks in a syscall and the runtime can schedule other goroutines, giving better utilization under I/O latency.
- There is currently no cheap, ergonomic OS/API model for “async page faults”; proposals involving mincore, madvise, userfaultfd, signals, or rseq‑style schemes are seen as complex and/or slow. The sketch after this list marks where each path can stall and shows the madvise readahead hint.
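The following Linux-only sketch marks where each path can stall (data.bin is again an assumed fixture). The syscall.Madvise(MADV_WILLNEED) call is one of the partial, best-effort mitigations mentioned above, not a fix.

```go
// Sketch (Linux): where each access path can stall an OS thread.
package main

import (
	"os"
	"syscall"
)

var sink byte // global sink so the load below is not optimized away

func main() {
	f, err := os.Open("data.bin") // assumed fixture
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	m, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(m)

	// Best-effort hint: ask the kernel to start readahead now so the
	// access below is less likely to fault. Not a guarantee.
	_ = syscall.Madvise(m, syscall.MADV_WILLNEED)

	// If this page is not resident, the load page-faults and the whole
	// OS thread (M) stalls; the Go scheduler cannot run another
	// goroutine on it, and its P stays blocked until the fault is done.
	sink = m[0]

	// ReadAt blocks in a syscall the runtime tracks: the P can be
	// handed to another thread, so other goroutines keep running.
	buf := make([]byte, 4096)
	if _, err := f.ReadAt(buf, 0); err != nil {
		panic(err)
	}
}
```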
Benchmark design and the “25x faster” claim
- Several commenters say the showcased benchmark is flawed: the mmap version often doesn’t actually touch data, only returns slices, so it measures “getting a pointer” vs “copying data”.
- To be fair, both paths should actually read or copy the bytes, and the page cache should be controlled (e.g., dropping caches for a cold run, or pre‑touching pages for a warm one); see the benchmark sketch after this list.
- With realistic access (actually reading bytes), a 25x gap is considered unlikely; more like small constant-factor differences depending on call size and access pattern.
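A sketch of what a fairer harness might look like, using Go’s standard testing framework. The testdata path is an assumption, and controlling the page cache between runs still has to happen outside the process (e.g., via /proc/sys/vm/drop_caches on Linux).

```go
// fairbench_test.go: both variants touch every byte, so the benchmark
// compares reading data rather than merely obtaining a slice header.
package fairbench

import (
	"os"
	"syscall"
	"testing"
)

var sink uint64 // defeat dead-code elimination

func sum(b []byte) (s uint64) {
	for _, c := range b {
		s += uint64(c)
	}
	return
}

func BenchmarkMmapSum(b *testing.B) {
	f, err := os.Open("testdata/data.bin") // assumed pre-built fixture
	if err != nil {
		b.Skip(err)
	}
	defer f.Close()
	fi, _ := f.Stat()
	m, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		b.Fatal(err)
	}
	defer syscall.Munmap(m)
	b.SetBytes(fi.Size())
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink += sum(m) // touch every page, not just the slice header
	}
}

func BenchmarkReadAtSum(b *testing.B) {
	f, err := os.Open("testdata/data.bin")
	if err != nil {
		b.Skip(err)
	}
	defer f.Close()
	fi, _ := f.Stat()
	buf := make([]byte, fi.Size())
	b.SetBytes(fi.Size())
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := f.ReadAt(buf, 0); err != nil {
			b.Fatal(err)
		}
		sink += sum(buf)
	}
}
```

With both variants summing every byte, the result measures data access rather than pointer retrieval, which is the distinction the commenters say the original benchmark missed.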
APIs, OS design, and alternatives
- Unix historically standardized on read/write to work uniformly across files, ttys, pipes, etc.; mmap arrived later and has different semantics (e.g., mapping can change under you, SIGBUS on shrink).
- io_uring is repeatedly cited as a better modern primitive for high‑performance I/O: async, controllable readahead, zero copies with the right setup, and no hidden page-fault stalls.
- Some argue OS-level abstractions like scheduler activations or newer proposals (UMCG) could better integrate user scheduling with page faults, but these are not widely available today.
Pitfalls and gotchas
- Mappings outlive file descriptors; truncating or shrinking a mapped file can cause SIGBUS on access and unspecified behavior (see the sketch after this list).
- mmap allocations don’t show up clearly in Go’s pprof heap reports, making memory pressure/debugging harder.
- Writes via mmap are tricky; in-place random writes can be problematic, though append-only patterns can work well.
- Some filesystems or drivers can misbehave with mmap (e.g., reports of issues on macOS exFAT with SQLite WAL), though the exact root causes are debated.
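A minimal sketch of the shrink hazard, as a deliberately throwaway program: the final line is expected to kill the process with SIGBUS, which the Go runtime turns into an unrecoverable fatal error.

```go
// Demonstrates the truncate-after-map hazard. Expected to crash.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.CreateTemp("", "mmapdemo")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())

	const size = 2 << 20 // 2 MiB
	if err := f.Truncate(size); err != nil {
		panic(err)
	}

	m, err := syscall.Mmap(int(f.Fd()), 0, size,
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(m)

	// Shrink the file while the mapping is still live. The mapping's
	// length does not change; pages past the new EOF are now poisoned.
	if err := f.Truncate(4096); err != nil {
		panic(err)
	}

	fmt.Println(m[size-1]) // faults: SIGBUS, fatal to the whole process
}
```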
Real-world usage patterns
- Positive reports: custom DBs, key–value stores, and log-structured systems see large gains from mmap, especially for random reads and read‑mostly workloads that fit in RAM.
- Negative/skeptical reports: for typical apps doing buffered sequential reads or needing robust concurrency semantics, mmap adds complexity and hidden latency points without dramatic wins.
Context around Varnish
- Commenters clarify that “Varnish Cache” today exists both as a corporate fork and as a renamed community version (Vinyl Cache), and that the company behind the blog post has long funded and maintained the codebase rather than it being a one‑person effort.