How memory maps (mmap) deliver faster file access in Go
When mmap is faster (and when it isn’t)
- mmap can eliminate copies (kernel page cache → user buffer → app buffer), giving big wins for read‑heavy, random‑access workloads (e.g., CDB, custom DBs, large matrices, append‑only logs); a minimal sketch of the two access paths follows this list.
- For already-cached data, mmap can be as fast as a memcpy from RAM, while read/pread adds syscall overhead plus copies.
- However, for typical sequential reads into a buffer, conventional I/O (read/pread, or buffered fread) and modern async mechanisms (io_uring) are often as fast or faster, with better readahead behavior.
- Multiple commenters stress mmap is “sometimes” faster, not a general replacement for read/write.
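To make the two paths concrete, here is a minimal Go sketch (Linux/macOS). The file name data.bin and the offsets are assumptions for illustration; the file must be at least 256 bytes.

```go
// Minimal sketch: the zero-copy mmap path vs. the copying pread path.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.Open("data.bin") // assumed fixture, >= 256 bytes
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	// mmap path: one syscall up front, then every "read" is a subslice
	// of the mapping; no per-read syscall and no copy.
	m, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(m)
	record := m[128:256] // zero-copy view into the page cache

	// pread path: one syscall and one copy per read.
	buf := make([]byte, 128)
	if _, err := f.ReadAt(buf, 128); err != nil {
		panic(err)
	}

	fmt.Println(len(record), len(buf))
}
```

The mmap read hands out a view into the page cache with no per-read syscall and no copy; ReadAt pays one syscall plus one copy per call, which is exactly the overhead the commenters describe mmap avoiding.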
Interaction with Go’s runtime and page faults
- With mmap, a page fault blocks the OS thread running that goroutine; the Go scheduler can’t run another goroutine on that M/P while the fault is serviced.
- With explicit file I/O, the goroutine blocks in a syscall and the runtime can schedule other goroutines, giving better utilization under I/O latency.
- There is currently no cheap, ergonomic OS/API model for “async page faults”; proposals involving mincore, madvise, userfaultfd, signals, or rseq‑style schemes are seen as complex and/or slow. The sketch after this list marks where each path can stall and shows the madvise readahead hint.
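The following Linux-only sketch marks where each path can stall (data.bin is again an assumed fixture). The syscall.Madvise(MADV_WILLNEED) call is one of the partial, best-effort mitigations mentioned above, not a fix.

```go
// Sketch (Linux): where each access path can stall an OS thread.
package main

import (
	"os"
	"syscall"
)

var sink byte // global sink so the load below is not optimized away

func main() {
	f, err := os.Open("data.bin") // assumed fixture
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		panic(err)
	}

	m, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(m)

	// Best-effort hint: ask the kernel to start readahead now so the
	// access below is less likely to fault. Not a guarantee.
	_ = syscall.Madvise(m, syscall.MADV_WILLNEED)

	// If this page is not resident, the load page-faults and the whole
	// OS thread (M) stalls; the Go scheduler cannot run another
	// goroutine on it, and its P stays blocked until the fault is done.
	sink = m[0]

	// ReadAt blocks in a syscall the runtime tracks: the P can be
	// handed to another thread, so other goroutines keep running.
	buf := make([]byte, 4096)
	if _, err := f.ReadAt(buf, 0); err != nil {
		panic(err)
	}
}
```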
Benchmark design and the “25x faster” claim
- Several commenters say the showcased benchmark is flawed: the mmap version often doesn’t actually touch data, only returns slices, so it measures “getting a pointer” vs “copying data”.
- To be fair, both paths should actually read or copy the bytes, and the page cache should be controlled (e.g., dropping caches for a cold run, or pre‑touching pages for a warm one); see the benchmark sketch after this list.
- With realistic access (actually reading bytes), a 25x gap is considered unlikely; more like small constant-factor differences depending on call size and access pattern.
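A sketch of what a fairer harness might look like, using Go’s standard testing framework. The testdata path is an assumption, and controlling the page cache between runs still has to happen outside the process (e.g., via /proc/sys/vm/drop_caches on Linux).

```go
// fairbench_test.go: both variants touch every byte, so the benchmark
// compares reading data rather than merely obtaining a slice header.
package fairbench

import (
	"os"
	"syscall"
	"testing"
)

var sink uint64 // defeat dead-code elimination

func sum(b []byte) (s uint64) {
	for _, c := range b {
		s += uint64(c)
	}
	return
}

func BenchmarkMmapSum(b *testing.B) {
	f, err := os.Open("testdata/data.bin") // assumed pre-built fixture
	if err != nil {
		b.Skip(err)
	}
	defer f.Close()
	fi, _ := f.Stat()
	m, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		b.Fatal(err)
	}
	defer syscall.Munmap(m)
	b.SetBytes(fi.Size())
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink += sum(m) // touch every page, not just the slice header
	}
}

func BenchmarkReadAtSum(b *testing.B) {
	f, err := os.Open("testdata/data.bin")
	if err != nil {
		b.Skip(err)
	}
	defer f.Close()
	fi, _ := f.Stat()
	buf := make([]byte, fi.Size())
	b.SetBytes(fi.Size())
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := f.ReadAt(buf, 0); err != nil {
			b.Fatal(err)
		}
		sink += sum(buf)
	}
}
```

With both variants summing every byte, the result measures data access rather than pointer retrieval, which is the distinction the commenters say the original benchmark missed.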
APIs, OS design, and alternatives
- Unix historically standardized on read/write to work uniformly across files, ttys, pipes, etc.; mmap arrived later and has different semantics (e.g., mapping can change under you, SIGBUS on shrink).
- io_uring is repeatedly cited as a better modern primitive for high‑performance I/O: async, controllable readahead, zero copies with the right setup, and no hidden page-fault stalls.
- Some argue OS-level abstractions like scheduler activations or newer proposals (UMCG) could better integrate user scheduling with page faults, but these are not widely available today.
Pitfalls and gotchas
- Mappings outlive file descriptors; truncating or shrinking a mapped file can cause SIGBUS on access and unspecified behavior (see the sketch after this list).
- mmap allocations don’t show up clearly in Go’s pprof heap reports, making memory pressure/debugging harder.
- Writes via mmap are tricky; in-place random writes can be problematic, though append-only patterns can work well.
- Some filesystems or drivers can misbehave with mmap (e.g., reports of issues on macOS exFAT with SQLite WAL), though the exact root causes are debated.
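A minimal sketch of the shrink hazard, as a deliberately throwaway program: the final line is expected to kill the process with SIGBUS, which the Go runtime turns into an unrecoverable fatal error.

```go
// Demonstrates the truncate-after-map hazard. Expected to crash.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.CreateTemp("", "mmapdemo")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())

	const size = 2 << 20 // 2 MiB
	if err := f.Truncate(size); err != nil {
		panic(err)
	}

	m, err := syscall.Mmap(int(f.Fd()), 0, size,
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(m)

	// Shrink the file while the mapping is still live. The mapping's
	// length does not change; pages past the new EOF are now poisoned.
	if err := f.Truncate(4096); err != nil {
		panic(err)
	}

	fmt.Println(m[size-1]) // faults: SIGBUS, fatal to the whole process
}
```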
Real-world usage patterns
- Positive reports: custom DBs, key–value stores, and log-structured systems see large gains from mmap, especially for random reads and read‑mostly workloads that fit in RAM.
- Negative/skeptical reports: for typical apps doing buffered sequential reads or needing robust concurrency semantics, mmap adds complexity and hidden latency points without dramatic wins.
Context around Varnish
- Commenters clarify that “Varnish Cache” today exists both as a corporate fork and as a renamed community version (Vinyl Cache), and that the company behind the blog post has long funded and maintained the codebase rather than it being a one‑person effort.