SQLite: 35% Faster Than the Filesystem
SQLite as a filesystem (“sqlitefs”)
- Several people ask why no widespread “sqlitefs” exists.
- Concerns: putting SQLite in the kernel is complex; FUSE adds overhead that could erase performance gains.
- There are existing experiments: FUSE-based SQLite filesystems, macOS implementations, a GitHub project that mounts a SQLite DB as a FS, and Proxmox using SQLite + FUSE for VM configs.
- Skepticism: turning a simple embeddable library into a full filesystem adds versioning, compatibility, and maintenance issues without clear benefit.
Why SQLite can outperform a filesystem for blobs
- Core reason cited: many blobs in separate files require many
open/closecalls; a single SQLite file needs only oneopen/close, reducing syscall and access-control overhead. - Single large file improves OS caching and read-ahead; OS can better predict the working set.
- Databases don’t need to support all POSIX semantics (multi-user, complex permissions, symlinks, xattrs), so they avoid that overhead.
- Commenters note this is “faster than using the FS in a naive way,” not magically faster than the underlying storage.
OS and filesystem effects (especially Windows/NTFS)
- Multiple anecdotes: NTFS and Windows tooling perform poorly with many small files (e.g.,
node_modules, big builds), sometimes 10× slower than Linux on the same hardware. - Others report little difference in specific workloads, suggesting workload and tool design matter.
- A major culprit identified is synchronous antivirus / filesystem filter drivers (e.g., Defender) that hook every
open/close, not NTFS alone. - Windows “Dev Drive” (ReFS + async AV) improves performance, but opinions differ on how much.
- Asynchronous I/O (io_uring/IOCP) mainly improves concurrency, not per-op latency; unclear how it changes this specific benchmark.
SQLite for logs and application data
- Some use SQLite (often WAL/WAL2) instead of text logs: writes are fast, reads and analysis become much more powerful (SQL, indices).
- Pushback: harder to use simple tools like
cat,grep,tail; workarounds involve pipingsqlite3output or writing small utilities. - For very large logs, row-store + many indexes may hurt write performance and space; columnar systems (ClickHouse, DuckDB) are suggested as better fits.
Blobs, hierarchy, and hybrid designs
- Storing large blobs in SQLite works but has drawbacks: 2 GB limit, serialization complexity, and weaker tooling integration than plain files.
- Incremental blob I/O APIs exist but are somewhat awkward (e.g., need
zeroblob, cannot resize). - Many end up with hybrids: metadata and paths in SQLite, actual file contents on the filesystem, sometimes with virtual “folder” structures modeled in tables.
Raw devices and broader perspective
- Some databases (Oracle, MySQL) can bypass filesystems and write directly to block devices, yielding modest gains at significant complexity.
- Overall theme: filesystems are highly general; databases can win by optimizing for narrower access patterns and avoiding per-file and per-syscall overhead, but trade away generality and tooling.