SQLite: 35% Faster Than the Filesystem

SQLite as a filesystem (“sqlitefs”)

  • Several people ask why no widespread “sqlitefs” exists.
  • Concerns: putting SQLite in the kernel is complex; FUSE adds overhead that could erase performance gains.
  • There are existing experiments: FUSE-based SQLite filesystems, macOS implementations, a GitHub project that mounts a SQLite DB as a FS, and Proxmox using SQLite + FUSE for VM configs.
  • Skepticism: turning a simple embeddable library into a full filesystem adds versioning, compatibility, and maintenance issues without clear benefit.

Why SQLite can outperform a filesystem for blobs

  • Core reason cited: many blobs in separate files require many open/close calls; a single SQLite file needs only one open/close, reducing syscall and access-control overhead.
  • Single large file improves OS caching and read-ahead; OS can better predict the working set.
  • Databases don’t need to support all POSIX semantics (multi-user, complex permissions, symlinks, xattrs), so they avoid that overhead.
  • Commenters note this is “faster than using the FS in a naive way,” not magically faster than the underlying storage.

OS and filesystem effects (especially Windows/NTFS)

  • Multiple anecdotes: NTFS and Windows tooling perform poorly with many small files (e.g., node_modules, big builds), sometimes 10× slower than Linux on the same hardware.
  • Others report little difference in specific workloads, suggesting workload and tool design matter.
  • A major culprit identified is synchronous antivirus / filesystem filter drivers (e.g., Defender) that hook every open/close, not NTFS alone.
  • Windows “Dev Drive” (ReFS + async AV) improves performance, but opinions differ on how much.
  • Asynchronous I/O (io_uring/IOCP) mainly improves concurrency, not per-op latency; unclear how it changes this specific benchmark.

SQLite for logs and application data

  • Some use SQLite (often WAL/WAL2) instead of text logs: writes are fast, reads and analysis become much more powerful (SQL, indices).
  • Pushback: harder to use simple tools like cat, grep, tail; workarounds involve piping sqlite3 output or writing small utilities.
  • For very large logs, row-store + many indexes may hurt write performance and space; columnar systems (ClickHouse, DuckDB) are suggested as better fits.

Blobs, hierarchy, and hybrid designs

  • Storing large blobs in SQLite works but has drawbacks: 2 GB limit, serialization complexity, and weaker tooling integration than plain files.
  • Incremental blob I/O APIs exist but are somewhat awkward (e.g., need zeroblob, cannot resize).
  • Many end up with hybrids: metadata and paths in SQLite, actual file contents on the filesystem, sometimes with virtual “folder” structures modeled in tables.

Raw devices and broader perspective

  • Some databases (Oracle, MySQL) can bypass filesystems and write directly to block devices, yielding modest gains at significant complexity.
  • Overall theme: filesystems are highly general; databases can win by optimizing for narrower access patterns and avoiding per-file and per-syscall overhead, but trade away generality and tooling.