2025-05-05

Show HN: VectorVFS, your filesystem as a vector database

Concept and Scope

The tool stores per-file embeddings as extended attributes (xattrs), i.e. metadata attached to the inode, turning the filesystem itself into the “vector store.”
No LLMs are involved; it uses local encoders (e.g., Meta’s Perception Encoder) to generate multimodal embeddings.
Intended mainly for local, interactive search across a working set of files, not multi‑million‑document production workloads.
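
A minimal sketch of the xattr idea, assuming Linux and Python's standard `os.setxattr` / `os.getxattr`. The attribute name `user.demo.embedding` and the float32 byte layout are assumptions made for illustration, not VectorVFS's actual format.

```python
import os
import struct

# Hypothetical attribute name. Unprivileged xattrs on Linux must live in
# the "user." namespace.
XATTR_KEY = "user.demo.embedding"


def pack_embedding(vec):
    """Serialize a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_embedding(blob):
    """Inverse of pack_embedding: bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def store_embedding(path, vec):
    """Attach the vector to the file as an extended attribute (Linux only)."""
    os.setxattr(path, XATTR_KEY, pack_embedding(vec))


def load_embedding(path):
    """Return the stored vector, or None if it cannot be read.

    OSError covers a missing attribute, a filesystem without xattr
    support, and a missing file alike.
    """
    try:
        return unpack_embedding(os.getxattr(path, XATTR_KEY))
    except OSError:
        return None
```

In this sketch a file without an embedding simply yields `None`, so a caller could compute the embedding on demand and store it for the next search.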

Performance and Indexing

Current search is explicitly O(N): it walks the target directory tree, loads or computes embeddings, and does a linear similarity scan.
Some see this as fine for modest N: filesystem search is currently poor, and a linear scan over RAM‑contiguous vectors with SIMD and multithreading can still be fast.
Others argue that without an index or contiguous in‑RAM layout, it’s not suitable for production or very large corpora.
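
The linear scan under discussion can be sketched as follows. This is an illustrative pure-Python version, not the tool's code: the `embeddings` dict stands in for vectors gathered while walking a directory tree, and `cosine` and `linear_search` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def linear_search(query, embeddings, top_k=3):
    """Score every stored vector against the query: O(N) in the file count.

    embeddings maps path -> vector. Returns up to top_k (score, path)
    pairs, best match first.
    """
    scored = [(cosine(query, vec), path) for path, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Every query touches every vector, which is why the thread splits on whether this is acceptable: the cost grows linearly with the number of files, but for a modest working set the constant factors may matter more than the asymptotics.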
The author mentions a planned mode that builds an index in a first pass and keeps it alive for subsequent queries, though it is not aimed at tens of millions of files.
Several commenters note that any “efficient” global search ultimately requires a separate index or meta‑DB, so “zero‑overhead indexing” is viewed skeptically.
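
One way to picture the "build once, keep alive" mode is a flat in-memory index filled in a single pass over the tree. The sketch below is an assumption about what such a mode could look like, not the author's design: `FlatIndex` and the `load` callback are hypothetical, and each query is still a linear scan, only without the per-query tree walk and metadata reads.

```python
import array
import math
import os


class FlatIndex:
    """Unit-normalized vectors stored back to back in one float32 buffer."""

    def __init__(self, dim):
        self.dim = dim
        self.paths = []
        self.data = array.array("f")  # contiguous storage, row i = paths[i]

    def add(self, path, vec):
        if len(vec) != self.dim:
            raise ValueError("dimension mismatch")
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            return  # skip zero vectors: cosine similarity is undefined
        self.paths.append(path)
        self.data.extend(x / norm for x in vec)

    def build(self, root, load):
        """First pass: walk root and add every file load() has a vector for."""
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                vec = load(path)  # load(path) -> vector or None
                if vec is not None:
                    self.add(path, vec)

    def query(self, vec, top_k=3):
        """Return up to top_k (score, path) pairs, best first."""
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            return []
        q = [x / norm for x in vec]
        scored = []
        for i, path in enumerate(self.paths):
            row = self.data[i * self.dim:(i + 1) * self.dim]
            scored.append((sum(a * b for a, b in zip(q, row)), path))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]
```

Because the object lives outside the filesystem and has to be kept in sync with it, this is also the kind of separate structure that makes commenters skeptical of the "zero‑overhead indexing" framing.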

xattrs vs External DB

Pro‑xattr arguments:
- Embeddings travel with files (as long as copy/move preserves metadata), making them the “source of truth.”
- An external indexer could asynchronously watch the FS and rebuild indices.
Critiques:
- xattrs aren’t contiguous in RAM and can be slower than reading file headers.
- Not all filesystems/OSes support them consistently; many copy tools ignore them.
- If a proper index is needed anyway, some question the point of storing large vectors in xattrs.
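
The external-indexer idea from the pro‑xattr side can be sketched with simple polling. This is an illustrative assumption rather than anything proposed in detail in the thread: `refresh` and the `load` callback are hypothetical, and a real watcher would more likely use filesystem notifications than repeated tree walks. The sketch keys on ctime because, on Linux, ctime changes on both content and metadata updates, including xattr writes.

```python
import os


def refresh(index, stamps, root, load):
    """Bring index (path -> vector) in line with the files under root.

    stamps maps path -> last seen st_ctime_ns. load(path) returns the
    file's vector or None. Returns the list of paths that were added,
    reloaded, or dropped.
    """
    seen = set()
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                stamp = os.stat(path).st_ctime_ns
            except OSError:
                continue  # file vanished between listing and stat
            seen.add(path)
            if stamps.get(path) != stamp:
                stamps[path] = stamp
                vec = load(path)
                if vec is not None:
                    index[path] = vec
                else:
                    index.pop(path, None)
                changed.append(path)
    # Drop entries for files that no longer exist under root.
    for path in list(stamps):
        if path not in seen:
            del stamps[path]
            index.pop(path, None)
            changed.append(path)
    return changed
```

In this arrangement the per-file metadata stays authoritative and the index is disposable: it can be thrown away and rebuilt from the files at any time, at the cost of a full pass.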

Use Cases, Debuggability, and Extensions

People imagine smart search like “video from last month camping with turkeys,” RAG setups, and better Finder‑style search.
Concern raised about “opaque embeddings”: how to debug why a file did or didn’t match? One suggestion: store a human‑readable description xattr alongside the embedding.
Ideas for extensions: optional embedded vector DB (Weaviate, FAISS) for scalable indexing; storing tags and other metadata similarly.

Implementation and Broader Filesystem/DB Debate

Currently Linux‑only; supports CPU and NVIDIA GPU backends. macOS support is planned.
Python chosen for rapid prototyping and rich ML libraries; Rust is seen as adding complexity without real speed benefit given model bottlenecks.
Thread digresses into a deep debate: “filesystems as (or vs) databases,” microkernels, atomicity, networked APIs, BeFS/WinFS history, and whether richer FS‑level metadata and search should become standard.
