Show HN: VectorVFS, your filesystem as a vector database
Concept and Scope
- Tool stores per-file embeddings as extended attributes (xattrs) / inode metadata, turning the filesystem itself into the “vector store.”
- No LLMs are involved; it uses local encoders (e.g., Meta’s Perception Encoder) to generate multimodal embeddings.
- Intended mainly for local, interactive search across a working set of files, not multi‑million‑document production workloads.
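To make the xattr idea concrete, here is a minimal sketch of attaching a vector to a file on Linux. It is not taken from the project: the attribute name `user.example.embedding`, the float32 byte layout, and all function names are assumptions chosen for illustration. Only `os.setxattr` and `os.getxattr` are real (Linux-only) standard-library calls.

```python
import os
import struct

# Hypothetical attribute name. On Linux, unprivileged processes can
# normally write only to the "user." xattr namespace.
XATTR_NAME = "user.example.embedding"


def pack_embedding(vector):
    """Serialize a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob):
    """Inverse of pack_embedding: bytes back to a list of floats."""
    count = len(blob) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", blob))


def store_embedding(path, vector):
    """Attach the packed vector to the file as an extended attribute."""
    os.setxattr(path, XATTR_NAME, pack_embedding(vector))


def load_embedding(path):
    """Read the vector back; raises OSError if the attribute is missing."""
    return unpack_embedding(os.getxattr(path, XATTR_NAME))
```

Maximum xattr sizes vary by filesystem, so a real tool would need to check that its embedding dimension fits.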
Performance and Indexing
- Current search is explicitly O(N): it walks the target directory tree, loads or computes embeddings, and does a linear similarity scan.
- Some see this as fine for modest N: filesystem search is currently poor, and a linear scan over RAM‑contiguous vectors with SIMD and multithreading can still be fast.
- Others argue that without an index or contiguous in‑RAM layout, it’s not suitable for production or very large corpora.
- Author mentions a planned mode: build an index in a first pass and keep it alive for subsequent queries, though this is not aimed at tens of millions of files.
- Several commenters note that any “efficient” global search ultimately requires a separate index or meta‑DB, so “zero‑overhead indexing” is viewed skeptically.
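The linear scan and the "build once, query many times" mode discussed above can be sketched as follows. This is an illustrative pure-Python version, not the project's code: the class name `InMemoryIndex` and its methods are hypothetical, and a real implementation would more likely keep the vectors in one contiguous array and use vectorized math.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryIndex:
    """Hypothetical first-pass index: load every vector once, then reuse it."""

    def __init__(self, embeddings):
        # embeddings: dict mapping file path -> list of floats
        self.paths = list(embeddings)
        self.vectors = [embeddings[p] for p in self.paths]

    def search(self, query, top_k=3):
        """Linear O(N) scan: score every stored vector, return the best top_k."""
        scored = [
            (cosine_similarity(query, vec), path)
            for path, vec in zip(self.paths, self.vectors)
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [(path, score) for score, path in scored[:top_k]]
```

Each query still costs O(N) comparisons; the saving comes only from not re-walking the directory tree or re-reading attributes on every search.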
xattrs vs External DB
- Pro‑xattr arguments:
  - Embeddings travel with files (copy/move preserves metadata), making them the “source of truth.”
  - An external indexer could asynchronously watch the FS and rebuild indices.
- Critiques:
  - xattrs aren’t contiguous in RAM and can be slower than reading file headers.
  - Not all filesystems/OSes support them consistently; many copy tools ignore them.
  - If a proper index is needed anyway, some question the point of storing large vectors in xattrs.
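The portability critique can be illustrated with a small fallback sketch: try an xattr first, and write a sidecar file when the platform or filesystem refuses. Nothing here comes from the thread or the project. The attribute name, the `.embedding` suffix, and both function names are hypothetical, and the sidecar approach is just one possible workaround.

```python
import os

XATTR_NAME = "user.example.embedding"  # hypothetical attribute name
SIDECAR_SUFFIX = ".embedding"          # hypothetical sidecar file suffix


def store_blob(path, blob):
    """Try an xattr first; fall back to a sidecar file. Returns the method used."""
    try:
        os.setxattr(path, XATTR_NAME, blob)
        return "xattr"
    except (AttributeError, OSError):
        # AttributeError: os.setxattr does not exist on this platform
        # (it is only available on Linux).
        # OSError: the filesystem rejected the attribute
        # (unsupported, too large, no permission, ...).
        with open(path + SIDECAR_SUFFIX, "wb") as sidecar:
            sidecar.write(blob)
        return "sidecar"


def load_blob(path):
    """Read the blob from the xattr if present, else from the sidecar file."""
    try:
        return os.getxattr(path, XATTR_NAME)
    except (AttributeError, OSError):
        with open(path + SIDECAR_SUFFIX, "rb") as sidecar:
            return sidecar.read()
```

A sidecar file gives up the "metadata travels with the file" property, which is the trade-off the two sides of this argument are weighing.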
Use Cases, Debuggability, and Extensions
- People imagine smart search like “video from last month camping with turkeys,” RAG setups, and better Finder‑style search.
- Concern raised about “opaque embeddings”: how to debug why a file did or didn’t match? One suggestion: store a human‑readable description xattr alongside the embedding.
- Ideas for extensions: an optional embedded vector index or DB (e.g., FAISS, Weaviate) for scalable indexing; storing tags and other metadata in xattrs the same way.
Implementation and Broader Filesystem/DB Debate
- Currently Linux‑only; supports CPU and NVIDIA GPU backends. macOS support is planned.
- Python chosen for rapid prototyping and rich ML libraries; Rust is seen as adding complexity without real speed benefit given model bottlenecks.
- Thread digresses into a deep debate: “filesystems as (or vs) databases,” microkernels, atomicity, networked APIs, BeFS/WinFS history, and whether richer FS‑level metadata and search should become standard.