Show HN: VectorVFS, your filesystem as a vector database

Concept and Scope

  • The tool stores per-file embeddings as extended attributes (xattrs) / inode metadata, turning the filesystem itself into the “vector store.”
  • No LLMs are involved; it uses local encoders (e.g., Meta’s Perception Encoder) to generate multimodal embeddings.
  • Intended mainly for local, interactive search across a working set of files, not multi‑million‑document production workloads.
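
A minimal sketch of the xattr idea described above, not VectorVFS's actual code. The attribute name `user.vectorvfs.embedding`, the float32 layout, and all four helper names are assumptions made for illustration; only `os.setxattr` and `os.getxattr` are real (Linux-only) Python calls.

```python
import os
import struct

# Hypothetical attribute name; the key VectorVFS really uses may differ.
XATTR_KEY = "user.vectorvfs.embedding"

def pack_embedding(vec):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)

def unpack_embedding(raw):
    """Inverse of pack_embedding: bytes back to a list of floats."""
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

def store_embedding(path, vec):
    """Attach the embedding to the file as an extended attribute (Linux only)."""
    os.setxattr(path, XATTR_KEY, pack_embedding(vec))

def load_embedding(path):
    """Return the stored embedding, or None if the attribute cannot be read."""
    try:
        return unpack_embedding(os.getxattr(path, XATTR_KEY))
    except OSError:
        return None
```

The embedding itself would come from a local encoder; that step is left out here because the thread does not describe the encoder's interface.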

Performance and Indexing

  • Current search is explicitly O(N): it walks the target directory tree, loads or computes embeddings, and does a linear similarity scan.
  • Some see this as fine for modest N: existing filesystem search is poor, and a linear scan over RAM‑contiguous vectors with SIMD and multithreading can still be fast.
  • Others argue that without an index or contiguous in‑RAM layout, it’s not suitable for production or very large corpora.
  • The author mentions a planned mode that builds an index in a first pass and keeps it alive for subsequent queries, though it is not aimed at tens of millions of files.
  • Several commenters note that any “efficient” global search ultimately requires a separate index or meta‑DB, so “zero‑overhead indexing” is viewed skeptically.
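
The O(N) scan discussed above can be sketched in a few lines. This is an illustrative pure-Python version, assuming the embeddings have already been loaded into a dict keyed by path; `cosine` and `linear_scan` are hypothetical names, and a real implementation would more likely use a contiguous array and vectorized math.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def linear_scan(query, embeddings, k=3):
    """Score every entry once (O(N)) and return the top-k (score, path) pairs.

    `embeddings` maps path -> vector. Keeping this dict alive between queries
    corresponds loosely to the "build once, reuse" mode mentioned in the thread.
    """
    scored = ((cosine(query, vec), path) for path, vec in embeddings.items())
    return heapq.nlargest(k, scored)
```

Every query touches every vector, so cost grows linearly with the number of files; that is the trade-off the commenters are debating.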

xattrs vs External DB

  • Pro‑xattr arguments:
    • Embeddings travel with files (copy/move preserves metadata), making them the “source of truth.”
    • An external indexer could asynchronously watch the FS and rebuild indices.
  • Critiques:
    • xattrs aren’t contiguous in RAM and can be slower than reading file headers.
    • Not all filesystems/OSes support them consistently; many copy tools ignore them.
    • If a proper index is needed anyway, some question the point of storing large vectors in xattrs.
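
Because xattr support varies by filesystem and OS, a tool relying on it might probe before writing. The following is a best-effort sketch under that assumption; `xattrs_supported` and the `user.probe` attribute name are hypothetical, and the check only covers what Python's `os` module exposes.

```python
import os
import tempfile

def xattrs_supported(directory):
    """Best-effort probe: can a user.* xattr be set on a file in `directory`?"""
    # CPython only exposes os.setxattr on Linux.
    if not hasattr(os, "setxattr"):
        return False
    with tempfile.NamedTemporaryFile(dir=directory) as probe:
        try:
            os.setxattr(probe.name, "user.probe", b"1")
            return True
        except OSError:
            # Typical cause: the filesystem or mount options reject user xattrs.
            return False
```

A caller could fall back to an external database when this returns `False`, which is the alternative the critiques point toward.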

Use Cases, Debuggability, and Extensions

  • People imagine smart search like “video from last month camping with turkeys,” RAG setups, and better Finder‑style search.
  • Concern raised about “opaque embeddings”: how to debug why a file did or didn’t match? One suggestion: store a human‑readable description xattr alongside the embedding.
  • Ideas for extensions: optional embedded vector DB (Weaviate, FAISS) for scalable indexing; storing tags and other metadata similarly.
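
One way to picture the "human-readable description" suggestion: keep a short text next to each embedding and surface it with the score. This sketch is purely illustrative; `FileRecord`, `explain_matches`, and the idea that descriptions are captions are assumptions, and the vectors are assumed to be unit length so a dot product serves as the similarity.

```python
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    embedding: list    # assumed already normalized to unit length
    description: str   # hypothetical caption stored alongside the embedding

def explain_matches(query, records, k=3):
    """Rank records by dot product and report each description with its score."""
    scored = sorted(
        ((sum(q * e for q, e in zip(query, r.embedding)), r) for r in records),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [f"{score:.2f} {r.path}: {r.description}" for score, r in scored[:k]]
```

Seeing the description beside the score gives a rough hint of what the encoder "saw" in the file, though it does not explain the embedding itself.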

Implementation and Broader Filesystem/DB Debate

  • Currently Linux‑only; supports CPU and NVIDIA GPU backends. macOS support is planned.
  • Python was chosen for rapid prototyping and its rich ML libraries; Rust is seen as adding complexity without a real speed benefit, since the model is the bottleneck.
  • Thread digresses into a deep debate: “filesystems as (or vs) databases,” microkernels, atomicity, networked APIs, BeFS/WinFS history, and whether richer FS‑level metadata and search should become standard.