Ask HN: How are you doing RAG locally?
Tools and Local Setups
- Many people use turnkey or semi-turnkey tools: LibreChat (with built-in vector DB), AnythingLLM, Kiln, Haiku.rag, Libragen, Nextcloud MCP server + Qdrant, discovery, etc.
- Common DIY stacks:
  - Ollama or llama.cpp for local models
  - Chroma, Qdrant, LanceDB, Milvus, SQLite + sqlite-vec (vec0) / FTS5, DuckDB VSS, Postgres + pgvector/ParadeDB, USearch, or FAISS (CPU/GPU) for storage and retrieval
- Several homegrown libraries and CLIs were shared (piragi, llmemory, lifepim-ai-core, ragtune, SearchArray, pdfgptindexer-offline, seance, qmd, ck, local-LLM-with-RAG).
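The SQLite-only end of these stacks can be surprisingly small. A minimal sketch, assuming your Python's sqlite3 build includes the FTS5 extension (true for most CPython distributions); the file names and note text are illustrative:

```python
import sqlite3

# Minimal local full-text index: SQLite FTS5, no vector DB required.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts5(path, body)")

notes = [
    ("ollama.md", "Run local models with Ollama or llama.cpp"),
    ("vectors.md", "Chroma and Qdrant store embeddings for similarity search"),
    ("fts.md", "FTS5 gives BM25-ranked keyword search inside SQLite"),
]
db.executemany("INSERT INTO docs VALUES (?, ?)", notes)

# bm25() is FTS5's built-in ranking function; lower scores rank higher.
rows = db.execute(
    "SELECT path FROM docs WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT 2",
    ("keyword search",),
).fetchall()
print([r[0] for r in rows])
```

FTS5 treats multiple query terms as an implicit AND, so only documents containing both "keyword" and "search" come back here.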
Vector Search vs BM25/Keyword Search
- Strong thread arguing BM25/TF‑IDF + n‑grams/trigrams often beat or match embeddings, especially for code and traditional IR.
- For code search, multiple people recommend BM25 + trigram (or ripgrep) over vectors; embeddings can be slow, noisy, and require careful models and rerankers.
- Others report excellent results from static/fast embedding models and hybrid search (BM25 + vectors + reranking) for both code and natural language.
- SQLite FTS5, Postgres BM25 extensions, Meilisearch, Manticore, Typesense, and Elasticsearch/OpenSearch are mentioned for sparse/hybrid search.
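One common way to implement the hybrid approach above is reciprocal rank fusion (RRF), which merges ranked lists from independent retrievers without needing comparable scores. A sketch with illustrative document IDs:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: merge ranked result lists from
    different retrievers (e.g. BM25 and vector search).
    k=60 is the constant from the original RRF paper; it damps
    top ranks so no single retriever dominates."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # keyword ranking (illustrative)
vector_hits = ["doc1", "doc9", "doc3"]  # embedding ranking (illustrative)
fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)  # doc1 and doc3 rise to the top, agreed on by both lists
```

A reranker model can then be applied to just the fused top-k, keeping the expensive step small.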
RAG Without “Heavy” Infra
- Several setups avoid vector DBs entirely: just full-text search over markdown, or agentic retrieval over filesystem/web APIs.
- Clarification that RAG is about retrieval-augmented generation in general; a vector DB is one retrieval backend, not a requirement.
- Some use large context windows (e.g., 1M tokens) to “just stuff everything in,” but others still prefer RAG for efficiency and control.
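The "no infra" variant can be as simple as scoring files by term overlap and stuffing the winners into the prompt. A minimal sketch (the file names and helper are hypothetical, not from any tool in the thread):

```python
import pathlib
import re
import tempfile

def retrieve(query, root, top_n=2):
    """'No-infra' RAG: score every markdown file under root by term
    overlap with the query. No vector DB, no index -- just the filesystem."""
    terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for path in pathlib.Path(root).rglob("*.md"):
        text = path.read_text()
        words = set(re.findall(r"\w+", text.lower()))
        overlap = len(terms & words)
        if overlap:
            scored.append((overlap, path.name, text))
    scored.sort(reverse=True)
    return scored[:top_n]

# Tiny demo corpus (illustrative notes).
root = tempfile.mkdtemp()
pathlib.Path(root, "gpu.md").write_text("FAISS on GPU needs lots of VRAM")
pathlib.Path(root, "bm25.md").write_text("BM25 keyword search works offline")

hits = retrieve("keyword search offline", root)
context = "\n---\n".join(text for _, _, text in hits)
prompt = f"Answer using only this context:\n{context}\n\nQ: ..."
print(hits[0][1])
```

Agentic retrieval is the same idea with the model choosing what to grep for, iteratively, instead of a one-shot scoring pass.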
Scaling, Performance, and Hardware
- FAISS noted as RAM-bound; once data doesn’t fit in memory, it’s the wrong tool.
- Experiences shared with large embedding sets (millions of chunks, tens of GB of RAM); FAISS GPU vs CPU tradeoffs.
- DuckDB and Qdrant discussed for both small local projects and potential TB-scale use, with advice to benchmark on your own hardware.
- One user offloads RAG to hosted/vector DB due to local slowdown on an M1 Pro.
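The RAM-bound point is easy to make concrete with a back-of-envelope estimate for a flat (exact-search) float32 index; the corpus size and dimensionality below are illustrative:

```python
def flat_index_ram_gb(n_vectors, dims, bytes_per_float=4):
    """Rough RAM needed to hold a flat float32 embedding index,
    ignoring IDs and per-index overhead."""
    return n_vectors * dims * bytes_per_float / 1024**3

# 5M chunks embedded at 768 dimensions: ~14.3 GB before any overhead,
# already tight on a 16 GB laptop once the OS and the model are loaded.
print(round(flat_index_ram_gb(5_000_000, 768), 1))  # -> 14.3
```

Quantized or disk-backed index types trade recall and latency for memory, which is why the advice to benchmark on your own hardware keeps coming up.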
Use Cases and Challenges
- Use cases: internal company chatbots, personal knowledge bases, RSS feeds, academic PDFs, datasheets, transactional/financial records, Zotero and Excel analysis, Claude Code memory.
- Chunking and document structure (tables, multi-column PDFs, financial records) are repeatedly cited as bigger practical challenges than model choice.
- Some are exploring graph-based/KAG approaches atop RAG for higher-level reasoning and traceability across complex systems.
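A minimal baseline for the chunking problem is to split on markdown headings so chunks follow document structure, then cap chunk size. A sketch (the size cap and demo document are illustrative; tables and multi-column PDF extractions need far more care, as the thread notes):

```python
import re

def chunk_markdown(text, max_chars=500):
    """Split markdown before each heading so chunks follow the
    document's own structure, then cap chunk length."""
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for sec in sections:
        sec = sec.strip()
        if not sec:
            continue
        # Oversized sections get a naive character-window split.
        for i in range(0, len(sec), max_chars):
            chunks.append(sec[i:i + max_chars])
    return chunks

doc = "# Setup\nInstall Ollama.\n\n# Search\nUse FTS5 for keywords."
chunks = chunk_markdown(doc)
print(chunks)  # one chunk per heading section
```

Structure-aware splitting like this is also a natural stepping stone to the graph-based approaches above, since headings give you nodes and cross-references give you edges.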