Ask HN: How are you doing RAG locally?
Tools and Local Setups
- Many people use turnkey or semi-turnkey tools: LibreChat (with built-in vector DB), AnythingLLM, Kiln, Haiku.rag, Libragen, Nextcloud MCP server + Qdrant, discovery, etc.
- Common DIY stacks:
  - Ollama or llama.cpp for local models
  - Chroma, Qdrant, LanceDB, Milvus, SQLite + sqlite-vec (vec0) / FTS5, DuckDB VSS, Postgres + pgvector/ParadeDB, USearch, or FAISS (CPU/GPU) for storage and retrieval
- Several homegrown libraries and CLIs were shared (piragi, llmemory, lifepim-ai-core, ragtune, SearchArray, pdfgptindexer-offline, seance, qmd, ck, local-LLM-with-RAG).
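The SQLite-only end of these stacks can be surprisingly small. A minimal sketch, assuming your Python's sqlite3 build includes the FTS5 extension (true for most CPython distributions); the file names and note text are illustrative:

```python
import sqlite3

# Minimal local full-text index: SQLite FTS5, no vector DB required.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts5(path, body)")

notes = [
    ("ollama.md", "Run local models with Ollama or llama.cpp"),
    ("vectors.md", "Chroma and Qdrant store embeddings for similarity search"),
    ("fts.md", "FTS5 gives BM25-ranked keyword search inside SQLite"),
]
db.executemany("INSERT INTO docs VALUES (?, ?)", notes)

# bm25() is FTS5's built-in ranking function; lower scores rank higher.
rows = db.execute(
    "SELECT path FROM docs WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT 2",
    ("keyword search",),
).fetchall()
print([r[0] for r in rows])
```

FTS5 treats multiple query terms as an implicit AND, so only documents containing both "keyword" and "search" come back here.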
Vector Search vs BM25/Keyword Search
- Strong thread arguing BM25/TF‑IDF + n‑grams/trigrams often beat or match embeddings, especially for code and traditional IR.
- For code search, multiple people recommend BM25 + trigram (or ripgrep) over vectors; embeddings can be slow, noisy, and require careful models and rerankers.
- Others report excellent results from static/fast embedding models and hybrid search (BM25 + vectors + reranking) for both code and natural language.
- SQLite FTS5, Postgres BM25 extensions, Meilisearch, Manticore, Typesense, and Elasticsearch/OpenSearch are mentioned for sparse/hybrid search.
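One common way to implement the hybrid approach above is reciprocal rank fusion (RRF), which merges ranked lists from independent retrievers without needing comparable scores. A sketch with illustrative document IDs:

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: merge ranked result lists from
    different retrievers (e.g. BM25 and vector search).
    k=60 is the constant from the original RRF paper; it damps
    top ranks so no single retriever dominates."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # keyword ranking (illustrative)
vector_hits = ["doc1", "doc9", "doc3"]  # embedding ranking (illustrative)
fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)  # doc1 and doc3 rise to the top, agreed on by both lists
```

A reranker model can then be applied to just the fused top-k, keeping the expensive step small.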
RAG Without “Heavy” Infra
- Several setups avoid vector DBs entirely: just full-text search over markdown, or agentic retrieval over filesystem/web APIs.
- Clarification that RAG is about retrieval-augmented generation in general; a vector DB is one retrieval backend, not a requirement.
- Some use large context windows (e.g., 1M tokens) to “just stuff everything in,” but others still prefer RAG for efficiency and control.
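The "no infra" variant can be as simple as scoring files by term overlap and stuffing the winners into the prompt. A minimal sketch (the file names and helper are hypothetical, not from any tool in the thread):

```python
import pathlib
import re
import tempfile

def retrieve(query, root, top_n=2):
    """'No-infra' RAG: score every markdown file under root by term
    overlap with the query. No vector DB, no index -- just the filesystem."""
    terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for path in pathlib.Path(root).rglob("*.md"):
        text = path.read_text()
        words = set(re.findall(r"\w+", text.lower()))
        overlap = len(terms & words)
        if overlap:
            scored.append((overlap, path.name, text))
    scored.sort(reverse=True)
    return scored[:top_n]

# Tiny demo corpus (illustrative notes).
root = tempfile.mkdtemp()
pathlib.Path(root, "gpu.md").write_text("FAISS on GPU needs lots of VRAM")
pathlib.Path(root, "bm25.md").write_text("BM25 keyword search works offline")

hits = retrieve("keyword search offline", root)
context = "\n---\n".join(text for _, _, text in hits)
prompt = f"Answer using only this context:\n{context}\n\nQ: ..."
print(hits[0][1])
```

Agentic retrieval is the same idea with the model choosing what to grep for, iteratively, instead of a one-shot scoring pass.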
Scaling, Performance, and Hardware
- FAISS noted as RAM-bound; once data doesn’t fit in memory, it’s the wrong tool.
- Experiences shared with large embedding sets (millions of chunks, tens of GB of RAM); FAISS GPU vs CPU tradeoffs.
- DuckDB and Qdrant discussed for both small local projects and potential TB-scale use, with advice to benchmark on your own hardware.
- One user offloads RAG to hosted/vector DB due to local slowdown on an M1 Pro.
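The RAM-bound point is easy to make concrete with a back-of-envelope estimate for a flat (exact-search) float32 index; the corpus size and dimensionality below are illustrative:

```python
def flat_index_ram_gb(n_vectors, dims, bytes_per_float=4):
    """Rough RAM needed to hold a flat float32 embedding index,
    ignoring IDs and per-index overhead."""
    return n_vectors * dims * bytes_per_float / 1024**3

# 5M chunks embedded at 768 dimensions: ~14.3 GB before any overhead,
# already tight on a 16 GB laptop once the OS and the model are loaded.
print(round(flat_index_ram_gb(5_000_000, 768), 1))  # -> 14.3
```

Quantized or disk-backed index types trade recall and latency for memory, which is why the advice to benchmark on your own hardware keeps coming up.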
Use Cases and Challenges
- Use cases: internal company chatbots, personal knowledge bases, RSS feeds, academic PDFs, datasheets, transactional/financial records, Zotero and Excel analysis, Claude Code memory.
- Chunking and document structure (tables, multi-column PDFs, financial records) are repeatedly cited as bigger practical challenges than model choice.
- Some are exploring graph-based/KAG approaches atop RAG for higher-level reasoning and traceability across complex systems.
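A minimal baseline for the chunking problem is to split on markdown headings so chunks follow document structure, then cap chunk size. A sketch (the size cap and demo document are illustrative; tables and multi-column PDF extractions need far more care, as the thread notes):

```python
import re

def chunk_markdown(text, max_chars=500):
    """Split markdown before each heading so chunks follow the
    document's own structure, then cap chunk length."""
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for sec in sections:
        sec = sec.strip()
        if not sec:
            continue
        # Oversized sections get a naive character-window split.
        for i in range(0, len(sec), max_chars):
            chunks.append(sec[i:i + max_chars])
    return chunks

doc = "# Setup\nInstall Ollama.\n\n# Search\nUse FTS5 for keywords."
chunks = chunk_markdown(doc)
print(chunks)  # one chunk per heading section
```

Structure-aware splitting like this is also a natural stepping stone to the graph-based approaches above, since headings give you nodes and cross-references give you edges.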