The RAG Obituary: Killed by agents, buried by context windows

Reception of the article and the AI-writing debate

  • Several commenters feel the piece reads like LLM-written “slop”: overly chipper tone, repetitive structure, weak technical depth.
  • Some see it as a stealth ad (early company mention, promotional framing); others argue it’s unusually self-critical for marketing and contains useful lessons.
  • There’s pushback against derailing threads into “was this AI-written?” debates; moderators emphasize flagging/silent reporting rather than accusations in-thread.

What “RAG” actually means

  • A major thread is definitional: some use “RAG” narrowly as “embeddings + vector DB + chunking + top‑K + reranking.”
  • Others insist RAG is a general pattern: any retrieval (BM25, SQL, grep, APIs, tools) that augments LLM generation (see the sketch after this list).
  • This leads to disagreement over claims like “grep isn’t RAG” vs. “grep + LLM is just primitive RAG.”
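
To make the broader reading concrete, here is a minimal sketch that treats retrieval as a pluggable function: a grep-style keyword matcher and a vector store both fit the same "retrieve, then generate" template. The names (Retriever, keyword_retrieve, build_prompt) and the toy corpus are illustrative, not drawn from the article or any particular library.

```python
# RAG as a general pattern: any function that returns relevant text can augment
# the prompt, whether it is a vector store, BM25, SQL, or grep.
import re
from typing import Callable, List

Retriever = Callable[[str, int], List[str]]  # (query, top_k) -> passages

CORPUS = {
    "auth.md": "Tokens are verified against the JWKS endpoint before each call.",
    "billing.md": "Invoices are generated nightly; failed charges are retried twice.",
}

def keyword_retrieve(query: str, top_k: int = 3) -> List[str]:
    """Grep-style retrieval: rank documents by how many query terms they contain."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scored = []
    for name, text in CORPUS.items():
        hits = sum(text.lower().count(term) for term in terms)
        if hits:
            scored.append((hits, f"{name}: {text}"))
    return [passage for _, passage in sorted(scored, reverse=True)[:top_k]]

def build_prompt(query: str, retrieve: Retriever, top_k: int = 3) -> str:
    """The 'augmented generation' step: whatever was retrieved is prepended to the prompt."""
    context = "\n".join(retrieve(query, top_k)) or "(no matches)"
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

print(build_prompt("How are auth tokens verified?", keyword_retrieve))
```

Swapping keyword_retrieve for an embedding-based search changes only the retriever; the augmentation step is identical, which is the sense in which "grep + LLM" is still RAG.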

Grep / agentic search vs traditional RAG

  • Pro‑agentic/grep side:

    • Larger context windows let models read full files/docs; a simple grep/ripgrep loop plus iterative querying often beats complex pipelines for code and some document sets.
    • Agents can chain multiple searches, refine terms, follow references, and write notes/markdown “memory” files, approximating how humans work (see the loop sketch after this list).
    • Traditional RAG pipelines (chunking, embeddings, vector DBs, rerankers) are fiddly, brittle, and expensive to build and maintain, especially once per-user permissions have to be enforced at retrieval time.
  • Skeptical side:

    • Grep fails on synonyms, paraphrases, and vocabulary mismatch—exactly where embeddings shine.
    • Codebases are a best-case corpus; unstructured enterprise text, regulations, and huge tenders require semantic retrieval and ranking.
    • “Agentic search” typically includes RAG components (hybrid search, embeddings, rerankers); it’s more like “RAG inside a loop” than a replacement.
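
To illustrate the "RAG inside a loop" point, below is a minimal sketch of an agentic search loop over an in-memory toy codebase: it greps, records the hits in a scratch note, and searches again with a refined pattern. The canned query sequence stands in for the model's own refinements, and FILES, grep, and agent_search are hypothetical names, not taken from the article.

```python
# A toy agentic grep loop: search, read hits, append to a "memory" note, repeat.
import re
from typing import Dict, Iterator, List

FILES: Dict[str, str] = {
    "src/auth.py": "def verify(token):\n    return check_signature(token)\n",
    "src/crypto.py": "def check_signature(token):\n    ...  # RS256 via a cached JWKS\n",
}

def grep(pattern: str, files: Dict[str, str]) -> List[str]:
    """Return 'path: line' for every line matching the regex pattern."""
    hits = []
    for path, text in files.items():
        for line in text.splitlines():
            if re.search(pattern, line):
                hits.append(f"{path}: {line.strip()}")
    return hits

def agent_search(question: str, queries: Iterator[str], max_steps: int = 5) -> str:
    """Chain several searches, keeping notes the way an agent keeps a markdown memory file."""
    notes: List[str] = [f"# Question: {question}"]
    for _, pattern in zip(range(max_steps), queries):
        hits = grep(pattern, FILES)
        notes.append(f"## grep '{pattern}' -> {len(hits)} hit(s)")
        notes.extend(hits or ["(no matches; a real agent would refine the pattern)"])
    return "\n".join(notes)

# The canned queries replay a refinement a model might make after reading its notes:
# find the entry point first, then follow the call it discovers.
print(agent_search("How are tokens verified?", iter(["verify", "check_signature"])))
```

In a real agent the next pattern comes from the model reading its notes so far; the loop above only replays the kind of refinement (entry point, then the callee it finds) a model might make.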

Scale, cost, and context windows

  • Commenters stress scaling limits: millions of docs, trillion-token corpora, or billion-token tenders can’t just be “thrown into context,” even with 1–10M token windows.
  • Context rot, latency, and cost remain hard constraints; embeddings and rerankers are still valuable for narrowing candidates from millions of documents to a few dozen (see the two-stage sketch after this list).
  • Some argue LLM costs are trending down; others note energy costs, capex, and lack of profitability mean “near zero” is unlikely for cutting-edge models.
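
A minimal sketch of that two-stage narrowing, with a toy bag-of-words "embedding" and a phrase-overlap "reranker" standing in for a trained encoder and a cross-encoder:

```python
# Stage 1: a cheap vector pass shrinks the corpus to a shortlist.
# Stage 2: a costlier reranker picks the few passages that go into the context window.
import math
from collections import Counter
from typing import Dict, List

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector (real systems use a trained encoder)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rerank(query: str, docs: List[str]) -> List[str]:
    """Toy reranker: prefer docs containing the exact query phrase (a cross-encoder scores this better)."""
    return sorted(docs, key=lambda d: query.lower() in d.lower(), reverse=True)

def retrieve(query: str, corpus: Dict[str, str], shortlist: int = 50, final: int = 3) -> List[str]:
    q = embed(query)
    ranked = sorted(((cosine(q, embed(text)), text) for text in corpus.values()), reverse=True)
    candidates = [text for _, text in ranked[:shortlist]]  # millions of docs -> a few dozen
    return rerank(query, candidates)[:final]               # a few dozen -> what fits in context

corpus = {"a": "refund policy for overdue invoices", "b": "gpu cluster maintenance window"}
print(retrieve("how do refunds work for overdue invoices", corpus))
```

The cheap first pass keeps latency and cost roughly constant as the corpus grows; only the reranked handful ever touches the context window.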

Consensus-ish views

  • Top‑K/vector‑only RAG is increasingly inadequate on its own.
  • Future systems will blend agentic workflows, multiple retrieval tools (including grep), hybrid/graph RAG, and smarter orchestration; retrieval isn’t dead, but its role is changing.
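
A sketch of what such a blend can look like, assuming a hypothetical tool registry and a crude keyword router standing in for the model's own per-turn tool selection:

```python
# Retrieval as a set of tools an agent can call, rather than a single pipeline.
from typing import Callable, Dict, List

ToolFn = Callable[[str], List[str]]

def grep_tool(query: str) -> List[str]:
    return [f"grep hit for {query!r}"]        # exact identifiers, error strings

def vector_tool(query: str) -> List[str]:
    return [f"semantic match for {query!r}"]  # paraphrases, vocabulary mismatch

def sql_tool(query: str) -> List[str]:
    return [f"rows matching {query!r}"]       # structured, countable questions

TOOLS: Dict[str, ToolFn] = {"grep": grep_tool, "vector": vector_tool, "sql": sql_tool}

def route(query: str) -> str:
    """Crude stand-in for the model's own per-turn tool choice."""
    if any(ch in query for ch in "_()"):  # looks like a code identifier
        return "grep"
    if query.lower().startswith(("how many", "count", "sum")):
        return "sql"
    return "vector"

for q in ["check_signature()", "how many invoices failed last week", "why do refunds lag behind payouts"]:
    tool = route(q)
    print(f"{tool:6} -> {TOOLS[tool](q)}")
```

Each tool covers a failure mode of the others: grep for exact identifiers, embeddings for vocabulary mismatch, SQL for structured questions; the agent loop decides which to call and when to stop.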