2024-05-21

I want flexible queries, not RAG

Natural language interface vs. generation

Many commenters agree the standout capability is natural-language querying; generation often feels like a flashy but less reliable add-on.
Some argue generation is still highly valuable, especially when embedded in workflows and processes rather than used “one-shot.”
There’s debate over whether understanding and generation can be cleanly separated in current LLM architectures; several say they’re inherently intertwined.

What RAG is (and isn’t)

Multiple people note the article’s example isn’t really RAG; it’s just using a generic chatbot.
Several definitions surface:
- Retrieval-first: encode query, retrieve relevant documents (often via vectors), then let the LLM summarize.
- More general: any system that retrieves external info and injects it into the prompt for in‑context use.
Some emphasize that RAG’s main value is access to data not seen in training, not “fixing hallucinations,” though others say it does reduce hallucinations when the prompt is written carefully and answers cite the retrieved sources.
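To make the more general definition concrete, here is a minimal sketch of "retrieve, then inject into the prompt." It is not any commenter's implementation: the bag-of-words cosine score stands in for a real embedding model, and `DOCS`, `retrieve`, and `build_prompt` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on non-alphanumerics; no stemming or stopword removal.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    # Score every document against the query and keep the top k with a nonzero score.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]

def build_prompt(query, docs, k=2):
    # Inject the retrieved text into the prompt; the LLM call itself is out of scope here.
    ids = retrieve(query, docs, k)
    context = "\n".join(f"[{i}] {docs[i]}" for i in ids)
    return f"Context:\n{context}\n\nQuestion: {query}"

# Hypothetical corpus, invented for this example.
DOCS = {
    "d1": "The billing service retries failed payments three times.",
    "d2": "Office plants are watered every Friday.",
    "d3": "Failed payments are logged by the billing service.",
}

if __name__ == "__main__":
    print(build_prompt("How are failed payments retried?", DOCS))
```

Everything before the model call is ordinary search, which is why the retrieval step alone already covers the "flexible query" use case some commenters ask for.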

Search, embeddings, and “flexible queries”

Many see the desired behavior as sophisticated semantic search with a natural-language front-end, sometimes without any summarization step.
Vector search/embeddings are praised but also described as overhyped and very sensitive to tuning and chunking.
Several point out that good RAG comes down to building a good search engine: query construction, indexing strategy, and document quality dominate results.
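The sensitivity to chunking is easy to see with a toy example. The sketch below assumes a simple word-window chunker with optional overlap; `chunk_words` and `TEXT` are hypothetical, and real pipelines often split on sentences or tokens instead.

```python
def chunk_words(text, size, overlap=0):
    # Split text into windows of `size` words; consecutive windows share `overlap` words.
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Hypothetical policy text, invented for this example.
TEXT = "Refunds are issued within five days. Disputes must be filed within thirty days."

if __name__ == "__main__":
    for overlap in (0, 2):
        print(overlap, chunk_words(TEXT, size=6, overlap=overlap))
```

With no overlap, the phrase "thirty days." is cut in half across two chunks, so neither chunk on its own states the dispute deadline. With an overlap of two words, the last chunk keeps the phrase whole. Same document, same query, different retrieval quality.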

Reliability, hallucinations, and truth

There is strong consensus that LLMs are poor sources of factual truth but powerful as “calculators for words” or reasoning/summarization engines when provided with authoritative context.
Techniques mentioned to reduce hallucinations: explicit permission to say “I don’t know,” strict instructions to use only provided context, citations with chunk IDs, and post‑checks against context.
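A rough sketch of how those techniques fit together, under stated assumptions: the prompt wording, the `[c1]`-style citation format, and the names `grounded_prompt` and `check_answer` are all invented here. The post-check only confirms that cited chunk IDs exist in the supplied context; it does not verify that a chunk actually supports the claim.

```python
import re

def grounded_prompt(question, chunks):
    # Label each chunk with its ID so the model can cite it.
    context = "\n".join(f"[{cid}] {text}" for cid, text in chunks.items())
    return (
        "Answer using only the context below. Cite chunk IDs in square brackets.\n"
        "If the context does not contain the answer, reply exactly: I don't know\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def check_answer(answer, chunks):
    """Return a list of problems; an empty list means the answer passed."""
    if answer.strip() == "I don't know":
        return []  # an explicit refusal is always acceptable
    cited = re.findall(r"\[([A-Za-z0-9_-]+)\]", answer)
    problems = []
    if not cited:
        problems.append("no citations")
    for cid in sorted(set(cited) - set(chunks)):
        problems.append(f"unknown chunk id: {cid}")
    return problems

# Hypothetical context chunks, invented for this example.
CHUNKS = {
    "c1": "The billing service retries failed payments three times.",
    "c2": "Failed payments are logged by the billing service.",
}

if __name__ == "__main__":
    print(grounded_prompt("How many retries?", CHUNKS))
    print(check_answer("Three times [c1].", CHUNKS))
    print(check_answer("It uses a queue [c9].", CHUNKS))
```

An answer that fails the check can be rejected or sent back for another attempt, which is cheap compared with showing a user an uncited claim.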

Structured output and tooling

Experiences with JSON/schema adherence are mixed: some report thousands of reliable outputs; others report frequent schema drift without constraints and validation.
Grammar-constrained decoding and validation‑and‑repair loops are described as standard practice in production pipelines.
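The validation-and-repair half of that practice can be sketched without any particular LLM library. Below, `generate` is a hypothetical callable standing in for a model call, and `SCHEMA` is an assumed flat mapping of required keys to Python types; production systems would more likely use a full JSON Schema validator. Grammar-constrained decoding is not shown, since it depends on the inference stack.

```python
import json

# Hypothetical schema: required keys and their expected Python types.
SCHEMA = {"title": str, "year": int}

def validate(raw, schema):
    # Return (data, None) on success or (None, error_message) on failure.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "top-level value must be an object"
    for key, expected in schema.items():
        if key not in data:
            return None, f"missing key: {key}"
        if type(data[key]) is not expected:  # strict check, so True is not accepted as an int
            return None, f"wrong type for {key}: expected {expected.__name__}"
    return data, None

def generate_with_repair(generate, prompt, schema, max_attempts=3):
    # Call the model, validate the output, and feed the error back on failure.
    current, error = prompt, "no attempts made"
    for _ in range(max_attempts):
        data, error = validate(generate(current), schema)
        if error is None:
            return data
        current = f"{prompt}\n\nYour previous output was rejected ({error}). Return only valid JSON."
    raise ValueError(f"no valid output after {max_attempts} attempts: {error}")

if __name__ == "__main__":
    # A fake model that fails twice, then complies.
    replies = iter(['{"title": "RAG"', '{"title": "RAG", "year": "2024"}', '{"title": "RAG", "year": 2024}'])
    print(generate_with_repair(lambda p: next(replies), "Extract title and year.", SCHEMA))
```

The loop is bounded on purpose: if the model cannot produce valid output within a few attempts, failing loudly is safer than passing malformed data downstream.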

Use cases and expectations

Some see user complaints as evidence of unrealistic expectations: people implicitly ask for near‑AGI and are disappointed when they don’t get it.
Others stress that the core problem is that outputs are optimized to look plausible, not to be verifiably correct, which conflicts with how many users understand “answers.”
