Why language models hallucinate
Evaluation, Multiple-Choice Analogies, and Incentives
- Many comments pick up on the article’s multiple-choice test analogy: current benchmarks reward “getting it right” but don’t penalize confident wrong answers, so models are implicitly trained to guess rather than say “I don’t know.”
- Some compare this to standardized tests that use negative marking or give partial credit for blank answers, arguing evals should similarly penalize confident errors and allow abstention (a toy scoring sketch follows this list).
- Others note this is technically hard to implement at scale: answers aren’t a single token, synonyms and formatting complicate what counts as “wrong,” and transformer training doesn’t trivially support “negative points” for incorrect generations.
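As a toy illustration of the negative-marking idea, here is a minimal scoring rule in Python; the point values, the exact-match check, and the name `score_answer` are invented for this sketch and come from neither the article nor any real benchmark.

```python
# Toy scoring rule in the spirit of SAT-style negative marking: reward correct
# answers, give zero (not negative) credit for abstaining, penalize wrong answers.
from typing import Optional

CORRECT_POINTS = 1.0
ABSTAIN_POINTS = 0.0     # "I don't know" earns nothing but costs nothing
WRONG_PENALTY = -0.25    # confident errors lose points

def score_answer(prediction: Optional[str], gold: str) -> float:
    if prediction is None:  # model abstained
        return ABSTAIN_POINTS
    # Exact string match stands in for answer equivalence; as the comments note,
    # synonyms and formatting make this the genuinely hard part at scale.
    if prediction.strip().lower() == gold.strip().lower():
        return CORRECT_POINTS
    return WRONG_PENALTY
```

Under this rule, blind guessing has positive expected value only when the model’s chance of being right exceeds 0.25 / 1.25 = 20%; below that, abstaining scores higher, which is the incentive the multiple-choice analogy calls for.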
What Counts as a Hallucination?
- One camp insists “all an LLM does is hallucinate”: everything is probabilistic next-token generation, and some outputs just happen to be true or useful.
- Another camp adopts the article’s narrower definition: hallucinations are plausible but false statements; not all generations qualify. Under this view, the term is only useful if it distinguishes wrong factual assertions from correct ones.
- There’s pushback that “hallucination” is anthropomorphic marketing; alternatives like “confabulation” or simply “prediction error” are suggested.
Root Causes and Architectural Limits
- Several comments reiterate the paper’s argument: next-word prediction on noisy, incomplete data inevitably produces errors, especially for low-frequency or effectively random facts such as birthdays (a toy illustration follows this list).
- Others argue the deeper problem is lack of grounding and metacognition: models don’t truly know what they know, can’t access their own “knowledge boundaries,” and keep training and inference as separate phases, unlike humans, who learn continuously and track their own uncertainty.
- Some see hallucinations as an inherent byproduct of large lossy models compressing the world; with finite capacity and imperfect data, there will always be gaps.
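To make the “effectively random facts” point concrete, a small sketch with an invented mini-corpus (not the paper’s formal argument): facts that appear only once in the training data give the model a single noisy observation to learn from, so counting such singletons gives a rough sense of how much is effectively unlearnable.

```python
# Count how many distinct (subject, fact) pairs occur exactly once in a
# hypothetical corpus; for these, some residual error rate is hard to avoid
# no matter how large the model is.
from collections import Counter

corpus_facts = [  # invented example data
    ("Alice Example", "born 1987-03-14"),
    ("Alice Example", "born 1987-03-14"),  # repeated fact: learnable
    ("Bob Example", "born 1990-07-02"),    # singleton
    ("Carol Example", "born 1975-11-30"),  # singleton
]

counts = Counter(corpus_facts)
singleton_rate = sum(1 for n in counts.values() if n == 1) / len(counts)
print(f"{singleton_rate:.0%} of distinct facts appear exactly once")  # 67%
```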
Can Hallucinations Be Reduced or Avoided?
- Many are positive about training models to express uncertainty or abstain (“I don’t know/I’m unsure”), but question how well uncertainty can be calibrated in practice.
- There’s broad agreement that non-hallucinating narrow systems can be built (e.g., fixed QA databases plus calculators) that say “I don’t know” outside their domain; the disagreement is over whether general LLMs can approach that behavior.
- Multiple commenters note a precision–recall tradeoff: reducing hallucinations means more refusals and less user appeal, while current business incentives and leaderboards push vendors toward “always answer,” encouraging hallucinations (see the threshold sketch after this list).
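To see that tradeoff in miniature, a sketch with invented confidences and correctness labels: sweeping an abstention threshold shows coverage falling as precision rises, assuming the confidences are reasonably calibrated.

```python
# Sweep a confidence threshold: below it the model abstains, above it it answers.
answers = [  # (model confidence, answer was actually correct) -- invented data
    (0.95, True), (0.90, True), (0.80, False), (0.75, True),
    (0.60, False), (0.55, True), (0.40, False), (0.30, False),
]

for threshold in (0.0, 0.5, 0.7, 0.9):
    attempted = [ok for conf, ok in answers if conf >= threshold]
    coverage = len(attempted) / len(answers)
    precision = sum(attempted) / len(attempted) if attempted else 1.0
    print(f"threshold {threshold:.1f}: answers {coverage:.0%} of questions, "
          f"{precision:.0%} of those answers are correct")
```

Vendors optimizing leaderboards that score only coverage and raw accuracy sit at the low-threshold end of this curve, which is the business-incentive point several commenters make.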
Broader Critiques and Meta-Discussion
- Some see the post as PR or leaderboard positioning rather than novel science; others welcome it as a clear, shared definition and a push for better evals.
- A recurring complaint is that much public discourse about hallucinations projects folk psychology onto systems that are, at their core, just very large stochastic language models.