Why language models hallucinate

Evaluation, Multiple-Choice Analogies, and Incentives

  • Many comments pick up on the article’s multiple-choice test analogy: current benchmarks reward “getting it right” but don’t penalize confident wrong answers, so models are implicitly trained to guess rather than say “I don’t know.”
  • Some compare this to standardized tests with negative marking or partial credit for blank answers, arguing evals should similarly penalize confident errors and allow abstention (a minimal scoring sketch follows this list).
  • Others note this is hard to implement technically at scale: answers aren’t one token, synonyms and formatting complicate what counts as “wrong,” and transformer training doesn’t trivially support “negative points” for incorrect generations.
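
As a concrete illustration of the negative-marking idea, here is a minimal scoring sketch (not the paper’s or any benchmark’s actual metric): +1 for a correct answer, 0 for an explicit abstention, and a negative penalty for a confident wrong answer. The abstention marker, penalty value, and crude string normalization are assumptions for illustration; real graders face exactly the synonym and formatting problems noted above.

```python
# Minimal sketch of a negative-marking grader (illustrative only; not the
# paper's metric). Assumes each item is (model_answer, gold_answer) and that
# abstentions are signalled with a fixed marker string.

ABSTAIN = "i don't know"      # assumed abstention marker
WRONG_PENALTY = -1.0          # assumed penalty for a confident wrong answer

def normalize(text: str) -> str:
    """Crude normalization; real evals need synonym/format handling (see above)."""
    return " ".join(text.lower().strip().split())

def score_item(model_answer: str, gold_answer: str) -> float:
    answer = normalize(model_answer)
    if answer == ABSTAIN:
        return 0.0                      # abstaining is neutral, not punished
    if answer == normalize(gold_answer):
        return 1.0                      # correct answer earns full credit
    return WRONG_PENALTY                # confident wrong answer is penalized

def score_run(pairs: list[tuple[str, str]]) -> float:
    """Average score over a list of (model_answer, gold_answer) pairs."""
    return sum(score_item(m, g) for m, g in pairs) / len(pairs)

if __name__ == "__main__":
    run = [
        ("Paris", "Paris"),             # correct: +1
        ("I don't know", "1867"),       # abstention: 0
        ("1912", "1867"),               # confident wrong answer: -1
    ]
    print(score_run(run))               # 0.0 under this scoring rule
```

Under plain accuracy, an abstention can never score higher than a guess, so a model tuned against accuracy-style leaderboards is pushed to always answer.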

What Counts as a Hallucination?

  • One camp insists “all an LLM does is hallucinate”: everything is probabilistic next-token generation, and some outputs just happen to be true or useful.
  • Another camp adopts the article’s narrower definition: hallucinations are plausible but false statements; not all generations qualify. Under this view, the term is only useful if it distinguishes wrong factual assertions from correct ones.
  • There’s pushback that “hallucination” is anthropomorphic marketing; alternatives like “confabulation” or simply “prediction error” are suggested.

Root Causes and Architectural Limits

  • Several comments reiterate the paper’s argument: next-word prediction on noisy, incomplete data inevitably leads to errors, especially for low-frequency or effectively random facts (like birthdays); a toy sketch after this list illustrates the point.
  • Others argue the deeper problem is lack of grounding and metacognition: models don’t truly know what they know, can’t access their own “knowledge boundaries,” and keep training separate from inference, unlike humans who continuously learn and track their uncertainty.
  • Some see hallucinations as an inherent byproduct of large lossy models compressing the world; with finite capacity and imperfect data, there will always be gaps.
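
As a toy illustration of the low-frequency-fact point (not the paper’s model or its formal bound), the sketch below builds a count-based “birthday predictor” from a handful of made-up training pairs. Because nothing in the objective rewards abstaining, it returns its most plausible guess even for people it has never seen.

```python
# Toy illustration (not the paper's model): a count-based "birthday predictor"
# that always outputs its most plausible guess, even for people it has never
# seen, because nothing in the objective rewards saying "I don't know".
from collections import Counter

# Assumed toy training data: (person, birthday) pairs. Birthdays of people who
# appear once or not at all are effectively random with respect to the corpus.
training_pairs = [
    ("Ada Lovelace", "December 10"),
    ("Ada Lovelace", "December 10"),
    ("Alan Turing", "June 23"),
    ("Obscure Person", "March 3"),      # singleton fact: seen exactly once
]

per_person = {}
for person, birthday in training_pairs:
    per_person.setdefault(person, Counter())[birthday] += 1
overall = Counter(b for _, b in training_pairs)

def predict_birthday(person: str) -> str:
    """Return the most frequent birthday for the person, falling back to the
    corpus-wide mode for unseen people: a plausible but unfounded guess."""
    counts = per_person.get(person, overall)
    return counts.most_common(1)[0][0]

print(predict_birthday("Ada Lovelace"))   # well-supported: "December 10"
print(predict_birthday("Obscure Person")) # memorized singleton, brittle
print(predict_birthday("Unseen Person"))  # confident guess with no evidence
```

A fact seen once can be memorized but not corroborated; a fact never seen still gets a fluent, confident answer.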

Can Hallucinations Be Reduced or Avoided?

  • Many are positive about training models to express uncertainty or abstain (“I don’t know/I’m unsure”), but question how well uncertainty can be calibrated in practice.
  • There’s broad agreement that you can build non‑hallucinating narrow systems (e.g., fixed QA databases + calculators) that say “I don’t know” outside their domain; the disagreement is over whether general LLMs can approach that behavior.
  • Multiple commenters note a precision–recall tradeoff: reducing hallucinations means more refusals and less user appeal, while current business incentives and leaderboards push vendors toward “always answer,” encouraging hallucinations (see the threshold sketch after this list).
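
To make the tradeoff concrete, here is a small sketch of an answer-vs-refuse policy driven by a confidence threshold. The confidence values are made up for illustration, and whether real models can produce calibrated confidences is precisely the contested point above.

```python
# Sketch of the tradeoff the commenters describe: raising a confidence
# threshold for answering cuts confident wrong answers but raises refusals.
# The confidence scores below are invented for illustration; real systems
# need calibrated confidences, which is exactly the contested part.

# Assumed toy data: (model confidence, whether the answer would be correct).
predictions = [
    (0.95, True), (0.90, True), (0.85, False), (0.80, True),
    (0.70, False), (0.65, True), (0.55, False), (0.40, False),
]

def answer_vs_refuse(threshold: float):
    """Answer only when confidence >= threshold; otherwise refuse."""
    answered = [(c, ok) for c, ok in predictions if c >= threshold]
    wrong = sum(1 for _, ok in answered if not ok)
    total = len(predictions)
    hallucination_rate = wrong / total              # wrong answers, per question
    refusal_rate = (total - len(answered)) / total  # "I don't know", per question
    return hallucination_rate, refusal_rate

for threshold in (0.0, 0.6, 0.9):
    h, r = answer_vs_refuse(threshold)
    print(f"threshold={threshold:.1f}  hallucination_rate={h:.2f}  refusal_rate={r:.2f}")
```

Raising the threshold drives hallucinations toward zero only by refusing most questions, which is the user-appeal problem the commenters point to.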

Broader Critiques and Meta-Discussion

  • Some see the post as PR or leaderboard positioning rather than novel science; others welcome it as a clear, shared definition and a push for better evals.
  • A recurring complaint is that much public discourse about hallucinations projects folk psychology onto systems that are, at core, just very large stochastic language models.