Why language models hallucinate
Evaluation, Multiple-Choice Analogies, and Incentives
- Many comments pick up on the article’s multiple-choice test analogy: current benchmarks reward “getting it right” but don’t penalize confident wrong answers, so models are implicitly trained to guess rather than say “I don’t know.”
- Some compare this to standardized tests that use negative marking or give partial credit for blank answers, arguing evals should similarly penalize confident errors and allow abstention (a toy scoring sketch follows this list).
- Others note this is technically hard to implement at scale: answers aren’t a single token, synonyms and formatting complicate what counts as “wrong,” and transformer training doesn’t trivially support “negative points” for incorrect generations.
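As a toy illustration of the negative-marking idea, here is a minimal scoring rule in Python; the point values, the exact-match check, and the name `score_answer` are invented for this sketch and come from neither the article nor any real benchmark.

```python
# Toy scoring rule in the spirit of SAT-style negative marking: reward correct
# answers, give zero (not negative) credit for abstaining, penalize wrong answers.
from typing import Optional

CORRECT_POINTS = 1.0
ABSTAIN_POINTS = 0.0     # "I don't know" earns nothing but costs nothing
WRONG_PENALTY = -0.25    # confident errors lose points

def score_answer(prediction: Optional[str], gold: str) -> float:
    if prediction is None:  # model abstained
        return ABSTAIN_POINTS
    # Exact string match stands in for answer equivalence; as the comments note,
    # synonyms and formatting make this the genuinely hard part at scale.
    if prediction.strip().lower() == gold.strip().lower():
        return CORRECT_POINTS
    return WRONG_PENALTY
```

Under this rule, blind guessing has positive expected value only when the model’s chance of being right exceeds 0.25 / 1.25 = 20%; below that, abstaining scores higher, which is the incentive the multiple-choice analogy calls for.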
What Counts as a Hallucination?
- One camp insists “all an LLM does is hallucinate”: everything is probabilistic next-token generation, and some outputs just happen to be true or useful.
- Another camp adopts the article’s narrower definition: hallucinations are plausible but false statements; not all generations qualify. Under this view, the term is only useful if it distinguishes wrong factual assertions from correct ones.
- There’s pushback that “hallucination” is anthropomorphic marketing; alternatives like “confabulation” or simply “prediction error” are suggested.
Root Causes and Architectural Limits
- Several comments reiterate the paper’s argument: next-word prediction on noisy, incomplete data inevitably produces errors, especially for low-frequency or effectively random facts such as birthdays (a toy illustration follows this list).
- Others argue the deeper problem is lack of grounding and metacognition: models don’t truly know what they know, can’t access their own “knowledge boundaries,” and keep training and inference as separate phases, unlike humans, who learn continuously and track their own uncertainty.
- Some see hallucinations as an inherent byproduct of large lossy models compressing the world; with finite capacity and imperfect data, there will always be gaps.
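To make the “effectively random facts” point concrete, a small sketch with an invented mini-corpus (not the paper’s formal argument): facts that appear only once in the training data give the model a single noisy observation to learn from, so counting such singletons gives a rough sense of how much is effectively unlearnable.

```python
# Count how many distinct (subject, fact) pairs occur exactly once in a
# hypothetical corpus; for these, some residual error rate is hard to avoid
# no matter how large the model is.
from collections import Counter

corpus_facts = [  # invented example data
    ("Alice Example", "born 1987-03-14"),
    ("Alice Example", "born 1987-03-14"),  # repeated fact: learnable
    ("Bob Example", "born 1990-07-02"),    # singleton
    ("Carol Example", "born 1975-11-30"),  # singleton
]

counts = Counter(corpus_facts)
singleton_rate = sum(1 for n in counts.values() if n == 1) / len(counts)
print(f"{singleton_rate:.0%} of distinct facts appear exactly once")  # 67%
```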
Can Hallucinations Be Reduced or Avoided?
- Many are positive about training models to express uncertainty or abstain (“I don’t know/I’m unsure”), but question how well uncertainty can be calibrated in practice.
- There’s broad agreement that non-hallucinating narrow systems can be built (e.g., fixed QA databases plus calculators) that say “I don’t know” outside their domain; the disagreement is over whether general LLMs can approach that behavior.
- Multiple commenters note a precision–recall tradeoff: reducing hallucinations means more refusals and less user appeal, while current business incentives and leaderboards push vendors toward “always answer,” encouraging hallucinations (see the threshold sketch after this list).
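To see that tradeoff in miniature, a sketch with invented confidences and correctness labels: sweeping an abstention threshold shows coverage falling as precision rises, assuming the confidences are reasonably calibrated.

```python
# Sweep a confidence threshold: below it the model abstains, above it it answers.
answers = [  # (model confidence, answer was actually correct) -- invented data
    (0.95, True), (0.90, True), (0.80, False), (0.75, True),
    (0.60, False), (0.55, True), (0.40, False), (0.30, False),
]

for threshold in (0.0, 0.5, 0.7, 0.9):
    attempted = [ok for conf, ok in answers if conf >= threshold]
    coverage = len(attempted) / len(answers)
    precision = sum(attempted) / len(attempted) if attempted else 1.0
    print(f"threshold {threshold:.1f}: answers {coverage:.0%} of questions, "
          f"{precision:.0%} of those answers are correct")
```

Vendors optimizing leaderboards that score only coverage and raw accuracy sit at the low-threshold end of this curve, which is the business-incentive point several commenters make.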
Broader Critiques and Meta-Discussion
- Some see the post as PR or leaderboard positioning rather than novel science; others welcome it as a clear, shared definition and a push for better evals.
- A recurring complaint is that much public discourse about hallucinations projects folk psychology onto systems that are, at their core, just very large stochastic language models.