LLMs Will Always Hallucinate, and We Need to Live with This
What “hallucination” means
- Many argue “hallucination” is a misleading term; LLMs are doing normal probabilistic text generation, not suffering a discrete malfunction.
- Several say all outputs are essentially hallucinations: probabilistic strings with no built‑in notion of truth; some just happen to match reality.
- Others prefer terms like “confabulation,” “bullshit,” or simply “inaccuracy,” emphasizing that correctness is a judgment by readers, not the model.
- One line of argument: “hallucinations” and “alignment” are the same technical problem—constraining outputs to what some authority deems acceptable (truth, safety, morality, etc.).
Inevitability vs mitigation
- Some accept the paper’s point that zero hallucinations is impossible in principle, but note this says little about how small the error rate can become in practice.
- Comparisons drawn: quantum tunneling (a nonzero probability that is negligible in practice) and the halting problem (a theoretical limit that rarely blocks engineering usefulness).
- Others see current LLM architectures as fundamentally hallucination‑prone and think this will cap their practical scope.
- A minority says hallucination is a feature for creativity, fiction, and idea generation; a perfectly “truthful” model would be closer to copy‑paste and less useful creatively.
LLMs vs human cognition
- One camp emphasizes differences: humans can often say “I don’t know,” calibrate confidence, and learn from mistakes; LLMs tend to answer confidently regardless.
- Another camp stresses similarities: humans also misremember, confabulate, believe nonsense, and “complete the next word” when speaking; some are worse than today’s LLMs.
- Debate over whether human “intelligence” is qualitatively different or mainly a matter of scale, architecture, and evolutionary pre‑training.
Appropriate use cases
- Consensus that LLMs are useful where:
  - Outputs are low‑stakes (summaries, boilerplate, creative text, brainstorming).
  - Humans can efficiently verify or correct candidate answers.
- Strong skepticism for high‑stakes domains (law, medicine, critical research, automation with no human in the loop), because even rare hallucinations can be catastrophic.
- Some argue true automation requires superhuman reliability, not “human‑level fallibility,” so LLMs are a poor fit as general human replacements.
Mitigation and product design
- Proposed mitigations include: using token probabilities to estimate confidence, sampling multiple generations and checking their consistency, post‑training to reduce overconfident wrong answers, and external retrieval or sanity checks (see the sketch after this list).
- Disagreement over whether hallucinations are:
  - A "bug" to be fixed inside the model,
  - A deeper design limitation of next‑token prediction, or
  - An inevitable property that must be managed in the surrounding product (e.g., verification layers, constrained domains).
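To make the "multiple generations and consistency checks" idea concrete, here is a minimal sketch of a self‑consistency filter. The `generate` function is a hypothetical stand‑in for whatever model call an application actually uses (not a real library API), and the agreement threshold is an arbitrary illustrative value.

```python
from collections import Counter


def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical model call; wire this to a real LLM client."""
    raise NotImplementedError("replace with an actual model call")


def normalize(answer: str) -> str:
    """Crude normalization so trivially different phrasings compare equal."""
    return " ".join(answer.lower().split())


def self_consistent_answer(prompt: str, n_samples: int = 5,
                           min_agreement: float = 0.6) -> str | None:
    """Sample several answers and return the majority answer only if enough
    samples agree; otherwise return None so the caller can escalate to a
    human reviewer or to retrieval-based verification."""
    answers = [normalize(generate(prompt)) for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    if count / n_samples >= min_agreement:
        return best
    return None  # low agreement: treat as a likely hallucination
```

Agreement across samples is only a proxy: a model can be consistently and confidently wrong, which is why the surrounding product still needs verification layers and constrained domains for high‑stakes use.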
Hype, business, and ethics
- Many criticize marketing that presents LLMs as oracles or universal automation, especially to users habituated to trusting top search results.
- Some see “hallucinations” being downplayed to keep the AGI/AI‑bubble narrative going and justify further investment.
- Others argue that even fallible tools are worthwhile, but only if users maintain a realistic mental model of their limitations.