Detecting hallucinations in large language models using semantic entropy
What LLM “hallucinations” are
- Many argue LLMs are “orthogonal to truth”: they optimize for plausible text, not correctness.
- Several posters prefer terms like “bullshit” (truth-indifferent output) or “confabulation” (false but fluent narratives) over “hallucination” (which suggests misperception by a mind).
- Others think “hallucination” is now a useful term of art and language drift is fine; critics counter that sloppy metaphors will mislead policymakers and the public.
Intentionality and anthropomorphism
- Long subthread on whether algorithms can “intend,” “lie,” or “care about” truth.
- One side: intentions require minds; computers are just deterministic / formal systems, so attributing mental states is a category error.
- Other side: we lack a settled theory of intentionality; under many metaphysical views (physicalism, panpsychism, some idealisms), it’s at least possible AIs could have genuine or pseudo-intentionality, so strong denials are premature.
- Several note that for everyday use, simulated intentionality (acting as if intentional) is often enough.
Semantic entropy method
- Core idea: sample multiple answers, cluster them by semantic equivalence (via another model), then compute an entropy over clusters.
- High semantic entropy ≈ model gives many divergent meanings ⇒ likely “confabulation.”
- Low entropy ≈ model consistently produces similar meanings ⇒ answer is more likely grounded in training data.
- A variant decomposes answers into factoids, reformulates each as a question, and re-checks each factoid with the entropy method.
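The sample, cluster, and entropy steps above can be sketched as follows. This is a minimal illustration, not the method as published: `cluster_by_meaning`, `semantic_entropy`, and `toy_same_meaning` are hypothetical names, the equivalence check is a crude string normalizer standing in for the second model, and using cluster frequencies as probabilities is an assumption of this sketch.

```python
import math
from typing import Callable, List


def cluster_by_meaning(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> List[List[str]]:
    """Greedily group answers; each answer joins the first cluster whose
    representative (first member) is judged equivalent to it."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> float:
    """Shannon entropy (in nats) over meaning clusters, estimating each
    cluster's probability by its share of the samples (an assumption)."""
    clusters = cluster_by_meaning(answers, same_meaning)
    n = len(answers)
    return sum((len(c) / n) * math.log(n / len(c)) for c in clusters)


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical stand-in for the equivalence model: compares answers
    after lowercasing and stripping a boilerplate prefix and punctuation."""
    def norm(s: str) -> str:
        return s.lower().strip(" .").replace("the answer is ", "")
    return norm(a) == norm(b)


consistent = ["Paris", "paris.", "The answer is Paris"]
divergent = ["Paris", "Lyon", "Marseille", "Nice"]
low = semantic_entropy(consistent, toy_same_meaning)   # one cluster
high = semantic_entropy(divergent, toy_same_meaning)   # four clusters
```

Here the consistent samples collapse into one cluster and score zero, while four mutually distinct answers score log(4), the maximum for four samples. A real setup would replace `toy_same_meaning` with a model-based equivalence judgment.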
Critiques and limitations
- High agreement does not guarantee truth; a model can be confidently wrong (e.g., outdated training data, popular misconceptions).
- Entropy measures dispersion of the output distribution, not correctness of that distribution; knowing how certain the model is differs from knowing that it is right.
- May mislabel creative or multi-valid-answer tasks as hallucinations.
- Some see this as just another heuristic layered on top of a fundamentally non-truth-seeking system; others find it a useful partial safety tool, especially when full retraining is impossible.
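Two of the critiques above can be made concrete with a toy calculation. This is only a sketch: the cluster labels are hand-assigned and hypothetical, and `entropy_of_labels` is an illustrative helper, not part of any published method.

```python
import math
from collections import Counter
from typing import List


def entropy_of_labels(labels: List[str]) -> float:
    """Shannon entropy (in nats) of hand-assigned meaning-cluster labels."""
    counts = Counter(labels)
    n = len(labels)
    return sum((c / n) * math.log(n / c) for c in counts.values())


# Case 1: ten samples that all repeat the same popular misconception.
# The answers agree perfectly, so the score is zero even though they are false.
confidently_wrong = ["misconception"] * 10

# Case 2: an open-ended prompt where ten different answers are all acceptable.
# The answers disagree maximally, so the score is high even though none is wrong.
creative = [f"idea_{i}" for i in range(10)]

wrong_score = entropy_of_labels(confidently_wrong)
creative_score = entropy_of_labels(creative)
```

The first case gets the lowest possible score and the second the highest possible for ten samples (log 10), which is the reverse of what a correctness detector should report. The score tracks agreement between samples, not truth.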
Use cases and broader attitudes
- Proposed for high-stakes settings (e.g., public agencies) to suppress low-confidence answers and escalate to humans.
- Some argue we should frame LLMs as “hint providers” / improv text generators, not near-AGI or truth oracles.
- Ongoing tension between recognizing real utility (coding help, explanation, synthesis) and concern over overhype, misuse, and misplaced trust.
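The suppress-and-escalate proposal in the first bullet above might look roughly like this in a deployment. It is a hedged sketch: `Decision`, `gate`, and the default threshold of 0.5 are hypothetical, and any real threshold would need tuning against labeled data.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "answer" or "escalate"
    text: str


def gate(answer: str, entropy: float, threshold: float = 0.5) -> Decision:
    """Return the model's answer only when its semantic entropy is at or
    below the threshold; otherwise withhold it and hand off to a human.
    The threshold value is an assumption, not a recommended setting."""
    if entropy > threshold:
        return Decision("escalate", "Routed to a human reviewer.")
    return Decision("answer", answer)


confident = gate("The office opens at 9:00.", entropy=0.1)
uncertain = gate("The fee is waived.", entropy=1.2)
```

As the critiques note, a gate like this only filters inconsistent answers; a consistently wrong answer would still pass.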