Detecting hallucinations in large language models using semantic entropy

What LLM “hallucinations” are

  • Many argue LLMs are “orthogonal to truth”: they optimize for plausible text, not correctness.
  • Several posters prefer terms like “bullshit” (truth-indifferent output) or “confabulation” (false but fluent narratives) over “hallucination” (which suggests misperception by a mind).
  • Others think “hallucination” is now a useful term of art and language drift is fine; critics counter that sloppy metaphors will mislead policymakers and the public.

Intentionality and anthropomorphism

  • Long subthread on whether algorithms can “intend,” “lie,” or “care about” truth.
  • One side: intentions require minds; computers are just deterministic / formal systems, so attributing mental states is a category error.
  • Other side: we lack a settled theory of intentionality; under many metaphysical views (physicalism, panpsychism, some idealisms), it’s at least possible AIs could have genuine or pseudo-intentionality, so strong denials are premature.
  • Several note that for everyday use, simulated intentionality (acting as if intentional) is often enough.

Semantic entropy method

  • Core idea: sample multiple answers, cluster them by semantic equivalence (via another model), then compute an entropy over clusters.
  • High semantic entropy ≈ the sampled answers spread across many divergent meanings ⇒ likely “confabulation.”
  • Low semantic entropy ≈ the model consistently produces the same meaning ⇒ the answer is more likely grounded in training data.
  • A variant decomposes answers into factoids, reformulates each as a question, and re-checks each factoid with the entropy method.
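
The sample, cluster, and entropy steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the published implementation: `cluster_by_meaning`, `semantic_entropy`, and `toy_equivalent` are hypothetical names, cluster probabilities are estimated from sample frequencies, and the equivalence check (done by another model in the method as described) is replaced with a crude normalized string comparison so the example runs on its own.

```python
import math
from typing import Callable, List

Equivalence = Callable[[str, str], bool]

def cluster_by_meaning(answers: List[str], equivalent: Equivalence) -> List[List[str]]:
    """Greedily group answers; each answer joins the first cluster whose
    representative (its first member) is judged equivalent to it."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if equivalent(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters

def semantic_entropy(answers: List[str], equivalent: Equivalence) -> float:
    """Shannon entropy (in nats) over meaning clusters, with each cluster's
    probability estimated as its share of the sampled answers."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = cluster_by_meaning(answers, equivalent)
    n = len(answers)
    return sum((len(c) / n) * math.log(n / len(c)) for c in clusters)

def toy_equivalent(a: str, b: str) -> bool:
    """ASSUMPTION: stand-in for the model-based equivalence judgment.
    Compares answers after lowercasing and dropping non-alphanumerics."""
    def norm(s: str) -> str:
        return "".join(ch for ch in s.lower() if ch.isalnum())
    return norm(a) == norm(b)

# Samples that all mean the same thing collapse into one cluster.
consistent = ["Paris", "paris.", " PARIS "]
# Samples that disagree each form their own cluster.
divergent = ["Paris", "Lyon", "Nice", "Rome"]

print(semantic_entropy(consistent, toy_equivalent))
print(semantic_entropy(divergent, toy_equivalent))
```

In real use, `toy_equivalent` would be swapped for a call to whatever model judges whether two answers mean the same thing. The clustering and entropy code would stay unchanged.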

Critiques and limitations

  • High agreement does not guarantee truth; a model can be confidently wrong (e.g., outdated training data, popular misconceptions).
  • Entropy measures dispersion of the output distribution, not correctness of that distribution; knowing how certain the model is differs from knowing that it is right.
  • The method may mislabel creative tasks, or questions with several valid answers, as hallucinations.
  • Some see this as just another heuristic layered on top of a fundamentally non-truth-seeking system; others find it a useful partial safety tool, especially when full retraining is impossible.
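
The first three critiques can be made concrete with a small worked example. The cluster labels below are invented for illustration and assume the clustering step has already run. `entropy_of_labels` is a hypothetical helper, not part of any library.

```python
import math
from collections import Counter
from typing import List

def entropy_of_labels(labels: List[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of cluster labels."""
    n = len(labels)
    return sum((c / n) * math.log(n / c) for c in Counter(labels).values())

# Hypothetical samples for "What is the capital of Australia?".
# Every sample repeats the same popular misconception (the capital is Canberra),
# so the entropy is zero even though the answer is wrong.
confidently_wrong = ["sydney"] * 5

# Hypothetical samples for "Name a colour". Every answer is valid, yet each
# lands in its own cluster, so the entropy is at its maximum for five samples.
multi_valid = ["red", "blue", "green", "teal", "amber"]

print(entropy_of_labels(confidently_wrong))
print(entropy_of_labels(multi_valid))
```

The score describes only how spread out the answers are. The first case passes any entropy threshold despite being false, and the second fails it despite every answer being acceptable.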

Use cases and broader attitudes

  • Proposed as a gate for high-stakes settings (e.g., public agencies): suppress high-entropy (low-confidence) answers and escalate them to humans.
  • Some argue we should frame LLMs as “hint providers” / improv text generators, not near-AGI or truth oracles.
  • Ongoing tension between recognizing real utility (coding help, explanation, synthesis) and concern over overhype, misuse, and misplaced trust.
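
The suppress-and-escalate idea in the first bullet above could look something like the sketch below. The names `Decision` and `gate` and the threshold of 0.5 nats are all hypothetical. A real deployment would need to calibrate the threshold on its own data.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Decision:
    action: str            # "answer" or "escalate"
    answer: Optional[str]  # majority cluster label, or None when escalating
    entropy: float         # semantic entropy in nats

def gate(cluster_labels: List[str], threshold: float = 0.5) -> Decision:
    """Return the majority answer when entropy is at or below the threshold;
    otherwise suppress the answer and flag the query for human review.
    ASSUMPTION: labels come from an earlier semantic-clustering step, and
    the default threshold is arbitrary."""
    if not cluster_labels:
        raise ValueError("need at least one sampled answer")
    n = len(cluster_labels)
    counts = Counter(cluster_labels)
    entropy = sum((c / n) * math.log(n / c) for c in counts.values())
    if entropy > threshold:
        return Decision("escalate", None, entropy)
    top_label, _ = counts.most_common(1)[0]
    return Decision("answer", top_label, entropy)

print(gate(["form_a"] * 5))
print(gate(["form_a", "form_b", "form_c", "form_d", "form_a"]))
```

As the critiques section notes, a gate like this filters out only inconsistent answers. A consistently wrong answer still passes through.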