We gotta stop ignoring AI's hallucination problem

Nature of “hallucinations”

  • Many argue “hallucination” is a bad term: LLMs lack perception, so their errors are closer to confabulation, delusion, or “bullshitting.”
  • Others note that, functionally, LLMs are always generating plausible continuations; correctness is incidental and judged by humans after the fact.
  • Some say hallucination is intrinsic to neural nets / generative models: sampling from a distribution over tokens inevitably produces confident nonsense at times.
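
The sampling argument in the last point can be made concrete with a toy sketch. The vocabulary, the scores, and the function names below are invented for illustration and do not come from any real model. The sketch only shows that when a wrong token has non-zero probability, sampling will sometimes pick it, and that lowering the temperature makes this rarer without ruling it out.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution over tokens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical next-token scores for the prompt "The capital of Australia is".
# These numbers are made up; a real model has tens of thousands of tokens.
VOCAB = ["Canberra", "Sydney", "Melbourne"]
LOGITS = [2.0, 1.0, 0.2]

def wrong_answer_rate(n=10_000, temperature=1.0, seed=0):
    """Estimate how often sampling returns something other than "Canberra"."""
    rng = random.Random(seed)
    wrong = sum(
        sample_token(VOCAB, LOGITS, temperature, rng) != "Canberra"
        for _ in range(n)
    )
    return wrong / n

if __name__ == "__main__":
    print("temperature 1.0:", wrong_answer_rate(temperature=1.0))
    print("temperature 0.1:", wrong_answer_rate(temperature=0.1))
```

With these made-up scores the correct token is the single most likely one, yet roughly a third of samples at temperature 1.0 are still wrong. Each wrong sample is emitted with the same fluency as a right one, which is the point the thread makes.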

LLMs vs intelligence and knowledge

  • One side: LLMs are just large text (or multimodal) models, not knowledge systems; they lack an internal notion of “I don’t know” and true understanding.
  • Another side: current models already exhibit “basic intelligence” (following new rules, inventing/playing games, coding in fictional languages) and can solve novel problems when instructed.
  • Strong skeptics stress inconsistency, shallow reasoning, and failure on simple tasks (chess, tic‑tac‑toe, counting, lists) as evidence against real understanding.

Prompting, reliability, and consistency

  • “You’re holding it wrong” camp: most failures come from vague or underspecified prompts; good prompting (role, context, examples) greatly improves results.
  • Critics respond that needing elaborate prompts undermines claims of intelligence and still doesn’t yield consistent, trustworthy behavior.
  • Inconsistency across identical queries is repeatedly cited as a key mark of non‑intelligence.
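
One way to put a number on the inconsistency complaint is to send the identical prompt several times and measure how often the answers agree. The sketch below assumes a hypothetical `ask` callable standing in for whatever model client is in use; no real API is implied, and the exact-match comparison after trimming and lowercasing is a deliberate simplification.

```python
from collections import Counter

def agreement_rate(answers):
    """Fraction of answers that match the most common answer."""
    if not answers:
        raise ValueError("need at least one answer")
    # Crude normalization: ignore surrounding whitespace and letter case.
    normalized = [a.strip().lower() for a in answers]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)

def probe_consistency(ask, prompt, runs=5):
    """Call `ask` with the identical prompt `runs` times and score agreement.

    `ask` is a hypothetical callable (prompt -> answer string) supplied by
    the caller; it stands in for a real model client.
    """
    return agreement_rate([ask(prompt) for _ in range(runs)])

if __name__ == "__main__":
    # Stub "model" that cycles through canned answers, for demonstration only.
    canned = iter(["12", "12", "13", "12", "14"])
    print(probe_consistency(lambda prompt: next(canned), "How many items?"))
```

A score of 1.0 means every run gave the same answer. It says nothing about whether that answer is correct, so a check like this measures consistency only, not reliability.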

Human analogies and responsibility

  • Comparisons to human bias, false memories, and confabulation are common, but many note humans can say “I don’t know” and often avoid making things up in high‑stakes contexts.
  • Concern: LLMs fail differently from humans, confidently fabricating specifics (laws, APIs, court cases, features) without signaling uncertainty.

Use cases where errors are tolerable or useful

  • Many find LLMs valuable for low‑stakes, non‑factual work: art, creative writing, translation refinement, brainstorming, summarization, and coding assistance.
  • Some explicitly use “hallucinations” as a creativity feature.

Productization, marketing, and backlash

  • Several worry that vendors, especially in education and consumer tools, oversell AI as accurate or “hallucination‑free.”
  • Others say hallucination risk is widely known in technical circles but downplayed in public demos (e.g., shiny keynotes).
  • There’s concern about embedding LLMs in critical systems (tax advice, regulation, healthcare) where wrong but confident answers are unacceptable.

Mitigation and future directions

  • Commenters report substantial effort going into grounding, retrieval, hybrid knowledge systems, and better evaluation.
  • Some believe LLMs are just one step toward broader AI; others think scaling them alone will never fix the hallucination problem.
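
The grounding and retrieval idea mentioned above can be sketched in a few lines: answer only when a stored passage supports the question, and otherwise say “I don't know.” Everything here is an assumption made for illustration, including the function names, the word-overlap scoring, the `min_overlap` threshold, and the sample passages. Real systems use far stronger retrieval than this.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, passages, min_overlap=2):
    """Return the passage sharing the most words with the question.

    Returns None when no passage shares at least `min_overlap` words,
    which is the signal to abstain. The threshold is arbitrary.
    """
    question_words = tokenize(question)
    best, best_score = None, 0
    for passage in passages:
        score = len(question_words & tokenize(passage))
        if score > best_score:
            best, best_score = passage, score
    return best if best_score >= min_overlap else None

def grounded_answer(question, passages):
    """Quote a supporting passage, or abstain when none is found."""
    source = retrieve(question, passages)
    if source is None:
        return "I don't know."
    return f"According to the source: {source}"

if __name__ == "__main__":
    # Made-up passages standing in for a trusted document store.
    docs = [
        "The widget API exposes a reset() method.",
        "Invoices are archived after 90 days.",
    ]
    print(grounded_answer("When are invoices archived?", docs))
    print(grounded_answer("What is the refund policy?", docs))
```

The design choice worth noting is the explicit abstain path. The sketch trades coverage for the ability to decline, which is the behavior several commenters say LLMs lack by default.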