2024-05-15

We gotta stop ignoring AI's hallucination problem

Nature of “hallucinations”

Many argue “hallucination” is a bad term: LLMs lack perception; errors are closer to confabulation, delusion, or “bullshitting.”
Others note that, functionally, LLMs are always generating plausible continuations; correctness is incidental and judged by humans after the fact.
Some say hallucination is intrinsic to neural nets / generative models: sampling from a distribution over tokens inevitably produces confident nonsense at times.
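
The sampling argument can be made concrete with a toy example. The sketch below is not how any particular model is implemented; the three candidate tokens and their scores are invented for illustration, and the prompt they stand for ("The capital of Australia is ...") is hypothetical. It shows only that when the next token is drawn from a probability distribution, a less likely and wrong continuation is still chosen some fraction of the time.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(tokens, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical next-token scores (assumed values, not taken from a real model).
tokens = ["Canberra", "Sydney", "Melbourne"]
logits = [2.0, 1.0, 0.2]

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_token(tokens, logits, temperature=1.0, rng=rng) for _ in range(1000)]
wrong = sum(1 for t in draws if t != "Canberra")
print(f"wrong answers: {wrong} / 1000")
```

With these assumed scores the correct token has roughly a 65% chance per draw, so a substantial share of samples are wrong while reading just as fluently as the right ones. Lowering the temperature makes the top token dominate, which reduces the variation but does nothing if the top token is itself wrong.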

LLMs vs intelligence and knowledge

One side: LLMs are just large text (or multimodal) models, not knowledge systems; they lack an internal notion of “I don’t know” and true understanding.
Another side: current models already exhibit “basic intelligence” (following new rules, inventing/playing games, coding in fictional languages) and can solve novel problems when instructed.
Strong skeptics stress inconsistency, shallow reasoning, and failure on simple tasks (chess, tic‑tac‑toe, counting, lists) as evidence against real understanding.

Prompting, reliability, and consistency

“You’re holding it wrong” camp: most failures come from vague or underspecified prompts; good prompting (role, context, examples) greatly improves results.
Critics respond that needing elaborate prompts undermines claims of intelligence and still doesn’t yield consistent, trustworthy behavior.
Inconsistency across identical queries is repeatedly cited as key evidence against intelligence.

Human analogies and responsibility

Comparisons to human bias, false memories, and confabulation are common, but many note humans can say “I don’t know” and often avoid making things up in high‑stakes contexts.
Concern: LLMs fail differently from humans, confidently fabricating specifics (laws, APIs, court cases, features) without signaling uncertainty.

Use cases where errors are tolerable or useful

Many find LLMs valuable for low‑stakes, non-factual work: art, creative writing, translation refinement, brainstorming, summarization, and coding assistance.
Some explicitly use “hallucinations” as a creativity feature.

Productization, marketing, and backlash

Several worry that vendors, especially in education and consumer tools, oversell AI as accurate or “hallucination‑free.”
Others say hallucination risk is widely known in technical circles but downplayed in public demos (e.g., shiny keynotes).
There’s concern about embedding LLMs in critical systems (tax advice, regulation, healthcare) where wrong but confident answers are unacceptable.

Mitigation and future directions

Commenters report that substantial effort is going into grounding, retrieval, hybrid knowledge systems, and better evaluation.
Some believe LLMs are just one step toward broader AI; others think scaling them alone will never fix the hallucination problem.
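
As a rough illustration of what grounding and retrieval mean in this context, the sketch below answers only when it can point to a supporting passage and otherwise says it does not know. Everything in it is an assumption made for the example: the function names, the word-overlap scoring (a crude stand-in for a real retrieval system), the `min_overlap` threshold, and the two sample passages. No language model is involved; the point is the control flow, where the answer is tied to a source or withheld.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, passages, min_overlap=2):
    """Return the passage sharing the most words with the question, or None.

    Word overlap is a deliberately simple placeholder for real retrieval.
    """
    q = tokenize(question)
    best, best_score = None, 0
    for p in passages:
        score = len(q & tokenize(p))
        if score > best_score:
            best, best_score = p, score
    return best if best_score >= min_overlap else None

def grounded_answer(question, passages):
    """Answer from a retrieved passage, or abstain when nothing supports an answer."""
    source = retrieve(question, passages)
    if source is None:
        return "I don't know: no supporting passage found."
    return f"According to the source: {source}"

# Hypothetical knowledge base, invented for this example.
passages = [
    "The filing deadline for the hypothetical Form X is 30 April.",
    "Refunds are issued within six weeks of a complete filing.",
]
print(grounded_answer("What is the filing deadline for Form X?", passages))
print(grounded_answer("Who won the 1987 chess championship?", passages))
```

The first question matches the first passage and is answered from it; the second shares too few words with either passage, so the function abstains. A threshold this crude will misfire on real text, which is part of why evaluation is listed alongside retrieval.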
