GPTs and Hallucination

Study methodology & context windows

  • Some argue the paper’s design is “flawed” because prompts were issued sequentially in a single chat, so constraints from earlier prompts (e.g., “answer in three words”) bled into later ones.
  • Others point out the paper also ran isolated sessions precisely to test context dependence; running both conditions is seen as the core of the experiment, not a mistake.
  • A subset worries the analysis doesn’t clearly separate those conditions, enabling possible cherry‑picking of results.

What “hallucination” means

  • Strong disagreement over terminology: “hallucination” vs “bullshitting,” “confabulation,” “bad output,” “misprediction.”
  • Critics say “hallucinate” anthropomorphizes systems with no beliefs or awareness and obscures that this is just erroneous output.
  • Supporters say the term is now established, intuitively captures confident fabrication, and is useful for non‑experts.
  • Several suggest “bullshitting” in the philosophical sense: fluent, confident speech without concern for truth.

Why LLMs hallucinate – and why they work at all

  • One camp: LLMs are statistical next‑token generators; hallucinations are the inevitable result of prediction under uncertainty and compressed world knowledge (a minimal sketch follows this list).
  • Another camp says this “just autocomplete” framing is technically true but misleading: internal layers appear to build rich feature/world representations and to implement in‑context learning mechanisms.
  • Broad agreement: accuracy is high where training data is dense and consensus exists (e.g., popular languages, APIs); errors spike with sparse, fast‑changing, or controversial topics.
  • Some argue information‑theoretic and complexity limits mean hallucinations can never be fully eliminated.
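
To make the “prediction under uncertainty” point concrete, here is a minimal toy sketch; the vocabulary and logits are invented for illustration and come from neither the paper nor the thread. The sampling step is identical whether the distribution is sharply peaked or nearly flat, so the output looks equally fluent either way.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab = ["Paris", "Lyon", "Berlin", "Madrid"]

    def sample_next_token(logits, temperature=1.0):
        # Softmax turns raw scores into a probability distribution over tokens.
        probs = np.exp(np.array(logits) / temperature)
        probs /= probs.sum()
        # Entropy measures how uncertain the model is about the next token.
        entropy = -(probs * np.log(probs)).sum()
        return rng.choice(vocab, p=probs), round(float(entropy), 2)

    # Dense, consistent training data -> peaked distribution, low entropy:
    # the sampled token is almost always the well-supported answer.
    print(sample_next_token([8.0, 1.0, 0.5, 0.2]))
    # Sparse or contested data -> near-flat distribution, high entropy:
    # the model still emits a fluent, confident-looking token, but the
    # choice is close to a coin flip, which is where fabrication shows up.
    print(sample_next_token([1.2, 1.1, 1.0, 0.9]))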

Intelligence, world models, and limits

  • Ongoing debate over whether LLMs “have” mental/world models or merely model their training sources and word co‑occurrences.
  • Some see emergent capabilities (multimodal reasoning, internal features that track real‑world entities) as steps toward genuine world modeling and even future “minds.”
  • Others insist they lack self‑knowledge and epistemology: they don’t know when they don’t know.

Mitigation strategies & tooling

  • Proposed mitigations:
    • Retrieval‑augmented generation (RAG) with explicit grounding and separate factuality checkers.
    • Symbolic logic / theorem‑proving or semantic validators (e.g., for SQL) to catch structural errors (sketched after this list).
    • Better calibration and explicit confidence estimates.
    • Tool use (compilers, interpreters, search) and secondary “fact‑check” passes.
    • Careful sampling/logprob control to trade off creativity against reliability.
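
As one concrete illustration of the “semantic validator” idea above, the sketch below checks model‑generated SQL against a real schema before it is ever run. The table and queries are hypothetical stand‑ins for LLM output; only Python’s standard sqlite3 module is used.

    import sqlite3

    def validate_sql(conn, query):
        """Return None if the statement compiles against the schema, else the error."""
        try:
            # EXPLAIN makes SQLite parse and plan the statement without executing it,
            # so syntax errors and references to missing tables/columns surface here.
            conn.execute("EXPLAIN " + query)
            return None
        except sqlite3.Error as exc:
            return str(exc)

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, total REAL, placed_at TEXT)")

    # A structurally sound query, then one with a hallucinated column name.
    print(validate_sql(conn, "SELECT id, total FROM orders WHERE total > 100"))
    print(validate_sql(conn, "SELECT id, amount FROM orders WHERE amount > 100"))
    # -> None
    # -> no such column: amount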

Societal and usability concerns

  • Many are less worried about LLM behavior than about user interpretation and vendor marketing that portrays them as reliable, intelligent agents.
  • Concern that people over‑trust confident answers, especially without domain knowledge or visible provenance, leading to misuse and misallocation of resources.