GPTs and Hallucination
Study methodology & context windows
- Some argue the paper’s design is “flawed” because prompts were issued sequentially in one chat, so constraints from earlier prompts (e.g., “answer in three words”) bled into later ones.
- Others point out the paper also ran isolated sessions precisely to test context dependence; running both conditions is seen as the core of the experiment, not a mistake.
- A subset worries the analysis doesn’t clearly separate the two conditions, leaving room for cherry‑picking of results.
What “hallucination” means
- Strong disagreement over terminology: “hallucination” vs “bullshitting,” “confabulation,” “bad output,” “misprediction.”
- Critics say “hallucinate” anthropomorphizes systems that have no beliefs or awareness, obscuring that the behavior is simply erroneous output.
- Supporters say the term is now established, intuitively captures confident fabrication, and is useful for non‑experts.
- Several suggest “bullshitting” in Frankfurt’s philosophical sense: fluent, confident speech produced without concern for truth.
Why LLMs hallucinate – and why they work at all
- One camp: LLMs are statistical next‑token generators; hallucinations are the inevitable result of prediction under uncertainty and compressed world knowledge (see the sketch after this list).
- Another camp: the “just autocomplete” framing is technically true but misleading, since internal layers appear to build rich feature/world representations and in‑context learning mechanisms.
- Broad agreement: accuracy is high where training data is dense and consensus exists (e.g., popular languages, APIs); errors spike with sparse, fast‑changing, or controversial topics.
- Some argue information‑theoretic and complexity limits mean hallucinations can never be fully eliminated.
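To make the “statistical next‑token generator” bullet concrete, here is a minimal sketch of sampling from a next‑token distribution. The vocabulary, logits, and `temperature` knob are all invented for illustration; the point is that when probability mass is split across continuations, a fluent wrong answer is a routine sample, not a rare glitch.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the token after "The capital of Australia is":
# the model has seen "Sydney" often enough that it competes with the truth.
vocab  = ["Canberra", "Sydney", "Melbourne"]
logits = [2.1, 1.9, 0.4]  # invented numbers for illustration

probs = softmax(logits, temperature=1.0)
for token, p in zip(vocab, probs):
    print(f"{token:10s} {p:.2f}")   # ~0.50, 0.41, 0.09

# Sampling makes the wrong-but-plausible token a routine outcome:
# roughly 4 in 10 draws here say "Sydney".
print(random.choices(vocab, weights=probs, k=10))
```

Raising `temperature` flattens the distribution (more creative, more wrong answers); lowering it sharpens toward the argmax (more reliable, more repetitive), which is the trade‑off the sampling‑control bullet below refers to.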
Intelligence, world models, and limits
- Ongoing debate on whether LLMs “have” mental/world models or merely model their training sources and word co‑occurrences.
- Some see emergent capabilities (multimodal reasoning, internal features that track real‑world entities) as steps toward genuine world modeling and even future “minds.”
- Others insist they lack self‑knowledge and epistemology: they don’t know when they don’t know.
Mitigation strategies & tooling
- Proposed mitigations (toy sketches of the first three follow this list):
  - RAG with explicit grounding and separate factuality checkers.
  - Symbolic logic / theorem‑proving or semantic validators (e.g., for SQL) to catch structural errors.
  - Better calibration and explicit confidence estimates.
  - Tool use (compilers, interpreters, search) and secondary “fact‑check” passes.
  - Careful sampling/logprob control to trade creativity against reliability.
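One possible reading of the RAG‑plus‑factuality‑checker bullet, as a toy sketch: retrieve supporting passages, instruct the model to answer only from them, then reject answers whose content isn’t supported by the retrieved text. `retrieve`, `generate`, and the word‑overlap heuristic in `grounded` are hypothetical stand‑ins, not any particular library’s API.

```python
def _words(text):
    """Lowercase and strip trailing punctuation for crude matching."""
    return [w.strip(".,?!").lower() for w in text.split()]

def retrieve(query, corpus, k=2):
    """Toy retriever: rank passages by word overlap with the query."""
    q = set(_words(query))
    return sorted(corpus, key=lambda p: len(q & set(_words(p))), reverse=True)[:k]

def generate(prompt):
    """Placeholder for the actual LLM call; returns a canned answer here."""
    return "Canberra is the capital of Australia."

def grounded(answer, passages, threshold=0.6):
    """Separate factuality check: most of the answer's words must
    appear somewhere in the retrieved passages, else reject."""
    support = set(_words(" ".join(passages)))
    words = _words(answer)
    return sum(w in support for w in words) / max(len(words), 1) >= threshold

corpus = [
    "Canberra is the capital city of Australia.",
    "Sydney is the largest city in Australia.",
]
query = "What is the capital of Australia?"
passages = retrieve(query, corpus)
prompt = "Answer ONLY from this context:\n" + "\n".join(passages) + "\n\nQ: " + query
answer = generate(prompt)
print(answer if grounded(answer, passages) else "[rejected: answer not grounded in sources]")
```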
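The semantic‑validator idea for generated SQL can be sketched with SQLite’s `EXPLAIN`, which compiles a query against the real schema without executing it, so hallucinated tables or columns fail fast. The schema and candidate queries below are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

def validate_sql(sql):
    """Compile (but don't run) the query; structural errors surface here."""
    try:
        conn.execute("EXPLAIN " + sql)  # parses and plans without executing
        return True, None
    except sqlite3.Error as e:
        return False, str(e)

for candidate in [
    "SELECT name FROM users WHERE id = 1",  # valid
    "SELECT username FROM users",           # hallucinated column
    "SELECT name FROM customers",           # hallucinated table
]:
    ok, err = validate_sql(candidate)
    print("OK " if ok else "BAD", candidate, "" if ok else f"-> {err}")
```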
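For the calibration bullet, one common heuristic (sketched here with invented token/logprob pairs) is to treat the mean per‑token log‑probability as a confidence score and abstain below a threshold; real APIs that expose per‑token logprobs would supply the inputs, and the same logprobs feed the creativity‑vs‑reliability dial shown by the `temperature` parameter in the earlier softmax sketch.

```python
import math

def mean_logprob(token_logprobs):
    """Average per-token log-probability of a generated answer."""
    return sum(token_logprobs) / len(token_logprobs)

def answer_or_abstain(answer, token_logprobs, threshold=-1.0):
    """Abstain when the model's own token probabilities are low.
    threshold=-1.0 corresponds to ~0.37 average per-token probability."""
    score = mean_logprob(token_logprobs)
    if score < threshold:
        return f"[abstain: confidence {math.exp(score):.2f} too low]"
    return answer

# Invented logprobs: a confident answer vs. a shaky one.
print(answer_or_abstain("Canberra", [-0.1, -0.2]))        # kept
print(answer_or_abstain("Sydney", [-1.6, -2.3, -1.9]))    # abstains
```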
Societal and usability concerns
- Many are less worried about LLM behavior than about user interpretation and vendor marketing that portrays them as reliable, intelligent agents.
- Concern that people over‑trust confident answers, especially without domain knowledge or visible provenance, leading to misuse and misallocation of resources.