2024-07-18

Overcoming the limits of current LLMs

Training data, licensing, and moats

Many see high‑quality, “tidy”, properly licensed data as the real moat, since it is harder to obtain than more compute or more scraped web text.
Exclusive content deals (e.g., with major news outlets) are viewed as anti‑competitive and as pushing toward “technofeudal” dynamics, in which capital wins regardless of how the law ends up treating scraping.
If models lose access to major media, random forum posts become over‑represented in training data, which some find darkly amusing and others find concerning.

Nature and terminology of hallucinations

Strong debate over the term “hallucination”: alternatives proposed include “incoherent output”, “confabulation”, and “bullshitting”.
Some argue “hallucination” wrongly implies perceptual errors and a human‑like mind; others counter that the term is already widely understood and that word meanings stretch with use.
Several commenters stress that LLMs always generate statistically plausible text rather than tracking truth; in their framing, “some outputs happen to be true”, not because the model cares about correctness.

Can better corpora fix hallucinations?

Skeptics: even a perfect corpus can’t eliminate hallucinations, especially under stochastic sampling (temperature > 0) and for domains like math where generalization, not memorization, is needed.
Optimists: more consistent, higher‑quality data (as in Phi‑style training) can reduce error rates, though building such corpora at scale may be practically impossible.
There is concern that the article underestimates the contradictions within science itself and is too optimistic that a “universally coherent” dataset could exist.
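The skeptics’ point about stochastic sampling can be illustrated with a minimal sketch. It assumes a toy next‑token distribution with made‑up logits (not taken from any real model) and shows that temperature‑scaled softmax gives every token a nonzero probability, so sampling at T > 0 occasionally picks a lower‑ranked token even when the top‑ranked one is correct, and that share grows as T rises.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (use argmax for greedy decoding)")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for three candidate tokens; index 0 stands for the "correct" one.
logits = [4.0, 2.0, 1.0]
for t in (0.2, 1.0, 1.5):
    p_other = sum(softmax_with_temperature(logits, t)[1:])
    print(f"T={t}: probability of sampling a non-top token = {p_other:.5f}")

rng = random.Random(0)  # seeded so the run is reproducible
draws = [sample_token(logits, 1.0, rng) for _ in range(1000)]
print("non-top tokens sampled at T=1.0:", sum(1 for d in draws if d != 0), "of 1000")
```

Greedy decoding (always taking the argmax) removes this sampling variance, but it only returns whatever the model ranks highest, which need not be correct.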

Logic, reasoning, and AGI limits

Many note that LLMs still struggle with counting, arithmetic, and formal logic; some take this as evidence that they cannot scale directly into AGI.
Others argue LLMs can serve as components in larger systems that add planners, code execution, or search (e.g., Monte Carlo tree search or program synthesis), even if they are not planners themselves.
Undecidability, complexity‑theoretic limits, and the limits of automated theorem proving are cited as deeper obstacles to “perfect reasoning”.

Techniques to reduce or work around hallucinations

Popular mitigation ideas:
- Sampling multiple answers (from one model or several) and using a discriminator or majority vote to detect bad answers and resample.
- RAG and external tools/APIs, though vector search alone is seen as insufficient, especially for structured data.
- Agentic systems that run code, interact with environments, and get feedback reportedly reduce hallucinations in practice.
- Training models to detect logical fallacies is suggested but viewed as hard, given current failures at basic tasks like counting.
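The “multiple samples plus voting” idea in the first bullet can be sketched as a simple majority vote (often called self‑consistency). This is a minimal illustration under stated assumptions: `generate` is a hypothetical callable standing in for a model call, answers are compared as exact strings, and the 0.6 agreement threshold is an arbitrary choice. A learned discriminator could replace the vote, but that variant is not shown.

```python
import itertools
from collections import Counter
from typing import Callable, Optional


def majority_vote(
    generate: Callable[[str], str],
    prompt: str,
    n_samples: int = 5,
    min_agreement: float = 0.6,
) -> Optional[str]:
    """Sample several answers and return the most common one if enough samples agree.

    Returns None when agreement is too low, signalling that the caller should
    resample, escalate to a stronger check, or ask a human.
    """
    answers = [generate(prompt).strip() for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / n_samples >= min_agreement else None


# Hypothetical stand-in for a sampled model: mostly consistent, occasionally wrong.
fake_outputs = itertools.cycle(["42", "42", "17", "42", "42"])
print(majority_vote(lambda p: next(fake_outputs), "What is 6 * 7?"))  # "42" wins 4 of 5
```

Voting only helps when errors vary across samples; if the model is consistently wrong, every sample agrees on the same wrong answer.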

Practical use cases and changing workflows

Several commenters report large productivity gains with current models (GPT‑4o, Claude), particularly for:
- Test‑driven development (letting the LLM generate tests and refactor code).
- Socratic brainstorming and fleshing out half‑formed ideas.
- Acting as a diligent “junior dev” for boilerplate tasks.
Key pattern: stop treating the LLM as a one‑shot oracle; instead use iterative dialogue, self‑tests, and external verification.
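The “iterative dialogue, self‑tests, and external verification” pattern can be sketched as a loop that runs each candidate against tests and feeds the failures back to the generator. This is a hypothetical, minimal version: `propose` stands in for an LLM call that receives the previous round’s failure messages, and the `count_r` function and test format are invented for illustration. Executing generated code with `exec()` is unsafe outside a sandbox; a real setup would isolate it.

```python
from typing import Callable, List, Optional, Tuple

TestCase = Tuple[tuple, object]  # (positional arguments, expected return value)


def run_tests(source: str, func_name: str, tests: List[TestCase]) -> List[str]:
    """Execute candidate source and return failure messages (empty list means all passed).

    WARNING: exec() on model output is unsafe outside a sandbox; illustration only.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
    except Exception as exc:  # syntax errors, missing function, import failures
        return [f"could not load {func_name}: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
    return failures


def generate_until_tests_pass(
    propose: Callable[[List[str]], str],
    func_name: str,
    tests: List[TestCase],
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for a candidate, test it, and feed failures back until it passes or rounds run out."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)
        feedback = run_tests(candidate, func_name, tests)
        if not feedback:
            return candidate
    return None  # never passed: hand back to a human rather than accept unverified code


# Hypothetical generator: returns a buggy draft first, then a fix once it sees failures.
BUGGY = "def count_r(s):\n    return s.count('r') - 1\n"
FIXED = "def count_r(s):\n    return s.count('r')\n"


def fake_propose(feedback: List[str]) -> str:
    return FIXED if feedback else BUGGY


tests = [(("strawberry",), 3), (("",), 0)]
result = generate_until_tests_pass(fake_propose, "count_r", tests)
print("accepted after retry:", result == FIXED)
```

The retry cap is a deliberate design choice: a generator that never converges cannot loop forever, and a `None` result forces escalation instead of silently accepting code that failed its tests.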

Philosophical and societal questions

Some argue that if intelligence is viewed as lossy compression, hallucination is inherent to it, and they note that humans also hold incoherent world models.
Others question whether building better chatbots actually makes the world better, and express unease about the talent and capital flowing into this area.
Some wonder whether hallucination is in any sense a “milestone toward consciousness”; the question is left open, with no consensus.
