Yet Another LLM Rant

Model knowledge, synthetic data, and “model collapse”

  • Some speculate newer GPT versions feel “better at coding” mainly because they’re trained on newer docs/blog posts, not because of deeper reasoning.
  • Several commenters worry about future models training on their own outputs (synthetic data), calling the approach unsustainable and linking it to “model collapse” or a “software Habsburg jaw.”
  • Others note that synthetic data can reinforce existing patterns or broaden coverage, but cannot conjure genuinely new knowledge.

Hallucinations, truth, and what it means to “know”

  • One camp says hallucinations are inherent: LLMs always generate something, have no concept of reality, and you can’t define a clean probability cutoff without a ground-truth model of the world.
  • Another camp argues that for narrow domains with labeled data (e.g. the Iris dataset or math benchmarks), probability cutoffs and calibration are feasible; general world knowledge is the hard part (see the sketch after this list).
  • Long subthreads debate whether humans are “statistical models,” whether LLMs can be said to “know/think/deduce,” and whether humans are actually much better at knowing what they don’t know.
  • Some bring in philosophy (Kant, Dennett, qualia) and neuroscience; others stress we don’t yet have agreed theoretical criteria for AGI.
  • Newer reasoning models that more readily answer “I’m not sure” on math problems are cited as a partial mitigation of hallucinations.
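
A minimal sketch, in Swift, of what the calibration camp means by a probability cutoff: a classifier over a small labeled task answers only when its top class clears a confidence threshold and otherwise says “not sure.” The logits and the 0.9 threshold are invented for illustration; nothing here comes from the thread itself.

    import Foundation

    // Selective prediction for a narrow, labeled task: answer only when the
    // top class probability clears a threshold, otherwise abstain.
    func softmax(_ logits: [Double]) -> [Double] {
        let maxLogit = logits.max() ?? 0
        let exps = logits.map { exp($0 - maxLogit) }   // subtract the max for numerical stability
        let sum = exps.reduce(0, +)
        return exps.map { $0 / sum }
    }

    func predictOrAbstain(logits: [Double], labels: [String], threshold: Double) -> String {
        let probs = softmax(logits)
        guard let best = probs.indices.max(by: { probs[$0] < probs[$1] }) else { return "not sure" }
        return probs[best] >= threshold ? labels[best] : "not sure"
    }

    // Iris-style 3-class example: one confident case, one ambiguous case.
    let labels = ["setosa", "versicolor", "virginica"]
    print(predictOrAbstain(logits: [4.1, 0.2, -1.0], labels: labels, threshold: 0.9)) // setosa
    print(predictOrAbstain(logits: [1.1, 0.9, 0.8],  labels: labels, threshold: 0.9)) // not sure

The catch, as the inherent-hallucination camp notes, is that this only works when there is a fixed label set to calibrate against; open-ended world knowledge offers no such ground truth.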

LLMs as tools: usefulness vs unreliability

  • Many argue LLMs are valuable but must be treated like powerful, error-prone tools—more like overconfident interns than compilers. You must always verify.
  • Others counter that good tools are reliable and transparent; LLMs feel more like capricious bureaucracies and so are poor tools, especially when marketed as near-oracles.
  • Some liken the trust problem to Wikipedia or journals—imperfect but still highly useful if you understand their limits; others insist LLMs’ opaque failures and overconfidence are a qualitatively worse issue.
  • A recurring point: checking LLM output can be easier than producing it from scratch for boilerplate or tedious tasks, but becomes dangerous with subtle or high‑stakes work.

Coding workflows, agents, and the zstd/iOS case

  • The original rant centers on GPT fabricating an iOS zstd API; commenters report both failures and successes (the first sketch after this list shows what the correct answer looks like):
    • Some runs of GPT‑5 (especially with “thinking” mode or web search) correctly say “you can’t; iOS has no zstd, use LZFSE/LZ4 or vendor libzstd.”
    • Other runs confidently hallucinate a built‑in zstd option, illustrating non‑determinism and routing issues.
  • Several advocate “reality checks”: coding agents or IDE integrations that compile/run code, run tests, or query official docs (via tools/MCP) and catch hallucinations automatically (see the second sketch after this list).
  • Others report agents can loop uselessly when stuck on a false assumption, burning tokens, so human oversight is still essential.
  • Some share experiences where LLMs handle boilerplate and standard patterns very well, but fail badly on niche algorithms (e.g., Byzantine Generals) or poorly documented edge cases.
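
For context on the zstd case, here is a minimal Swift sketch of the answer the successful runs gave: Apple’s Compression framework, exposed here through NSData (iOS 13+ / macOS 10.15+), offers LZFSE, LZ4, zlib, and LZMA, and its algorithm enum has no zstd case, which is exactly why the hallucinated built-in option cannot exist. The sample data is illustrative.

    import Foundation

    // Round-trip a buffer through Apple's built-in LZFSE support. The
    // CompressionAlgorithm enum has no zstd case; using zstd on iOS means
    // vendoring libzstd yourself.
    func compressLZFSE(_ data: Data) throws -> Data {
        try (data as NSData).compressed(using: .lzfse) as Data   // or .lz4 / .zlib / .lzma
    }

    func decompressLZFSE(_ data: Data) throws -> Data {
        try (data as NSData).decompressed(using: .lzfse) as Data
    }

    let original = Data(repeating: 0x41, count: 10_000)
    do {
        let packed = try compressLZFSE(original)
        let restored = try decompressLZFSE(packed)
        print("\(original.count) bytes -> \(packed.count); round-trip ok: \(restored == original)")
    } catch {
        print("compression failed: \(error)")
    }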
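
And a hedged sketch of the “reality check” loop some commenters advocate: type-check the model’s output with swiftc and feed the diagnostics back rather than trusting the prose. Only the swiftc -typecheck invocation is a real tool here; askModel is a hypothetical stand-in for whatever LLM call the agent wraps, and in this toy version it keeps returning the same hallucinated .zstd line, which also shows how an agent can burn attempts when stuck on a false assumption.

    import Foundation

    // Hypothetical placeholder for the agent's LLM call; not a real API.
    func askModel(_ prompt: String) -> String {
        // For the sketch, always return a line that uses a nonexistent .zstd case.
        return "let packed = try (Data() as NSData).compressed(using: .zstd)"
    }

    // Write the candidate code to a temp file and run `swiftc -typecheck` on it.
    func typeCheck(_ source: String) -> (ok: Bool, diagnostics: String) {
        let file = FileManager.default.temporaryDirectory
            .appendingPathComponent("candidate.swift")
        try? source.write(to: file, atomically: true, encoding: .utf8)

        let proc = Process()
        proc.executableURL = URL(fileURLWithPath: "/usr/bin/env")
        proc.arguments = ["swiftc", "-typecheck", file.path]
        let pipe = Pipe()
        proc.standardError = pipe
        do { try proc.run() } catch { return (false, "\(error)") }
        proc.waitUntilExit()
        let diag = String(data: pipe.fileHandleForReading.readDataToEndOfFile(),
                          encoding: .utf8) ?? ""
        return (proc.terminationStatus == 0, diag)
    }

    var prompt = "Compress a Data value with zstd on iOS."
    for attempt in 1...3 {
        let code = askModel(prompt)
        let check = typeCheck("import Foundation\n\(code)")
        if check.ok {
            print("attempt \(attempt): compiles; run tests or hand to a human next")
            break
        }
        // The fabricated API fails here, so the compiler error becomes the next
        // prompt instead of reaching the user as confident prose.
        prompt = "This failed to compile:\n\(code)\n\(check.diagnostics)\nPlease fix it."
        print("attempt \(attempt): rejected by the compiler")
    }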

Evaluation practices and expectations

  • Multiple commenters criticize judging GPT‑5 from a single prompt, likening it to discarding a programming language or type system after one failure.
  • Others defend that if a tool can confidently send you down a dead end on a straightforward factual constraint (“this API doesn’t exist”), it’s disqualifying for their personal workflow.
  • There’s a meta‑discussion about prompt style (short vs detailed, use of reasoning/search) and “holding it wrong” accusations.

Social and professional impacts

  • Some worry hype is devaluing software engineering, encouraging management to over‑rely on LLMs, and eroding pathways for junior developers who may become dependent on expensive tools.
  • Concerns extend to artists and other professions whose work is used for training, and to a future where AI’s main clear winners are large corporations cutting labor costs.
  • One commenter notes many treat AGI as “Artificial God Intelligence,” criticizing unrealistic expectations and marketing.

Meta: anthropomorphizing and rhetoric

  • Several point out contradictions in calling LLMs “chronic liars” while insisting we shouldn’t anthropomorphize them.
  • The “stochastic parrot” line is seen by some as insightful, by others as an outdated meme that ignores recent empirical progress in reasoning and internal structure.
  • Overall, the thread splits between skeptics emphasizing unreliability and epistemic limits, and practitioners emphasizing pragmatic gains when LLMs are used cautiously within robust feedback loops.