Yet Another LLM Rant
Model knowledge, synthetic data, and “model collapse”
- Some speculate newer GPT versions feel “better at coding” mainly because they’re trained on newer docs/blog posts, not because of deeper reasoning.
- Several comments worry about future models training on their own outputs (synthetic data), calling this unsustainable and linking it to “model collapse” or a “software Habsburg jaw.”
- Others note synthetic data can reinforce existing patterns or broaden coverage, but can’t magically create genuinely new knowledge.
Hallucinations, truth, and what it means to “know”
- One camp says hallucinations are inherent: LLMs always generate something, have no concept of reality, and you can’t define a clean probability cutoff without a ground-truth model of the world.
- Another camp argues that for narrow domains with labeled data (e.g. the Iris dataset or math benchmarks), probability cutoffs and calibration are feasible; general world knowledge is the hard part (see the sketch after this list).
- Long subthreads debate whether humans are “statistical models,” whether LLMs can be said to “know/think/deduce,” and whether humans are actually much better at knowing what they don’t know.
- Some bring in philosophy (Kant, Dennett, qualia) and neuroscience; others stress we don’t yet have agreed theoretical criteria for AGI.
- There’s mention of newer reasoning models that more often answer “I’m not sure” on math problems, as a partial mitigation of hallucinations.
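To make the “probability cutoff” point from this list concrete, here is a minimal sketch in Swift under stated assumptions: the classifier and its calibrated threshold are taken as given for a narrow labeled domain, and the function name `classifyOrAbstain` and the 0.9 cutoff are illustrative, not from the thread.

```swift
// Minimal sketch of a probability cutoff for a narrow, labeled domain.
// Only the abstention logic is shown; the classifier producing the
// probabilities is assumed.

/// Returns a label if the top class probability clears the calibrated
/// threshold, otherwise nil ("I'm not sure").
func classifyOrAbstain(probabilities: [String: Double],
                       threshold: Double = 0.9) -> String? {
    guard let best = probabilities.max(by: { $0.value < $1.value }) else {
        return nil
    }
    return best.value >= threshold ? best.key : nil
}

// Example: output of an (assumed) Iris-style classifier.
let prediction = classifyOrAbstain(
    probabilities: ["setosa": 0.55, "versicolor": 0.30, "virginica": 0.15]
)
print(prediction ?? "I'm not sure")   // abstains at the 0.9 cutoff
```

The catch the skeptics raise is that picking and validating the threshold requires held-out labeled data, which exists for Iris or math benchmarks but not for open-ended world knowledge.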
LLMs as tools: usefulness vs unreliability
- Many argue LLMs are valuable but must be treated like powerful, error-prone tools—more like overconfident interns than compilers. You must always verify.
- Others counter that good tools are reliable and transparent; LLMs feel more like capricious bureaucracies and so are poor tools, especially when marketed as near-oracles.
- Some liken the trust problem to Wikipedia or journals—imperfect but still highly useful if you understand their limits; others insist LLMs’ opaque failures and overconfidence are a qualitatively worse issue.
- A recurring point: checking LLM output can be easier than producing it from scratch for boilerplate or tedious tasks, but becomes dangerous with subtle or high‑stakes work.
Coding workflows, agents, and the zstd/iOS case
- The original rant centers on GPT fabricating an iOS zstd API; commenters confirm both failure and success cases:
  - Some runs of GPT‑5 (especially with “thinking” mode or web search) correctly say “you can’t; iOS has no zstd, use LZFSE/LZ4 or vendor libzstd” (see the compression sketch after this list).
  - Other runs confidently hallucinate a built‑in zstd option, illustrating non‑determinism and routing issues.
- Several advocate “reality checks”: coding agents or IDE integrations that compile/run code, run tests, or query official docs (via tools/MCP), catching hallucinations automatically (a minimal compile‑check loop is sketched below).
- Others report agents can loop uselessly when stuck on a false assumption, burning tokens, so human oversight is still essential.
- Some share experiences where LLMs handle boilerplate and standard patterns very well, but fail badly on niche problems (e.g., the Byzantine Generals problem) or poorly documented edge cases.
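For the zstd/iOS point above, a small sketch of what the correct answers describe (not code from the thread): Foundation’s built‑in compression on iOS 13+ covers LZFSE, LZ4, zlib, and LZMA via `NSData.compressed(using:)`, and there is no zstd case.

```swift
import Foundation

// iOS/macOS ship LZFSE, LZ4, zlib, and LZMA through Foundation;
// there is no built-in zstd, which is the constraint GPT got wrong.
let payload = Data("hello, compression".utf8)

do {
    // NSData.compressed(using:) is available since iOS 13 / macOS 10.15.
    let compressed = try (payload as NSData).compressed(using: .lzfse) as Data
    let restored = try (compressed as NSData).decompressed(using: .lzfse) as Data
    print("\(payload.count) -> \(compressed.count) bytes, round-trips: \(restored == payload)")
} catch {
    print("compression failed: \(error)")
}
```

Using zstd on iOS therefore means bundling libzstd yourself, exactly the “vendor libzstd” route the successful runs suggested.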
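And a sketch of the “reality check” idea from this list, assuming a development machine with `swiftc` on the PATH; the helper name `compileCheck` and the temp‑file handling are illustrative, not any particular agent framework. Model‑generated source is type‑checked before anyone trusts it, so an invented API surfaces as a compiler diagnostic.

```swift
import Foundation

// Hypothetical reality check: write model-generated source to disk, ask the
// Swift compiler to type-check it, and surface the diagnostics. A fabricated
// API (such as a built-in zstd algorithm) fails here instead of being trusted.
func compileCheck(_ source: String) throws -> (ok: Bool, diagnostics: String) {
    let file = FileManager.default.temporaryDirectory
        .appendingPathComponent("generated.swift")
    try source.write(to: file, atomically: true, encoding: .utf8)

    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/usr/bin/env")
    process.arguments = ["swiftc", "-typecheck", file.path]

    let pipe = Pipe()
    process.standardError = pipe
    try process.run()
    process.waitUntilExit()

    let diagnostics = String(decoding: pipe.fileHandleForReading.readDataToEndOfFile(),
                             as: UTF8.self)
    return (process.terminationStatus == 0, diagnostics)
}

// Usage: a snippet that calls a nonexistent zstd algorithm should not type-check.
let suspect = """
import Foundation
func compress(_ data: Data) throws -> Data {
    try (data as NSData).compressed(using: .zstd) as Data
}
"""
if let result = try? compileCheck(suspect), !result.ok {
    print("reality check failed:\n\(result.diagnostics)")
}
```

The same loop extends naturally to running tests or querying documentation tools, though the thread’s caveat stands: an agent stuck on a false assumption can keep cycling through this check without converging.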
Evaluation practices and expectations
- Multiple comments criticize judging GPT‑5 from a single prompt, likening it to discarding a language or type system after one failure.
- Others defend that if a tool can confidently send you down a dead end on a straightforward factual constraint (“this API doesn’t exist”), it’s disqualifying for their personal workflow.
- There’s a meta‑discussion about prompt style (short vs. detailed, use of reasoning/search) and “you’re holding it wrong” accusations.
Social and professional impacts
- Some worry hype is devaluing software engineering, encouraging management to over‑rely on LLMs, and eroding pathways for junior developers who may become dependent on expensive tools.
- Concerns extend to artists and other professions whose work is used for training, and to a future where AI’s main clear winners are large corporations cutting labor costs.
- One commenter notes many treat AGI as “Artificial God Intelligence,” criticizing unrealistic expectations and marketing.
Meta: anthropomorphizing and rhetoric
- Several point out contradictions in calling LLMs “chronic liars” while insisting we shouldn’t anthropomorphize them.
- The “stochastic parrot” line is seen by some as insightful, by others as an outdated meme that ignores recent empirical progress in reasoning and internal structure.
- Overall, the thread splits between skeptics emphasizing unreliability and epistemic limits, and practitioners emphasizing pragmatic gains when LLMs are used cautiously within robust feedback loops.