Yet Another LLM Rant

Model knowledge, synthetic data, and “model collapse”

  • Some speculate newer GPT versions feel “better at coding” mainly because they’re trained on newer docs/blog posts, not because of deeper reasoning.
  • Several commenters worry about future models training on their own outputs (synthetic data), calling the approach unsustainable and linking it to “model collapse” or a “software Habsburg jaw.”
  • Others note that synthetic data can reinforce existing patterns or broaden coverage, but cannot conjure genuinely new knowledge.

Hallucinations, truth, and what it means to “know”

  • One camp says hallucinations are inherent: LLMs always generate something, have no concept of reality, and you can’t define a clean probability cutoff without a ground-truth model of the world.
  • Another camp argues that for narrow domains with labeled data (e.g. the Iris dataset or math benchmarks), probability cutoffs and calibration are feasible; general world knowledge is the hard part (see the sketch after this list).
  • Long subthreads debate whether humans are “statistical models,” whether LLMs can be said to “know/think/deduce,” and whether humans are actually much better at knowing what they don’t know.
  • Some bring in philosophy (Kant, Dennett, qualia) and neuroscience; others stress we don’t yet have agreed theoretical criteria for AGI.
  • Newer reasoning models that more readily answer “I’m not sure” on math problems are cited as a partial mitigation of hallucinations.
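
A minimal sketch, in Swift, of what the calibration camp means by a probability cutoff: a classifier over a small labeled task answers only when its top class clears a confidence threshold and otherwise says “not sure.” The logits and the 0.9 threshold are invented for illustration; nothing here comes from the thread itself.

    import Foundation

    // Selective prediction for a narrow, labeled task: answer only when the
    // top class probability clears a threshold, otherwise abstain.
    func softmax(_ logits: [Double]) -> [Double] {
        let maxLogit = logits.max() ?? 0
        let exps = logits.map { exp($0 - maxLogit) }   // subtract the max for numerical stability
        let sum = exps.reduce(0, +)
        return exps.map { $0 / sum }
    }

    func predictOrAbstain(logits: [Double], labels: [String], threshold: Double) -> String {
        let probs = softmax(logits)
        guard let best = probs.indices.max(by: { probs[$0] < probs[$1] }) else { return "not sure" }
        return probs[best] >= threshold ? labels[best] : "not sure"
    }

    // Iris-style 3-class example: one confident case, one ambiguous case.
    let labels = ["setosa", "versicolor", "virginica"]
    print(predictOrAbstain(logits: [4.1, 0.2, -1.0], labels: labels, threshold: 0.9)) // setosa
    print(predictOrAbstain(logits: [1.1, 0.9, 0.8],  labels: labels, threshold: 0.9)) // not sure

The catch, as the inherent-hallucination camp notes, is that this only works when there is a fixed label set to calibrate against; open-ended world knowledge offers no such ground truth.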

LLMs as tools: usefulness vs unreliability

  • Many argue LLMs are valuable but must be treated like powerful, error-prone tools—more like overconfident interns than compilers. You must always verify.
  • Others counter that good tools are reliable and transparent; LLMs feel more like capricious bureaucracies and so are poor tools, especially when marketed as near-oracles.
  • Some liken the trust problem to Wikipedia or journals—imperfect but still highly useful if you understand their limits; others insist LLMs’ opaque failures and overconfidence are a qualitatively worse issue.
  • A recurring point: checking LLM output can be easier than producing it from scratch for boilerplate or tedious tasks, but becomes dangerous with subtle or high‑stakes work.

Coding workflows, agents, and the zstd/iOS case

  • The original rant centers on GPT fabricating an iOS zstd API; commenters report both failures and successes (the first sketch after this list shows what the correct answer looks like):
    • Some runs of GPT‑5 (especially with “thinking” mode or web search) correctly say “you can’t; iOS has no zstd, use LZFSE/LZ4 or vendor libzstd.”
    • Other runs confidently hallucinate a built‑in zstd option, illustrating non‑determinism and routing issues.
  • Several advocate “reality checks”: coding agents or IDE integrations that compile/run code, run tests, or query official docs (via tools/MCP) and catch hallucinations automatically (see the second sketch after this list).
  • Others report agents can loop uselessly when stuck on a false assumption, burning tokens, so human oversight is still essential.
  • Some share experiences where LLMs handle boilerplate and standard patterns very well, but fail badly on niche algorithms (e.g., Byzantine Generals) or poorly documented edge cases.
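
For context on the zstd case, here is a minimal Swift sketch of the answer the successful runs gave: Apple’s Compression framework, exposed here through NSData (iOS 13+ / macOS 10.15+), offers LZFSE, LZ4, zlib, and LZMA, and its algorithm enum has no zstd case, which is exactly why the hallucinated built-in option cannot exist. The sample data is illustrative.

    import Foundation

    // Round-trip a buffer through Apple's built-in LZFSE support. The
    // CompressionAlgorithm enum has no zstd case; using zstd on iOS means
    // vendoring libzstd yourself.
    func compressLZFSE(_ data: Data) throws -> Data {
        try (data as NSData).compressed(using: .lzfse) as Data   // or .lz4 / .zlib / .lzma
    }

    func decompressLZFSE(_ data: Data) throws -> Data {
        try (data as NSData).decompressed(using: .lzfse) as Data
    }

    let original = Data(repeating: 0x41, count: 10_000)
    do {
        let packed = try compressLZFSE(original)
        let restored = try decompressLZFSE(packed)
        print("\(original.count) bytes -> \(packed.count); round-trip ok: \(restored == original)")
    } catch {
        print("compression failed: \(error)")
    }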
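
And a hedged sketch of the “reality check” loop some commenters advocate: type-check the model’s output with swiftc and feed the diagnostics back rather than trusting the prose. Only the swiftc -typecheck invocation is a real tool here; askModel is a hypothetical stand-in for whatever LLM call the agent wraps, and in this toy version it keeps returning the same hallucinated .zstd line, which also shows how an agent can burn attempts when stuck on a false assumption.

    import Foundation

    // Hypothetical placeholder for the agent's LLM call; not a real API.
    func askModel(_ prompt: String) -> String {
        // For the sketch, always return a line that uses a nonexistent .zstd case.
        return "let packed = try (Data() as NSData).compressed(using: .zstd)"
    }

    // Write the candidate code to a temp file and run `swiftc -typecheck` on it.
    func typeCheck(_ source: String) -> (ok: Bool, diagnostics: String) {
        let file = FileManager.default.temporaryDirectory
            .appendingPathComponent("candidate.swift")
        try? source.write(to: file, atomically: true, encoding: .utf8)

        let proc = Process()
        proc.executableURL = URL(fileURLWithPath: "/usr/bin/env")
        proc.arguments = ["swiftc", "-typecheck", file.path]
        let pipe = Pipe()
        proc.standardError = pipe
        do { try proc.run() } catch { return (false, "\(error)") }
        proc.waitUntilExit()
        let diag = String(data: pipe.fileHandleForReading.readDataToEndOfFile(),
                          encoding: .utf8) ?? ""
        return (proc.terminationStatus == 0, diag)
    }

    var prompt = "Compress a Data value with zstd on iOS."
    for attempt in 1...3 {
        let code = askModel(prompt)
        let check = typeCheck("import Foundation\n\(code)")
        if check.ok {
            print("attempt \(attempt): compiles; run tests or hand to a human next")
            break
        }
        // The fabricated API fails here, so the compiler error becomes the next
        // prompt instead of reaching the user as confident prose.
        prompt = "This failed to compile:\n\(code)\n\(check.diagnostics)\nPlease fix it."
        print("attempt \(attempt): rejected by the compiler")
    }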

Evaluation practices and expectations

  • Multiple commenters criticize judging GPT‑5 from a single prompt, likening it to discarding a programming language or type system after one failure.
  • Others defend that if a tool can confidently send you down a dead end on a straightforward factual constraint (“this API doesn’t exist”), it’s disqualifying for their personal workflow.
  • There’s a meta‑discussion about prompt style (short vs detailed, use of reasoning/search) and “holding it wrong” accusations.

Social and professional impacts

  • Some worry hype is devaluing software engineering, encouraging management to over‑rely on LLMs, and eroding pathways for junior developers who may become dependent on expensive tools.
  • Concerns extend to artists and other professions whose work is used for training, and to a future where AI’s main clear winners are large corporations cutting labor costs.
  • One commenter notes many treat AGI as “Artificial God Intelligence,” criticizing unrealistic expectations and marketing.

Meta: anthropomorphizing and rhetoric

  • Several point out contradictions in calling LLMs “chronic liars” while insisting we shouldn’t anthropomorphize them.
  • The “stochastic parrot” line is seen by some as insightful, by others as an outdated meme that ignores recent empirical progress in reasoning and internal structure.
  • Overall, the thread splits between skeptics emphasizing unreliability and epistemic limits, and practitioners emphasizing pragmatic gains when LLMs are used cautiously within robust feedback loops.