Hallucinations in code are the least dangerous form of LLM mistakes

Value and limits of LLM‑generated code

  • Several commenters report successfully building non‑trivial systems (DSLs, web servers, lab scripts, SaaS scaffolding) with LLMs, especially when constrained by familiar stacks and libraries.
  • Others say LLMs are great for boilerplate, unit tests, demos, or “toy” projects but break down on large, evolving codebases, complex C/C++ APIs, or subtle concurrency and memory issues.
  • Some find LLM codebases depressing or uninteresting to study, feeling they remove the “romance” and learning value of human‑written open source.

What counts as a hallucination?

  • Disagreement over terminology: some restrict “hallucination” to invented APIs/facts; others see any wrong output (including logic bugs) as hallucination; some argue the term is misleading anthropomorphism.
  • Many note that hallucinated methods are often the least dangerous issue, since an invented API fails loudly the first time the code runs; far worse are plausible but wrong logic, mis-specified behavior, or silently ignored edge cases.
  • Examples: incorrect ZeroMQ memory handling, wrong lexing line numbers, silent allocation failures, misinterpreted sorting logic, missing features after refactors, or misdescribed behavior in comments.
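The distinction above can be sketched in a few lines (function names invented for illustration): a hallucinated API announces itself with an exception, while a plausible logic bug runs cleanly and returns an answer that merely looks right.

```python
def total_hallucinated(items):
    # Hallucinated API: Python lists have no .sum() method, so this
    # raises AttributeError the first time it runs -- loud and easy to fix.
    return items.sum()

def sum_last_n(items, n):
    # Plausible but wrong: the slice is off by one (should be items[-n:]),
    # yet the code runs without error and returns a reasonable-looking number.
    return sum(items[-(n - 1):])
```

Here `sum_last_n([1, 2, 3, 4], 3)` returns 7 instead of the intended 9, and nothing in the runtime flags it; that is the class of mistake reviewers say survives casual inspection.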

Code review, testing, and trust

  • Strong pushback on the claim that struggling to review LLM output means “you’re bad at reviewing code”: reviewers stress that reading unfamiliar code (especially without access to a human author’s intent) is intrinsically slow and hard.
  • Multiple people liken LLM-heavy workflows to “full self‑driving, but keep your hands on the wheel”: over time, humans will stop truly supervising, which is when rare but severe failures matter.
  • Consensus that tests can’t prove correctness, only expose some errors; high‑risk code still requires reasoning about requirements, invariants, and race conditions.
  • Concern that LLM‑written tests may simply encode the same misunderstandings as the implementation.
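A minimal sketch of that concern (the example is invented): if the test is generated from the same mental model as the implementation, the suite stays green while both are wrong.

```python
def median(values):
    # Misunderstanding: always take the middle element after sorting,
    # ignoring the even-length case (which should average the two middles).
    return sorted(values)[len(values) // 2]

def test_median():
    # A test derived from the same misunderstanding asserts the buggy
    # behavior, so it passes and lends false confidence.
    assert median([1, 2, 3]) == 2      # odd case happens to be correct
    assert median([1, 2, 3, 4]) == 3   # wrong: the true median is 2.5
```

Running `test_median()` raises nothing, which is exactly the failure mode: the tests expose no error because they encode the error.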

Safety, persuasion, and broader risks

  • Several argue hallucinations in code are minor compared to risks from persuasive chatbots encouraging self‑harm or violence; cite real incidents and worry about increasingly “people‑pleasing” models.
  • Debate over whether future highly persuasive models could “own” users cognitively vs. claims this repeats old moral panics about books, films, and video games.
  • Some suggest restricting access or adding “safety buffers” between powerful models and end users; others see this as censorship and corporate moat‑building.

Maintainability, architecture, and ecosystem effects

  • Common complaint: LLMs produce inconsistent patterns, over‑engineering, weird abstractions, repeated CSS/styles, and poor error handling—harder to maintain than hand‑written code.
  • Worry that devs will choose “boring” or popular tech purely because models know it, reducing innovation and pushing ecosystems toward what’s well‑represented in training data.
  • Security concerns include prompt‑driven supply‑chain attacks via hallucinated packages and the ease of mass‑producing superficially good but subtly vulnerable code.
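One defensive sketch against the hallucinated-package attack (the allowlist contents and function name are invented): flag any dependency not on a vetted list before installing, since attackers can pre-register packages under names models commonly invent.

```python
# Example allowlist; a real one would come from a vetted internal registry.
KNOWN_GOOD = {"requests", "numpy", "flask"}

def suspicious_dependencies(requirements_text):
    """Return dependency names not on the allowlist."""
    flagged = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        # Keep only the package name, dropping version pins like ==1.2 or >=1.0
        name = line.split("==")[0].split(">=")[0].strip().lower()
        if name not in KNOWN_GOOD:
            flagged.append(name)
    return flagged
```

For instance, `suspicious_dependencies("requests==2.31\nreqeusts-pro")` flags only the typosquat-style name `reqeusts-pro`, the kind of plausible-but-nonexistent package a model might emit.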