Hallucinations in code are the least dangerous form of LLM mistakes
Value and limits of LLM‑generated code
- Several commenters report successfully building non‑trivial systems (DSLs, web servers, lab scripts, SaaS scaffolding) with LLMs, especially when constrained by familiar stacks and libraries.
- Others say LLMs are great for boilerplate, unit tests, demos, or “toy” projects but break down on large, evolving codebases, complex C/C++ APIs, or subtle concurrency and memory issues.
- Some find LLM codebases depressing or uninteresting to study, feeling they remove the “romance” and learning value of human‑written open source.
What counts as a hallucination?
- Disagreement over terminology: some restrict “hallucination” to invented APIs/facts; others see any wrong output (including logic bugs) as hallucination; some argue the term is misleading anthropomorphism.
- Many note that hallucinated methods are often the least dangerous issues; far worse are plausible but wrong logic, mis-specified behavior, or silently ignored edge cases.
- Examples: incorrect ZeroMQ memory handling, wrong lexing line numbers, silent allocation failures, misinterpreted sorting logic, missing features after refactors, or misdescribed behavior in comments.
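The "wrong lexing line numbers" class of bug is a good illustration of why plausible-but-wrong logic is nastier than an invented API: the code compiles, runs, and looks right. A minimal hypothetical sketch (not from any commenter's actual code) of the off-by-one that commonly causes it:

```python
def line_of(source: str, offset: int) -> int:
    """Return the 1-based line number of the character at `offset`.

    A plausible-looking but wrong version returns
    source[:offset].count("\n") directly -- a 0-based count -- so
    every diagnostic points one line too early. Nothing crashes;
    only a reviewer who checks an actual error message notices.
    """
    return source[:offset].count("\n") + 1  # +1 is the easy part to drop

src = "let x = 1\nlet y = ??\n"
print(line_of(src, src.index("??")))  # the bad token is on line 2
```

A hallucinated method would have failed at the first call; this version ships.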
Code review, testing, and trust
- Strong pushback on the claim that needing to review LLM output carefully just means you're bad at reviewing code: reviewers stress that reading unfamiliar code (especially with no human author's intent to infer) is intrinsically slow and hard.
- Multiple people liken LLM-heavy workflows to “full self‑driving, but keep your hands on the wheel”: over time, humans will stop truly supervising, which is when rare but severe failures matter.
- Consensus that tests can’t prove correctness, only expose some errors; high‑risk code still requires reasoning about requirements, invariants, and race conditions.
- Concern that LLM‑written tests may simply encode the same misunderstandings as the implementation.
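The shared-misunderstanding failure mode can be shown in a few lines. In this hypothetical sketch, a spec asks for case-insensitive sorting, the implementation misreads it, and a test generated from the same misreading passes anyway:

```python
# Hypothetical spec: "sort names alphabetically, ignoring case".
def sort_names(names):
    # Misreading: a plain sorted() is case-sensitive, so all
    # uppercase initials sort before all lowercase ones.
    return sorted(names)

def test_sort_names():
    # A test written from the same misunderstanding asserts the
    # case-sensitive order -- implementation and test agree, both pass.
    assert sort_names(["alice", "Bob"]) == ["Bob", "alice"]

test_sort_names()  # green, yet the spec is violated

# What the spec actually wanted:
assert sorted(["alice", "Bob"], key=str.lower) == ["alice", "Bob"]
```

Green tests here certify consistency with the implementation, not with the requirement; only reasoning against the spec catches it.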
Safety, persuasion, and broader risks
- Several argue hallucinations in code are minor compared to risks from persuasive chatbots encouraging self‑harm or violence; cite real incidents and worry about increasingly “people‑pleasing” models.
- Debate over whether future highly persuasive models could “own” users cognitively vs. claims this repeats old moral panics about books, films, and video games.
- Some suggest restricting access or adding “safety buffers” between powerful models and end users; others see this as censorship and corporate moat‑building.
Maintainability, architecture, and ecosystem effects
- Common complaint: LLMs produce inconsistent patterns, over‑engineering, weird abstractions, repeated CSS/styles, and poor error handling—harder to maintain than hand‑written code.
- Worry that devs will choose “boring” or popular tech purely because models know it, reducing innovation and pushing ecosystems toward what’s well‑represented in training data.
- Security concerns include prompt‑driven supply‑chain attacks via hallucinated packages and the ease of mass‑producing superficially good but subtly vulnerable code.
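One mitigation commenters' concern points toward is vetting LLM-suggested dependency names before installing anything. A minimal sketch, assuming a team-curated allowlist (the list and requirement strings here are hypothetical):

```python
# Sketch: flag dependency names that are not on a vetted allowlist,
# so a hallucinated (and possibly squatted) package name is reviewed
# by a human before any `pip install` runs.
KNOWN_GOOD = {"requests", "numpy", "flask"}  # assumption: your vetted set

def vet_requirements(lines):
    """Split requirements lines into (approved, suspicious) names."""
    approved, suspicious = [], []
    for line in lines:
        name = line.split("==")[0].strip().lower()
        if not name or name.startswith("#"):
            continue  # skip blanks and comments
        (approved if name in KNOWN_GOOD else suspicious).append(name)
    return approved, suspicious

ok, flagged = vet_requirements(["requests==2.32.0", "reqeusts-helpers==0.1"])
# "reqeusts-helpers" lands in `flagged` and is held for manual review.
```

This doesn't prove a flagged package is malicious, only that nobody on the team has vetted it, which is exactly the gap hallucinated names exploit.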