Comprehension debt: A ticking time bomb of LLM-generated code

Scope of “Comprehension Debt”

  • Many see this as an old problem (legacy systems, offshore code, intern code) that LLMs greatly amplify rather than create anew.
  • Others argue LLM code is qualitatively different: there may be no human mental model behind it at all, only a plausible-looking surface.

Human vs LLM Code and Institutional Knowledge

  • Human-written code often comes with institutional memory, design docs, tickets, and the possibility of asking “why?”—even if imperfectly.
  • LLMs can explain what code does, but commenters doubt they can reliably explain why it’s structured that way or which trade‑offs were intended.
  • Several connect this to Naur’s “programming as theory building”: LLMs remove even the incidental theory-building you get from manually typing the code.

Tests, Specs, and Design as Counterweights

  • Many propose spec‑driven or test‑driven workflows: have LLMs generate code plus tests, enforce style/architecture rules, and treat specs as the real artifact.
  • Critics note LLM tests often mirror the same misunderstanding as the code, so both must still be reviewed; tests can become vacuous or wrong (see the sketch after this list).
  • Strong modularization, explicit interfaces, and richer documentation (possibly LLM‑assisted) are seen as key to containing comprehension debt.
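
A minimal sketch of the vacuous-test failure mode. Everything here is hypothetical: the spec (“round prices half up to the nearest cent”), the function `to_cents`, and both tests are invented for illustration, not taken from the discussion.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical LLM-generated implementation: round() does round-half-to-even,
# not the "round half up" the (invented) spec asked for.
def to_cents(price: float) -> float:
    return round(price, 2)

# Vacuous LLM-generated test: derived from the code's own behavior,
# so it passes while silently enshrining the bug.
def test_to_cents_vacuous():
    assert to_cents(0.125) == 0.12  # encodes round-half-to-even as "expected"

# Spec-derived test: written from the requirement, not the implementation.
def test_to_cents_spec():
    expected = Decimal("0.125").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    assert Decimal(str(to_cents(0.125))) == expected  # fails: 0.12 != 0.13
```

The second test only exists if someone reads the spec independently of the code, which is exactly the review work the thread argues cannot be skipped.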

Workflow, Quality, and Management Incentives

  • Concern that management treats AI as a pure speed multiplier, pressuring reviewers to rubber‑stamp growing volumes of opaque code.
  • Fear that this accelerates existing “barely functional” quality norms and drives out engineers who care about design and polish.
  • Some liken LLM coding to earlier waves of sloppy abstraction (EJBs, ORMs, JS frameworks), but at far higher volume and speed.

Where LLMs Work Well (Today)

  • Refactoring under strong test coverage; bulk mechanical changes (API shifts, renames). A characterization-test sketch follows this list.
  • One‑off utilities, data munging scripts, sample code, and boilerplate.
  • Helping understand unfamiliar or legacy codebases by answering localized “what does this do?” questions—though hallucinated explanations are a risk.
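
One way to make such refactors safer is to pin current behavior before letting a model touch the code. A minimal characterization (“golden master”) test sketch, assuming pytest; `slugify` and its cases are toy stand-ins, not code from the thread:

```python
import json
import pathlib
import re

def slugify(text: str) -> str:
    """Toy stand-in for the function about to be LLM-refactored."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

CASES = ["Hello World", "  trailing  ", "Ünïcode!", ""]
GOLDEN = pathlib.Path("slugify_golden.json")

def test_slugify_behavior_pinned():
    actual = {case: slugify(case) for case in CASES}
    if not GOLDEN.exists():  # first run records today's behavior as the baseline
        GOLDEN.write_text(json.dumps(actual, indent=2))
    # After the LLM refactor, any semantic drift shows up as a failing diff.
    assert actual == json.loads(GOLDEN.read_text())
```

The golden file substitutes for comprehension only in the narrow sense that behavior is frozen; it says nothing about why the behavior is what it is.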

Future Trajectories and Disagreement

  • Optimists expect future models to handle both comprehension and maintenance of LLM‑generated spaghetti, making today’s debt moot.
  • Skeptics doubt core issues (hallucinations, lack of genuine understanding, ambiguous natural‑language “specs”) will vanish quickly, and worry about long‑term skill atrophy and write‑only codebases.