Getting AI to work in complex codebases
Productivity gains and variability
- Commenters dispute the article’s assumption that AI is “at worst” a mild productivity boost; several cite studies and personal experience where AI made them slower or less effective.
- Others report 15–20% average gains or much larger speedups on well‑scoped tasks, while acknowledging that some users regress and must either improve their usage or stop.
- A key theme: AI amplifies existing skill and discipline. Strong generalists with good technical communication, architecture sense, and testing practices get big wins; weak or rushed users generate slop.
Workflows: specs, planning, and agents
- Many endorse a “research → plan → implement” pipeline with explicit compaction of context into markdown specs, CLAUDE.md, or PRDs (a sketch of such a plan file follows this list).
- Several describe multi‑agent or multi‑phase flows: one agent researches, one writes a design/plan, others implement and review; some use separate “red‑team” reviewers.
- Specs/PRDs become the primary artifact; code is treated more like a compilation target. However, others argue this only works if specs are extremely detailed—at which point you’re effectively programming in English.
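As a rough illustration of treating the spec as the primary artifact, a plan file compacted out of a research phase might look like the sketch below; the headings and placeholders are assumptions for illustration, not a format prescribed in the thread.

```markdown
## Research
- What exists today: <files, entry points, existing tests>
- Constraints: <compatibility, performance, style rules from CLAUDE.md>

## Plan
1. <small, reviewable step>
2. <tests to add or extend before implementation>
3. <follow-up steps, each sized to a reviewable diff>

## Out of scope
- <explicitly deferred work, so the agent does not wander>
```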
Delegation vs abstraction and changing roles
- Debate over whether this is “abstraction” (like C over assembly) or “delegation” (like working with a junior engineer). Critics note you must constantly “resteer,” which is unlike using a compiler.
- Several say their job is shifting from writing code to defining/verifying behavior and designing test harnesses; others hate “managing the idiot box” and feel this drains the joy from programming.
Tooling, languages, and context management
- Go is seen as easier for agents than Python due to static types, stable idioms, and higher‑quality training code. Typed languages and strong linters/pre‑commit hooks help a lot.
- Tools like Cursor, Claude Code, Codex, RepoPrompt, and MCP servers are praised for automatic context handling and UI generation, but users still emphasize explicit context control and frequent /reset over opaque /compact.
- Some experiment with “strategic forgetting” and AST‑based indexing to keep context windows focused (see the sketch after this list).
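To make “AST‑based indexing” concrete, here is a minimal Go sketch (an assumed approach, not code from the discussion) that uses the standard go/parser and go/ast packages to print a compact index of a file’s top‑level declarations, which can be handed to an agent in place of full source files.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
)

// Prints a compact index of top-level declarations in one Go file, so an
// agent can see a package's surface area without loading every body.
func main() {
	fset := token.NewFileSet()
	// os.Args[1] is the path of the Go file to index (assumed CLI argument).
	file, err := parser.ParseFile(fset, os.Args[1], nil, 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			// Function and method names, with line numbers for later retrieval.
			fmt.Printf("func %s (line %d)\n", d.Name.Name, fset.Position(d.Pos()).Line)
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				if ts, ok := spec.(*ast.TypeSpec); ok {
					fmt.Printf("type %s (line %d)\n", ts.Name.Name, fset.Position(ts.Pos()).Line)
				}
			}
		}
	}
}
```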
Quality, review, and large code changes
- Many are alarmed by claims of 20–35k LOC PRs in hours; large AI‑generated PRs are widely considered unreviewable and “hostile.”
- There is strong insistence on human review, especially of tests; AI‑written tests are often shallow, slow, or misleading.
- Concerns about non‑determinism: unlike compilers, LLMs can produce different implementations from the same spec, so “specs as the real code” is seen as unsafe without exhaustive tests.
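A mitigation commenters lean on is pinning the spec’s observable behavior with human‑written tests that any regenerated implementation must pass. The sketch below shows the idea with a hypothetical Slugify function and table‑driven Go tests; the reference implementation is inlined only so the example runs.

```go
// slug_test.go — behavior pinned by human-written, table-driven cases.
package slug

import (
	"strings"
	"testing"
	"unicode"
)

// Slugify is a hypothetical example: in the workflow above this would be
// the model-generated code; a reference implementation is inlined here
// only so the example compiles and runs with `go test`.
func Slugify(s string) string {
	var b strings.Builder
	prevDash := false
	for _, r := range strings.ToLower(s) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
			prevDash = false
		case !prevDash && b.Len() > 0:
			b.WriteRune('-')
			prevDash = true
		}
	}
	return strings.TrimSuffix(b.String(), "-")
}

// TestSlugifySpec encodes the spec: whatever implementation the model
// produces next time must still pass these cases.
func TestSlugifySpec(t *testing.T) {
	cases := []struct{ name, in, want string }{
		{"lowercases", "Hello", "hello"},
		{"spaces become dashes", "hello world", "hello-world"},
		{"punctuation is dropped", "hello, world!", "hello-world"},
	}
	for _, c := range cases {
		if got := Slugify(c.in); got != c.want {
			t.Errorf("%s: Slugify(%q) = %q, want %q", c.name, c.in, got, c.want)
		}
	}
}
```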
Costs, incentives, and culture
- Heavy agent usage can cost thousands per month; some see this as worthwhile leverage, others as unjustifiable versus hiring another engineer.
- There’s anxiety about managers mandating AI use, measuring LOC, and forcing engineers to claim productivity gains.
- Skeptics worry about skill degradation, hidden technical debt in AI‑written codebases, and the lack of public, verifiable success examples at the claimed scale.