Getting AI to work in complex codebases
Productivity gains and variability
- Commenters dispute the article’s assumption that AI is “at worst” a mild productivity boost; several cite studies and personal experience where AI made them slower or less effective.
- Others report 15–20% average gains or much larger speedups on well‑scoped tasks, while acknowledging that some users regress and must either improve their usage or stop.
- A key theme: AI amplifies existing skill and discipline. Strong generalists with good technical communication, architecture sense, and testing practices get big wins; weak or rushed users generate slop.
Workflows: specs, planning, and agents
- Many endorse a “research → plan → implement” pipeline with explicit compaction of context into markdown specs, CLAUDE.md, or PRDs (a sketch of such a plan file follows this list).
- Several describe multi‑agent or multi‑phase flows: one agent researches, one writes a design/plan, others implement and review; some use separate “red‑team” reviewers.
- Specs/PRDs become the primary artifact; code is treated more like a compilation target. However, others argue this only works if specs are extremely detailed—at which point you’re effectively programming in English.
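As a rough illustration of treating the spec as the primary artifact, a plan file compacted out of a research phase might look like the sketch below; the headings and placeholders are assumptions for illustration, not a format prescribed in the thread.

```markdown
## Research
- What exists today: <files, entry points, existing tests>
- Constraints: <compatibility, performance, style rules from CLAUDE.md>

## Plan
1. <small, reviewable step>
2. <tests to add or extend before implementation>
3. <follow-up steps, each sized to a reviewable diff>

## Out of scope
- <explicitly deferred work, so the agent does not wander>
```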
Delegation vs abstraction and changing roles
- Debate over whether this is “abstraction” (like C over assembly) or “delegation” (like working with a junior engineer). Critics note you must constantly “resteer,” which is unlike using a compiler.
- Several say their job is shifting from writing code to defining/verifying behavior and designing test harnesses; others hate “managing the idiot box” and feel this drains the joy from programming.
Tooling, languages, and context management
- Go is seen as easier for agents than Python due to static types, stable idioms, and higher‑quality training code. Typed languages and strong linters/pre‑commit hooks help a lot.
- Tools like Cursor, Claude Code, Codex, RepoPrompt, and MCP servers are praised for automatic context handling and UI generation, but users still emphasize explicit context control and frequent /reset over opaque /compact.
- Some experiment with “strategic forgetting” and AST‑based indexing to keep context windows focused (see the sketch after this list).
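To make “AST‑based indexing” concrete, here is a minimal Go sketch (an assumed approach, not code from the discussion) that uses the standard go/parser and go/ast packages to print a compact index of a file’s top‑level declarations, which can be handed to an agent in place of full source files.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"os"
)

// Prints a compact index of top-level declarations in one Go file, so an
// agent can see a package's surface area without loading every body.
func main() {
	fset := token.NewFileSet()
	// os.Args[1] is the path of the Go file to index (assumed CLI argument).
	file, err := parser.ParseFile(fset, os.Args[1], nil, 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			// Function and method names, with line numbers for later retrieval.
			fmt.Printf("func %s (line %d)\n", d.Name.Name, fset.Position(d.Pos()).Line)
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				if ts, ok := spec.(*ast.TypeSpec); ok {
					fmt.Printf("type %s (line %d)\n", ts.Name.Name, fset.Position(ts.Pos()).Line)
				}
			}
		}
	}
}
```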
Quality, review, and large code changes
- Many are alarmed by claims of 20–35k LOC PRs in hours; large AI‑generated PRs are widely considered unreviewable and “hostile.”
- There is strong insistence on human review, especially of tests; AI‑written tests are often shallow, slow, or misleading.
- Concerns about non‑determinism: unlike compilers, LLMs can produce different implementations from the same spec, so “specs as the real code” is seen as unsafe without exhaustive tests.
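A mitigation commenters lean on is pinning the spec’s observable behavior with human‑written tests that any regenerated implementation must pass. The sketch below shows the idea with a hypothetical Slugify function and table‑driven Go tests; the reference implementation is inlined only so the example runs.

```go
// slug_test.go — behavior pinned by human-written, table-driven cases.
package slug

import (
	"strings"
	"testing"
	"unicode"
)

// Slugify is a hypothetical example: in the workflow above this would be
// the model-generated code; a reference implementation is inlined here
// only so the example compiles and runs with `go test`.
func Slugify(s string) string {
	var b strings.Builder
	prevDash := false
	for _, r := range strings.ToLower(s) {
		switch {
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
			prevDash = false
		case !prevDash && b.Len() > 0:
			b.WriteRune('-')
			prevDash = true
		}
	}
	return strings.TrimSuffix(b.String(), "-")
}

// TestSlugifySpec encodes the spec: whatever implementation the model
// produces next time must still pass these cases.
func TestSlugifySpec(t *testing.T) {
	cases := []struct{ name, in, want string }{
		{"lowercases", "Hello", "hello"},
		{"spaces become dashes", "hello world", "hello-world"},
		{"punctuation is dropped", "hello, world!", "hello-world"},
	}
	for _, c := range cases {
		if got := Slugify(c.in); got != c.want {
			t.Errorf("%s: Slugify(%q) = %q, want %q", c.name, c.in, got, c.want)
		}
	}
}
```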
Costs, incentives, and culture
- Heavy agent usage can cost thousands per month; some see this as worthwhile leverage, others as unjustifiable versus hiring another engineer.
- There’s anxiety about managers mandating AI use, measuring LOC, and forcing engineers to claim productivity gains.
- Skeptics worry about skill degradation, hidden technical debt in AI‑written codebases, and the lack of public, verifiable success examples at the claimed scale.