Ask HN: Do you have any evidence that agentic coding works?

What “agentic coding” means in this thread

  • Most distinguish between:
    • AI as an assistant (pair programmer / fast typist / junior dev).
    • AI as an agent (planning, editing, testing, committing with some autonomy).
  • Almost everyone agrees: fully autonomous, unreviewed agentic coding is unsafe; human review is mandatory.

Where it works well (reported evidence)

  • Boilerplate, glue code, CRUD apps, small CLIs, internal tools, “monkey work” refactors.
  • Porting code between languages/platforms (e.g. C→Go, Java→Laravel, backends, extensions).
  • Performance experiments and prototypes where correctness is easy to check and bad code can be rewritten.
  • Sysadmin/devops tasks via CLI tools and MCPs (querying services, investigating spikes, debugging).
  • Greenfield apps in domains heavily represented in open source, especially with clear specs.
  • Several users report 2–10x speedups on personal or medium-scale projects when tightly supervising agents.

Where it fails or becomes net‑negative

  • Large or complex systems requiring high‑level design, long‑term planning, or deep domain modeling.
  • Existing big codebases (especially huge monorepos) where keeping the agent “on the rails” is hard.
  • Situations where architecture, maintainability, and long‑term reasoning matter more than raw throughput.
  • Frontends and iOS are called out as particularly bumpy targets.

Testing, review, and quality concerns

  • Major failure mode: letting the AI write both implementation and tests (tests that always pass, “cheating” behaviors).
  • Code often works but is over‑engineered, messy, duplicated, or uses deprecated APIs.
  • Some argue speed outweighs technical debt (especially for throwaway/validation work); others stress that “debt must be paid,” if not by you then by someone else.

Workflows and practices that help

  • Use agents for small, well‑scoped changes; avoid broad, open‑ended tasks.
  • Always start with a plan; iterate on the plan before allowing code changes.
  • TDD-like loops: write/review tests as the spec, then let agents implement.
  • Maintain project docs for the agent (AGENTS.md/CLAUDE.md), logs, and explicit coding guidelines.
  • Use multiple passes/agents for review (e.g., “rule of 5” diverse reviews).
  • Abort and revert when the agent gets stuck or starts making subtle mistakes.

Scale, hype, and open questions

  • Big‑company anecdotes: heavy assisted use, but little evidence of fully agentic coding at scale.
  • Many see agentic coding as a powerful force multiplier if you already know what “good” looks like and treat it like managing a team of permanent juniors.
  • Others remain skeptical, calling much of the discourse FOMO‑driven marketing, with few reproducible, detailed success reports.