The State of AI Coding Report 2025
Metrics & use of LOC
- Central controversy: the report leads with lines of code (LOC) and “velocity” as evidence of AI-driven productivity, which many see as a discredited, even harmful metric.
- Critics argue:
  - More LOC often means more complexity and technical debt; the best engineers frequently keep net LOC flat or negative.
  - Language like “productivity gains,” “output,” and “force multiplier” implicitly equates LOC with value, despite disclaimers.
  - LOC-based framing looks tailored to impress non-technical executives and undermines credibility.
- Defenders and moderates say:
  - LOC is not a “good” metric, but it is data, and interesting as a measure of change.
  - As long as code is merged and used, higher merged LOC may loosely correlate with more real output, though with lots of noise.
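For readers who want to check the gross-versus-net distinction on their own repositories, here is a minimal sketch, assuming a local git checkout and parsing plain `git log --numstat` output (the repo path and time window are placeholders, not values from the report):

```python
import subprocess

def loc_summary(repo_path: str, since: str = "90 days ago") -> dict:
    """Compare gross additions with net LOC change over a period.

    Gross additions are what LOC-based "output" charts tend to show;
    the net figure is what critics mean by "keeping LOC flat or negative".
    """
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--numstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout

    added = deleted = 0
    for line in out.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "-":  # skip blank lines and binary files
            continue
        added += int(parts[0])
        deleted += int(parts[1])

    return {"gross_added": added, "deleted": deleted, "net": added - deleted}

if __name__ == "__main__":
    print(loc_summary("."))
```

A team can post large gross additions while its net LOC barely moves, which is exactly the gap between “output” and “value” that critics point at; the same parse can be sliced per author or per file.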
Code quality, bugs, and maintainability
- Multiple commenters say the real questions are:
  - Defect density, rollback/revert rates, and change failure rate.
  - Live-site/security incidents and MTTR (DORA-style metrics; a small sketch follows this list).
  - Long‑term maintainability of AI-generated code.
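Two of these DORA-style measures are easy to state precisely. A minimal sketch over hypothetical deployment and incident records (the record shapes are invented for illustration, not taken from the report):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deployment:
    id: str
    deployed_at: datetime
    caused_incident: bool  # did this change trigger a production failure?

@dataclass
class Incident:
    opened_at: datetime
    resolved_at: datetime

def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that led to a production failure."""
    if not deploys:
        return 0.0
    return sum(d.caused_incident for d in deploys) / len(deploys)

def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to restore service across resolved incidents."""
    if not incidents:
        return timedelta(0)
    total = sum((i.resolved_at - i.opened_at for i in incidents), timedelta(0))
    return total / len(incidents)
```

Comparing figures like these before and after adopting an AI coding tool would say far more about quality than merged LOC, which is the commenters' point.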
- Suggestions for proxies:
  - Code churn and “change rate of new code” (how often a line is modified before stabilizing); see the sketch below.
  - Cyclomatic complexity, coupling (how many files you must touch to understand/change something), “code entropy.”
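The per-line “change rate” idea is hard to compute exactly, but a file-level approximation is straightforward: count how many commits have touched a file since it was first added, normalized by its age. A rough sketch using plain git commands (file-level only, so it understates line-level churn):

```python
import subprocess
from datetime import datetime, timezone

def _git(repo: str, *args: str) -> str:
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout.strip()

def churn_rate(repo: str, path: str) -> float:
    """Commits touching `path` per 30 days of the file's lifetime."""
    # Timestamp of the commit that added the file (oldest add in the log).
    added_ts = _git(repo, "log", "--diff-filter=A", "--follow",
                    "--format=%at", "--", path).splitlines()[-1]
    added = datetime.fromtimestamp(int(added_ts), tz=timezone.utc)

    # Total number of commits that modified the file.
    touches = int(_git(repo, "rev-list", "--count", "HEAD", "--", path))

    age_days = max((datetime.now(timezone.utc) - added).days, 1)
    return touches / (age_days / 30)
```

A true line-level version could diff `git blame` output across successive revisions, at considerably more cost.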
- Strong disagreement on “persistence” as a quality metric:
  - Some suggest stable, untouched code might be “good enough.”
  - Others note some of the worst, scariest legacy code persists precisely because no one dares touch it.
- Greptile notes they track:
  - Change in number of revisions per PR before vs. after using their tool.
  - Fraction of their PR comments that result in code changes.
  - They admit this still doesn’t measure absolute quality.
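Both of Greptile's numbers are relative before/after comparisons rather than absolute quality scores. A minimal sketch of how such figures could be computed, using invented record shapes (not Greptile's actual schema):

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    revisions: int           # pushes after the PR was opened, before merge
    used_review_tool: bool   # e.g. reviewed with an AI review bot

@dataclass
class ReviewComment:
    led_to_code_change: bool  # was a follow-up commit made in response?

def mean_revisions(prs: list[PullRequest], with_tool: bool) -> float:
    """Average revisions per PR, split by whether the review tool was used."""
    group = [p.revisions for p in prs if p.used_review_tool == with_tool]
    return sum(group) / len(group) if group else 0.0

def actionable_comment_rate(comments: list[ReviewComment]) -> float:
    """Fraction of review comments that resulted in a code change."""
    if not comments:
        return 0.0
    return sum(c.led_to_code_change for c in comments) / len(comments)
```

Such numbers can show a tool changing author and reviewer behavior without establishing whether the resulting code is good, which matches Greptile's own caveat.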
Data scope & methodological concerns
- Questions about which graphs are based on Greptile’s billion-LOC dataset vs public registries.
- Clarification: early charts and one specific later chart are from Greptile’s customer data; others from npm/PyPI, etc.
- Some found the “cross‑industry” and “internal team” wording confusing or borderline misleading.
- Requests for:
  - Historical comparisons (past years) to distinguish trends from noise.
  - Breakdowns by company size, industry, and possibly revenue/feature-release correlations.
  - Metrics like change frequency per line, rollback rates, and deleted/replaced code.
Experiences with AI coding tools
- Some report substantial personal speedups:
  - Using agents to “do the typing,” generating hundreds to thousands of LOC/day, with human review and tests.
  - Tools particularly helpful for pattern-spotting, boilerplate, web UIs, and small utilities/CLI tools.
- Others remain skeptical:
  - Reviewing AI output is mentally taxing, and you lose the thinking time that interleaved manual work used to provide.
  - AI code often contains many small issues or odd API usage; careful engineers find “hundreds of improvements.”
  - In complex, stateful, or safety-critical systems (finance, core infra) they would not trust agent-driven large diffs.
- Debate on equalization vs polarization:
  - Some hope AI will raise overall team output (classic “force multiplier” story).
  - Others expect it will amplify existing disparities: those who can keep up with rapid iteration will benefit most.
Impact on teams, business, and risk
- Several commenters stress:
  - More LOC and larger PRs should be treated as risk indicators, not achievements (see the sketch below).
  - Without tying metrics to incidents, bugs, and customer outcomes, “76% faster” could simply mean “76% faster at shipping debt.”
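One concrete way to treat size as a risk signal is to bucket merged PRs by diff size and compare revert rates across buckets. A small sketch over hypothetical PR records (field names invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class MergedPR:
    lines_changed: int   # additions + deletions in the merged diff
    was_reverted: bool   # a later commit reverted or rolled back this PR

def revert_rate_by_size(prs: list[MergedPR],
                        buckets: tuple[int, ...] = (100, 500, 2000)) -> dict[str, float]:
    """Fraction of merged PRs later reverted, grouped by diff-size bucket."""
    def label(n: int) -> str:
        for limit in buckets:
            if n <= limit:
                return f"<= {limit}"
        return f"> {buckets[-1]}"

    groups: dict[str, list[bool]] = {}
    for pr in prs:
        groups.setdefault(label(pr.lines_changed), []).append(pr.was_reverted)
    return {k: sum(v) / len(v) for k, v in groups.items()}
```

If the revert rate climbs sharply with diff size, “faster” output is arriving with a measurable risk premium.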
- Some business-oriented perspectives:
  - Businesses crave simple productivity metrics even if imperfect; LOC appeals because it’s measurable.
  - However, a metric that can be gamed by doing bad work (e.g., adding useless code) is itself a productivity-measurement problem.
Perception of the report & presentation
- Mixed reception:
  - Some find it “BS” or a “revolving door of dumb” because it foregrounds LOC, seeing this as emblematic of AI hype and technical-debt generation.
  - Others appreciate having any quantitative data in a space dominated by anecdotes and say the graphs match their lived experience.
- Design and UX of the site receive widespread praise (dot-matrix/paper styling, visual polish).
- Several propose richer future analyses: language shifts under AI, typical script/PR sizes over time, proportion of code written by fully async agents, and how often AI-written code is later deleted or heavily rewritten.