The State of AI Coding Report 2025
Metrics & use of LOC
- Central controversy: the report leads with lines of code (LOC) and “velocity” as evidence of AI-driven productivity, which many see as a discredited, even harmful metric.
- Critics argue:
  - More LOC often means more complexity and technical debt; the best engineers frequently keep net LOC flat or negative.
  - Language like “productivity gains,” “output,” and “force multiplier” implicitly equates LOC with value, despite disclaimers.
  - LOC-based framing looks tailored to impress non-technical executives and undermines credibility.
- Defenders and moderates say:
  - LOC is not a “good” metric, but it is data, and interesting as a measure of change.
  - As long as code is merged and used, higher merged LOC may loosely correlate with more real output, though with lots of noise.
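For readers who want to check the gross-versus-net distinction on their own repositories, here is a minimal sketch, assuming a local git checkout and parsing plain `git log --numstat` output (the repo path and time window are placeholders, not values from the report):

```python
import subprocess

def loc_summary(repo_path: str, since: str = "90 days ago") -> dict:
    """Compare gross additions with net LOC change over a period.

    Gross additions are what LOC-based "output" charts tend to show;
    the net figure is what critics mean by "keeping LOC flat or negative".
    """
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--numstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout

    added = deleted = 0
    for line in out.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "-":  # skip blank lines and binary files
            continue
        added += int(parts[0])
        deleted += int(parts[1])

    return {"gross_added": added, "deleted": deleted, "net": added - deleted}

if __name__ == "__main__":
    print(loc_summary("."))
```

A team can post large gross additions while its net LOC barely moves, which is exactly the gap between “output” and “value” that critics point at; the same parse can be sliced per author or per file.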
Code quality, bugs, and maintainability
- Multiple commenters say the real questions are:
  - Defect density, rollback/revert rates, and change failure rate.
  - Live-site/security incidents and MTTR (DORA-style metrics; a small sketch follows this list).
  - Long‑term maintainability of AI-generated code.
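Two of these DORA-style measures are easy to state precisely. A minimal sketch over hypothetical deployment and incident records (the record shapes are invented for illustration, not taken from the report):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deployment:
    id: str
    deployed_at: datetime
    caused_incident: bool  # did this change trigger a production failure?

@dataclass
class Incident:
    opened_at: datetime
    resolved_at: datetime

def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that led to a production failure."""
    if not deploys:
        return 0.0
    return sum(d.caused_incident for d in deploys) / len(deploys)

def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to restore service across resolved incidents."""
    if not incidents:
        return timedelta(0)
    total = sum((i.resolved_at - i.opened_at for i in incidents), timedelta(0))
    return total / len(incidents)
```

Comparing figures like these before and after adopting an AI coding tool would say far more about quality than merged LOC, which is the commenters' point.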
- Suggestions for proxies:
  - Code churn and “change rate of new code” (how often a line is modified before stabilizing); see the sketch below.
  - Cyclomatic complexity, coupling (how many files you must touch to understand/change something), “code entropy.”
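The per-line “change rate” idea is hard to compute exactly, but a file-level approximation is straightforward: count how many commits have touched a file since it was first added, normalized by its age. A rough sketch using plain git commands (file-level only, so it understates line-level churn):

```python
import subprocess
from datetime import datetime, timezone

def _git(repo: str, *args: str) -> str:
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout.strip()

def churn_rate(repo: str, path: str) -> float:
    """Commits touching `path` per 30 days of the file's lifetime."""
    # Timestamp of the commit that added the file (oldest add in the log).
    added_ts = _git(repo, "log", "--diff-filter=A", "--follow",
                    "--format=%at", "--", path).splitlines()[-1]
    added = datetime.fromtimestamp(int(added_ts), tz=timezone.utc)

    # Total number of commits that modified the file.
    touches = int(_git(repo, "rev-list", "--count", "HEAD", "--", path))

    age_days = max((datetime.now(timezone.utc) - added).days, 1)
    return touches / (age_days / 30)
```

A true line-level version could diff `git blame` output across successive revisions, at considerably more cost.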
- Strong disagreement on “persistence” as a quality metric:
  - Some suggest stable, untouched code might be “good enough.”
  - Others note some of the worst, scariest legacy code persists precisely because no one dares touch it.
- Greptile notes they track:
  - Change in number of revisions per PR before vs. after using their tool.
  - Fraction of their PR comments that result in code changes.
  - They admit this still doesn’t measure absolute quality.
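Both of Greptile's numbers are relative before/after comparisons rather than absolute quality scores. A minimal sketch of how such figures could be computed, using invented record shapes (not Greptile's actual schema):

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    revisions: int           # pushes after the PR was opened, before merge
    used_review_tool: bool   # e.g. reviewed with an AI review bot

@dataclass
class ReviewComment:
    led_to_code_change: bool  # was a follow-up commit made in response?

def mean_revisions(prs: list[PullRequest], with_tool: bool) -> float:
    """Average revisions per PR, split by whether the review tool was used."""
    group = [p.revisions for p in prs if p.used_review_tool == with_tool]
    return sum(group) / len(group) if group else 0.0

def actionable_comment_rate(comments: list[ReviewComment]) -> float:
    """Fraction of review comments that resulted in a code change."""
    if not comments:
        return 0.0
    return sum(c.led_to_code_change for c in comments) / len(comments)
```

Such numbers can show a tool changing author and reviewer behavior without establishing whether the resulting code is good, which matches Greptile's own caveat.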
Data scope & methodological concerns
- Questions about which graphs are based on Greptile’s billion-LOC dataset vs public registries.
- Clarification: early charts and one specific later chart are from Greptile’s customer data; others from npm/PyPI, etc.
- Some found the “cross‑industry” and “internal team” wording confusing or borderline misleading.
- Requests for:
  - Historical comparisons (past years) to distinguish trends from noise.
  - Breakdowns by company size, industry, and possibly revenue/feature-release correlations.
  - Metrics like change frequency per line, rollback rates, and deleted/replaced code.
Experiences with AI coding tools
- Some report substantial personal speedups:
  - Using agents to “do the typing,” generating hundreds to thousands of LOC/day, with human review and tests.
  - Tools particularly helpful for pattern-spotting, boilerplate, web UIs, and small utilities/CLI tools.
- Others remain skeptical:
  - Reviewing AI output is mentally taxing, and you lose the thinking time that interleaved manual work used to provide.
  - AI code often contains many small issues or odd API usage; careful engineers find “hundreds of improvements.”
  - In complex, stateful, or safety-critical systems (finance, core infra) they would not trust agent-driven large diffs.
- Debate on equalization vs polarization:
  - Some hope AI will raise overall team output (classic “force multiplier” story).
  - Others expect it will amplify existing disparities: those who can keep up with rapid iteration will benefit most.
Impact on teams, business, and risk
- Several commenters stress:
  - More LOC and larger PRs should be treated as risk indicators, not achievements (see the sketch below).
  - Without tying metrics to incidents, bugs, and customer outcomes, “76% faster” could simply mean “76% faster at shipping debt.”
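One concrete way to treat size as a risk signal is to bucket merged PRs by diff size and compare revert rates across buckets. A small sketch over hypothetical PR records (field names invented for illustration):

```python
from dataclasses import dataclass

@dataclass
class MergedPR:
    lines_changed: int   # additions + deletions in the merged diff
    was_reverted: bool   # a later commit reverted or rolled back this PR

def revert_rate_by_size(prs: list[MergedPR],
                        buckets: tuple[int, ...] = (100, 500, 2000)) -> dict[str, float]:
    """Fraction of merged PRs later reverted, grouped by diff-size bucket."""
    def label(n: int) -> str:
        for limit in buckets:
            if n <= limit:
                return f"<= {limit}"
        return f"> {buckets[-1]}"

    groups: dict[str, list[bool]] = {}
    for pr in prs:
        groups.setdefault(label(pr.lines_changed), []).append(pr.was_reverted)
    return {k: sum(v) / len(v) for k, v in groups.items()}
```

If the revert rate climbs sharply with diff size, “faster” output is arriving with a measurable risk premium.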
- Some business-oriented perspectives:
  - Businesses crave simple productivity metrics even if imperfect; LOC appeals because it’s measurable.
  - However, a metric that can be gamed by doing bad work (e.g., adding useless code) is itself a productivity-measurement problem.
Perception of the report & presentation
- Mixed reception:
  - Some find it “BS” or a “revolving door of dumb” because it foregrounds LOC, seeing this as emblematic of AI hype and technical-debt generation.
  - Others appreciate having any quantitative data in a space dominated by anecdotes and say the graphs match their lived experience.
- Design and UX of the site receive widespread praise (dot-matrix/paper styling, visual polish).
- Several propose richer future analyses: language shifts under AI, typical script/PR sizes over time, proportion of code written by fully async agents, and how often AI-written code is later deleted or heavily rewritten.