The State of AI Coding Report 2025

Metrics & use of LOC

  • Central controversy: the report leads with lines of code (LOC) and “velocity” as evidence of AI-driven productivity, which many see as a discredited, even harmful metric.
  • Critics argue:
    • More LOC often means more complexity and technical debt; the best engineers frequently keep net LOC flat or negative.
    • Language like “productivity gains,” “output,” and “force multiplier” implicitly equates LOC with value, despite disclaimers.
    • LOC-based framing looks tailored to impress non-technical executives and undermines credibility.
  • Defenders and moderates say:
    • LOC is not a “good” metric, but it is data, and it is interesting as a measure of change.
    • As long as code is merged and used, higher merged LOC may loosely correlate with more real output, albeit with substantial noise.

Code quality, bugs, and maintainability

  • Multiple commenters say the real questions are:
    • Defect density, rollback/revert rates, and change failure rate.
    • Livesite/security incidents and MTTR (DORA-style metrics).
    • Long‑term maintainability of AI-generated code.
  • Suggestions for proxies:
    • Code churn and the “change rate of new code” (how often a line is modified before stabilizing); see the sketch after this list.
    • Cyclomatic complexity, coupling (how many files you must touch to understand/change something), “code entropy.”
  • Strong disagreement on “persistence” as a quality metric:
    • Some suggest stable, untouched code might be “good enough.”
    • Others note some of the worst, scariest legacy code persists precisely because no one dares touch it.
  • Greptile notes they track:
    • Change in number of revisions per PR before vs. after using their tool.
    • Fraction of their PR comments that result in code changes.
    • They admit this still doesn’t measure absolute quality.
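
A minimal sketch of the churn proxy suggested above, assuming a local git checkout. True per-line “time to stabilization” would require blame history; this approximation instead totals lines added vs. lines deleted per file over a recent window. The 90-day window and the churn-ratio definition are illustrative assumptions, not anything specified in the report or the thread.

```python
# Rough churn proxy over a git repo: for each file, compare lines added
# with lines deleted in a recent window. A high delete/add ratio on
# recently written code suggests it is being rewritten before it stabilizes.
import subprocess
from collections import defaultdict

def churn_by_file(repo: str, since: str = "90 days ago") -> dict:
    # --numstat prints "added<TAB>deleted<TAB>path" for each file in each
    # commit; --format= suppresses commit headers so only numstat lines remain.
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--numstat", "--format="],
        capture_output=True, text=True, check=True,
    ).stdout
    totals = defaultdict(lambda: [0, 0])  # path -> [added, deleted]
    for line in out.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "-":  # "-" marks binary files
            continue
        totals[parts[2]][0] += int(parts[0])
        totals[parts[2]][1] += int(parts[1])
    return totals

if __name__ == "__main__":
    ranked = sorted(churn_by_file(".").items(), key=lambda kv: -kv[1][1])
    for path, (added, deleted) in ranked[:10]:
        ratio = deleted / added if added else float("inf")
        print(f"{path}: +{added} -{deleted}  churn ratio {ratio:.2f}")
```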

Data scope & methodological concerns

  • Questions about which graphs are based on Greptile’s billion-LOC dataset vs. public registries.
  • Clarification: the early charts and one specific later chart come from Greptile’s customer data; the others come from public registries such as npm and PyPI.
  • Some found the “cross‑industry” and “internal team” wording confusing or borderline misleading.
  • Requests for:
    • Historical comparisons (past years) to distinguish trends from noise.
    • Breakdowns by company size, industry, and possibly revenue/feature-release correlations.
    • Metrics like change frequency per line, rollback rates, and deleted/replaced code (a rough sketch of the latter two follows this list).
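
A rough sketch of how two of those requests could be approximated from git history alone, assuming a local checkout. Treating commits whose subject starts with “Revert” (the default `git revert` message) as rollbacks, and summing the deleted column of `--numstat` as removed/replaced code, are stand-in definitions; squash merges and manually described rollbacks would be missed.

```python
# Two of the requested metrics, approximated from git history alone.
import subprocess

def _git(repo: str, *args: str) -> str:
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

def rollback_rate(repo: str, since: str = "1 year ago") -> float:
    # Approximates rollbacks via the default `git revert` subject line.
    total = int(_git(repo, "rev-list", "--count", f"--since={since}", "HEAD"))
    reverts = int(_git(repo, "rev-list", "--count", f"--since={since}",
                       "--grep=^Revert", "HEAD"))
    return reverts / total if total else 0.0

def deleted_lines(repo: str, since: str = "1 year ago") -> int:
    # Sums the "deleted" column of --numstat across all commits in range.
    out = _git(repo, "log", f"--since={since}", "--numstat", "--format=")
    return sum(int(parts[1]) for parts in
               (line.split("\t") for line in out.splitlines())
               if len(parts) == 3 and parts[1].isdigit())

if __name__ == "__main__":
    print(f"rollback rate: {rollback_rate('.'):.1%}")
    print(f"lines deleted: {deleted_lines('.')}")
```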

Experiences with AI coding tools

  • Some report substantial personal speedups:
    • Using agents to “do the typing,” generating hundreds to thousands of LOC/day, with human review and tests.
    • Tools particularly helpful for pattern-spotting, boilerplate, web UIs, and small utilities/CLI tools.
  • Others remain skeptical:
    • Reviewing AI output is mentally taxing, and you lose the thinking time that manual steps used to provide.
    • AI code often contains many small issues or odd API usage; careful engineers find “hundreds of improvements.”
    • In complex, stateful, or safety-critical systems (finance, core infrastructure), they would not trust large agent-driven diffs.
  • Debate on equalization vs polarization:
    • Some hope AI will raise overall team output (classic “force multiplier” story).
    • Others expect it will amplify existing disparities: those who can keep up with rapid iteration will benefit most.

Impact on teams, business, and risk

  • Several commenters stress:
    • More LOC and larger PRs should be treated as risk indicators, not achievements.
    • Without tying metrics to incidents, bugs, and customer outcomes, “76% faster” could simply mean “76% faster at shipping debt” (the toy calculation after this list spells out why).
  • Some business-oriented perspectives:
    • Businesses crave simple productivity metrics even if imperfect; LOC appeals because it’s measurable.
    • However, a metric that can be gamed by doing bad work (e.g., adding useless code) undermines its own purpose as a productivity measure.
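
A toy calculation of the objection above; every number here is invented, and reading “76% faster” as pure throughput is itself an assumption.

```python
# Toy arithmetic behind the "shipping debt" objection; all numbers invented.
DEFECTS_PER_KLOC = 5.0             # assumed constant defect density
BASELINE_KLOC_PER_QUARTER = 100.0  # assumed baseline throughput
SPEEDUP = 1.76                     # "76% faster", read here as throughput

baseline = DEFECTS_PER_KLOC * BASELINE_KLOC_PER_QUARTER
with_ai = baseline * SPEEDUP
print(f"defects shipped per quarter: {baseline:.0f} -> {with_ai:.0f}")
# If defect density stays flat, shipping faster ships proportionally more
# defects; throughput alone cannot tell value apart from debt.
```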

Perception of the report & presentation

  • Mixed reception:
    • Some find it “BS” or “revolving door of dumb” because it foregrounds LOC, seeing this as emblematic of AI hype and technical-debt generation.
    • Others appreciate having any quantitative data in a space dominated by anecdotes and say the graphs match their lived experience.
  • Design and UX of the site receive widespread praise (dot-matrix/paper styling, visual polish).
  • Several propose richer future analyses: language shifts under AI, typical script/PR sizes over time, proportion of code written by fully async agents, and how often AI-written code is later deleted or heavily rewritten.