Measuring the impact of AI on experienced open-source developer productivity
Study design & scope
- 16 experienced OSS maintainers from large, long-lived repos (~1M LOC, many years’ experience) completed 246 real issues.
- Tasks were randomly assigned “AI allowed” vs “AI disallowed”; participants were paid $150/hr ($73k total).
- Primary tool was Cursor with Claude models; most devs had LLM experience but mixed familiarity with Cursor. Training on the tool was brief.
- Several commenters think N=16 is small, but note that if AI were truly the 5–10× speedup claimed in the hype, this setup should still detect it (a rough power sketch follows this list).
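A back-of-the-envelope simulation of that detectability argument. The ~246-task count matches the study; the log-normal duration spread, the even split between arms, and the plain t-test (which ignores clustering across the 16 developers) are illustrative assumptions, not the study’s actual design or analysis:

```python
# Rough power check: with ~246 randomized tasks, how often would a simple
# t-test on log task durations detect a true speedup of a given size?
# All parameters below are illustrative assumptions, not study values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def detection_rate(speedup, n_tasks=246, sigma=1.0, trials=2000, alpha=0.05):
    """Fraction of simulated studies in which the test rejects 'no difference'."""
    hits = 0
    for _ in range(trials):
        n_ai = n_tasks // 2
        n_control = n_tasks - n_ai
        control = rng.lognormal(mean=0.0, sigma=sigma, size=n_control)   # AI disallowed
        ai = rng.lognormal(mean=0.0, sigma=sigma, size=n_ai) / speedup   # AI allowed
        _, p = stats.ttest_ind(np.log(ai), np.log(control))
        hits += p < alpha
    return hits / trials

for s in (1.2, 2.0, 5.0):
    print(f"true speedup {s}x -> detected in {detection_rate(s):.0%} of simulations")
```

Even with very noisy per-task durations, a 2× or 5× true speedup is detected essentially every time in this toy setup, while a modest 1.2× effect is often missed.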
Core findings: slowdown vs “felt” speedup
- Measured effect: AI use caused a statistically significant slowdown on these tasks (developers took roughly 19% longer when AI was allowed).
- Yet developers predicted AI would speed them up by ~24%, and even after experiencing the slowdown, they still believed they had been ~20% faster.
- Time breakdown (from the study’s figures): less time actively coding, testing, and researching; more time prompting, reading AI output, waiting, and sitting idle.
- Many infer that reduced mental effort and coding time makes work feel faster even when wall-clock time increases (the quick arithmetic after this list illustrates the size of the gap).
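For a sense of how large that perception gap is, a quick bit of arithmetic on the approximate figures above (the ~19% slowdown and ~20% perceived speedup); the comparison itself is just a ratio:

```python
# Rough size of the perception gap, using the approximate figures above.
measured_ratio = 1.19    # actual time with AI relative to without (~19% slower)
perceived_ratio = 0.80   # what developers believed their relative time was (~20% faster)
print(f"Perceived vs. actual speed off by ~{measured_ratio / perceived_ratio:.2f}x")  # ~1.49x
```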
Where AI seems helpful (anecdotal reports)
- Strongly positive uses:
  - Learning or working in unfamiliar languages, frameworks, or codebases.
  - Boilerplate, small utilities, one-off scripts, config/CI/infra glue, refactors, type wrangling, test scaffolding.
  - “Stack Overflow on steroids”: syntax, API usage, translations between languages, debugging help, rubber-ducking.
  - Ops/sysadmin work: interpreting errors, reading manpages/docs and synthesizing commands.
- Much weaker or negative:
  - Deep work in code you know well.
  - Complex, cross-cutting changes in large legacy systems.
  - Letting agents “vibe-code” large features or whole systems.
Generalization & learning curve
- Many argue the results are specific to:
  - Highly familiar repos with strict quality bars and implicit conventions.
  - Short, well-scoped issues.
  - Developers still climbing the “AI tooling + prompting + workflow” learning curve.
- Others counter that the prior studies showing positive results themselves used relatively inexperienced users, suggesting the current value of LLM coding is narrower than marketed.
Open-source ecosystem effects
- AI helps some maintainers keep up with tech debt, chores, dependency churn.
- But OSS maintainers report:
  - More low-quality, AI-generated PRs and code reviews creating review load not captured by the study.
  - Contributors gaining résumé/“cred” without gaining real understanding of the codebase.
  - Debate over whether AI-boosted “trivial” contributions are net help or value extraction.
Trust, funding, and metrics
- Some suspicion about funders and free compute from AI labs; the organization states it received no direct payment from AI companies for evaluations.
- Commenters stress:
  - Self-reported productivity is unreliable; objective measurement is crucial.
  - Speed per task is only one dimension; missing are quality, tech debt, long-term maintainability, institutional knowledge, and throughput across parallel tasks.
- Many call for follow-up studies:
  - With junior/mid devs, unfamiliar repos, greenfield projects, newer models/agents, and long-term outcomes.