Measuring the impact of AI on experienced open-source developer productivity
Study design & scope
- 16 experienced OSS maintainers from large, long-lived repos (~1M LOC, many years’ experience) completed 246 real issues.
- Tasks were randomly assigned “AI allowed” vs “AI disallowed”; participants were paid $150/hr ($73k total).
- Primary tool was Cursor with Claude models; most devs had LLM experience but mixed familiarity with Cursor. Training on the tool was brief.
- Several commenters think N=16 is small, but note that if AI were truly the 5–10× speedup claimed in the hype, this setup should still detect it (a rough power sketch follows this list).
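A back-of-the-envelope simulation of that detectability argument. The ~246-task count matches the study; the log-normal duration spread, the even split between arms, and the plain t-test (which ignores clustering across the 16 developers) are illustrative assumptions, not the study’s actual design or analysis:

```python
# Rough power check: with ~246 randomized tasks, how often would a simple
# t-test on log task durations detect a true speedup of a given size?
# All parameters below are illustrative assumptions, not study values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def detection_rate(speedup, n_tasks=246, sigma=1.0, trials=2000, alpha=0.05):
    """Fraction of simulated studies in which the test rejects 'no difference'."""
    hits = 0
    for _ in range(trials):
        n_ai = n_tasks // 2
        n_control = n_tasks - n_ai
        control = rng.lognormal(mean=0.0, sigma=sigma, size=n_control)   # AI disallowed
        ai = rng.lognormal(mean=0.0, sigma=sigma, size=n_ai) / speedup   # AI allowed
        _, p = stats.ttest_ind(np.log(ai), np.log(control))
        hits += p < alpha
    return hits / trials

for s in (1.2, 2.0, 5.0):
    print(f"true speedup {s}x -> detected in {detection_rate(s):.0%} of simulations")
```

Even with very noisy per-task durations, a 2× or 5× true speedup is detected essentially every time in this toy setup, while a modest 1.2× effect is often missed.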
Core findings: slowdown vs “felt” speedup
- Measured effect: AI use caused a statistically significant slowdown on these tasks (developers took roughly 19% longer when AI was allowed).
- Yet developers predicted AI would speed them up by ~24%, and even after experiencing the slowdown, they still believed they had been ~20% faster.
- Time breakdown (from the study’s figures): less time actively coding, testing, and researching; more time prompting, reading AI output, waiting, and sitting idle.
- Many infer that reduced mental effort and coding time makes work feel faster even when wall-clock time increases (the quick arithmetic after this list illustrates the size of the gap).
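For a sense of how large that perception gap is, a quick bit of arithmetic on the approximate figures above (the ~19% slowdown and ~20% perceived speedup); the comparison itself is just a ratio:

```python
# Rough size of the perception gap, using the approximate figures above.
measured_ratio = 1.19    # actual time with AI relative to without (~19% slower)
perceived_ratio = 0.80   # what developers believed their relative time was (~20% faster)
print(f"Perceived vs. actual speed off by ~{measured_ratio / perceived_ratio:.2f}x")  # ~1.49x
```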
Where AI seems helpful (anecdotal reports)
- Strongly positive uses:
  - Learning or working in unfamiliar languages, frameworks, or codebases.
  - Boilerplate, small utilities, one-off scripts, config/CI/infra glue, refactors, type wrangling, test scaffolding.
  - “Stack Overflow on steroids”: syntax, API usage, translations between languages, debugging help, rubber-ducking.
  - Ops/sysadmin work: interpreting errors, reading manpages/docs and synthesizing commands.
- Much weaker or negative:
  - Deep work in code you know well.
  - Complex, cross-cutting changes in large legacy systems.
  - Letting agents “vibe-code” large features or whole systems.
Generalization & learning curve
- Many argue the results are specific to:
  - Highly familiar repos with strict quality bars and implicit conventions.
  - Short, well-scoped issues.
  - Developers still climbing the “AI tooling + prompting + workflow” learning curve.
- Others counter that the prior studies showing positive results themselves used relatively inexperienced users, suggesting the current value of LLM coding is narrower than marketed.
Open-source ecosystem effects
- AI helps some maintainers keep up with tech debt, chores, dependency churn.
- But OSS maintainers report:
  - More low-quality, AI-generated PRs and code reviews creating review load not captured by the study.
  - Contributors gaining résumé/“cred” without gaining real understanding of the codebase.
  - Debate over whether AI-boosted “trivial” contributions are net help or value extraction.
Trust, funding, and metrics
- Some suspicion about funders and free compute from AI labs; the organization states it received no direct payment from AI companies for evaluations.
- Commenters stress:
  - Self-reported productivity is unreliable; objective measurement is crucial.
  - Speed per task is only one dimension; missing are quality, tech debt, long-term maintainability, institutional knowledge, and throughput across parallel tasks.
- Many call for follow-up studies:
  - With junior/mid devs, unfamiliar repos, greenfield projects, newer models/agents, and long-term outcomes.