Measuring the impact of AI on experienced open-source developer productivity

Study design & scope

  • 16 experienced OSS maintainers, each with years of prior work on their own large, long-lived repos (~1M LOC), completed 246 real issues.
  • Each task was randomly assigned to an “AI allowed” or “AI disallowed” condition; participants were paid $150/hr ($73k total).
  • The primary tool was Cursor with Claude models; most devs had prior LLM experience but mixed familiarity with Cursor, and training on the tool was brief.
  • Several commenters note that N=16 is small, but argue that if AI truly delivered the 5–10× speedups claimed in the hype, this setup should still detect it (see the sketch below).
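
A rough way to see that detectability argument, as a minimal Python sketch. The numbers are illustrative assumptions of mine, not the study’s: an even split of the 246 tasks, noisy log-normal durations with a roughly two-hour median task time, and a uniform hypothetical speedup applied to the “AI allowed” arm.

    import random
    import statistics

    # Minimal power-intuition sketch, not the study's actual analysis.
    # Assumed: 246 tasks split evenly between conditions, log-normal task
    # durations (median ~2 hours, high variance), uniform hypothetical speedup.
    random.seed(0)

    def simulate(speedup, n_tasks=246):
        no_ai, with_ai = [], []
        for i in range(n_tasks):
            base = random.lognormvariate(0, 0.8) * 120  # minutes per task
            if i % 2 == 0:
                no_ai.append(base)
            else:
                with_ai.append(base / speedup)  # apply the assumed speedup
        return statistics.mean(no_ai), statistics.mean(with_ai)

    for s in (1.0, 1.2, 5.0):
        no_ai_mean, ai_mean = simulate(s)
        print(f"assumed speedup {s:.1f}x -> mean minutes: no-AI {no_ai_mean:.1f}, AI {ai_mean:.1f}")

Under these assumptions, a 5× speedup separates the two arms by hundreds of minutes, far beyond sampling noise, while a ~20% effect sits much closer to the noise floor; that gap is roughly the commenters’ point.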

Core findings: slowdown vs “felt” speedup

  • Measured effect: AI use caused a significant slowdown (~19% longer completion times) on these tasks.
  • Yet developers predicted AI would speed them up by ~24%, and even after experiencing the slowdown, they still believed they had been ~20% faster.
  • Time breakdown (from the figures discussed): less time actively coding, testing, and researching; more time prompting, reviewing AI output, waiting on the model, and sitting idle.
  • Many infer that reduced mental effort and less time spent coding make the work feel faster even when wall-clock time increases.

Where AI seems helpful (anecdotal reports)

  • Strongly positive uses:
    • Learning or working in unfamiliar languages, frameworks, or codebases.
    • Boilerplate, small utilities, one-off scripts, config/CI/infra glue, refactors, type wrangling, test scaffolding.
    • “Stack Overflow on steroids”: syntax, API usage, translations between languages, debugging help, rubber-ducking.
    • Ops/sysadmin work: interpreting errors, reading manpages/docs and synthesizing commands.
  • Much weaker or negative:
    • Deep work in code you know well.
    • Complex, cross-cutting changes in large legacy systems.
    • Letting agents “vibe-code” large features or whole systems.

Generalization & learning curve

  • Many argue results are specific to:
    • Highly familiar repos with strict quality and implicit conventions.
    • Short, well-scoped issues.
    • Developers still climbing the “AI tooling + prompting + workflow” learning curve.
  • Others counter that the prior studies showing speedups relied on relatively inexperienced users, suggesting the current value of LLM coding tools is narrower than marketed.

Open-source ecosystem effects

  • AI helps some maintainers keep up with tech debt, chores, dependency churn.
  • But OSS maintainers report:
    • More low-quality, AI-generated PRs and code reviews creating review load not captured by the study.
    • Contributors racking up résumé lines and “cred” without gaining real understanding of the codebase.
  • Debate over whether AI-boosted “trivial” contributions are net help or value extraction.

Trust, funding, and metrics

  • Some suspicion about funders and free compute from AI labs; the organization states it takes no direct payment from AI companies for evaluations.
  • Commenters stress:
    • Self-reported productivity is unreliable; objective measurement is crucial.
    • Speed per task is only one dimension; missing are quality, tech debt, long-term maintainability, institutional knowledge, and throughput across parallel tasks.
  • Many call for follow-up studies:
    • Covering junior/mid-level devs, unfamiliar repos, greenfield projects, newer models/agents, and longer-term outcomes.