Effects of Gen AI on High Skilled Work: Experiments with Software Developers

Productivity gains and where AI helps most

  • Many report perceived productivity boosts of 20–40%; some claim 2–4x on simple tasks, while others see only 5–10%.
  • Biggest gains: boilerplate, CRUD, tests, bash scripts, CI configs, glue code, and work in new or rarely used languages and frameworks.
  • AI is praised as “interactive documentation”: surfacing APIs, idioms, and jargon, and narrowing the search before consulting official docs.
  • Several devs say AI reduces procrastination and “toil,” making it easier to start tasks and keep momentum.

Juniors vs. seniors

  • Strong consensus that less-experienced devs see larger speedups and adopt AI more.
  • Seniors often find AI distracting on hard or novel problems, where training data is thin and hallucinations are common.
  • A commonly described pattern: juniors ship more but often don’t understand the generated code, struggle to debug it, and fall back on “Copilot told me” in reviews.
  • Some seniors use AI mainly as an autocomplete or librarian; others see little net benefit and disable it.

Technical debt, code quality, and long‑term risk

  • Many worry that a short-term rise in PRs hides long-term costs: duplication, fragile patterns, subtle bugs, and bad tests that encode wrong behavior.
  • Several report rejecting AI-heavy PRs whose authors can’t explain the changes; some orgs now block such PRs.
  • Concern that AI pushes everyone into maintaining poorly understood, “legacy-like” code and erodes shared mental models of systems.
  • Others counter that humans already produce terrible code; AI output is “no worse than entry-level” and at least has predictable failure modes.

Learning, deskilling, and developer growth

  • A recurring fear is that juniors who use AI for anything non-trivial will develop more slowly and become “AI-reliant” rather than “clueful.”
  • Others argue it’s analogous to Stack Overflow: motivated people still research and learn; AI can accelerate understanding by pointing to tools and patterns.
  • Several use AI explicitly as a teaching aid for infra, Linux, SQL, etc., while double-checking everything.

Study design and metrics skepticism

  • Multiple commenters critique the paper’s metrics (PRs, commits, builds) as poor proxies for real productivity or quality.
  • High variance, weak statistical significance, and Microsoft’s involvement (a possible conflict of interest, since Microsoft owns GitHub Copilot) are flagged as concerns.
  • Missing from the study: long-term effects on tech debt, maintainability, and developer skill.

Organizational and cultural factors

  • Many note that culture, reviews, and process determine whether AI use is beneficial or harmful.
  • There is broader frustration with “deliver at all costs” incentives, weak documentation, and already janky software quality, all of which AI may amplify.