Effects of Gen AI on High Skilled Work: Experiments with Software Developers
Productivity gains and where AI helps most
- Many report 20–40% perceived productivity boosts; some claim 2–4x on simple tasks, others only 5–10%.
- Biggest gains: boilerplate, CRUD, tests, bash scripts, CI configs, glue code, new or rarely-used languages/frameworks.
- AI is praised for “interactive documentation”: surfacing APIs, idioms, jargon, and narrowing search before going to official docs.
- Several devs say AI reduces procrastination and “toil,” making it easier to start tasks and keep momentum.
Juniors vs. seniors
- Strong consensus that less-experienced devs see larger speedups and adopt AI more.
- Seniors often find AI distracting on hard or novel problems, where training data is thin and hallucinations are common.
- A recurring pattern: juniors can ship more, but often don’t understand the generated code, struggle to debug it, and lean on “Copilot told me” in reviews.
- Some seniors use AI mainly as an autocomplete or librarian; others see little net benefit and disable it.
Technical debt, code quality, and long-term risk
- Many worry short-term “more PRs” hides long-term costs: duplication, fragile patterns, subtle bugs, bad tests encoding wrong behavior.
- Several report rejecting AI-heavy PRs where authors can’t explain changes; some orgs now block such PRs.
- Concern that AI pushes everyone into maintaining poorly understood, “legacy-like” code and erodes shared mental models of systems.
- Others counter that humans already produced terrible code; AI output is “no worse than entry-level” and at least has predictable failure modes.
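The worry about “bad tests encoding wrong behavior” can be made concrete with a minimal hypothetical sketch (the function and test below are illustrative, not from the study): a test generated from the code’s current output locks in a bug rather than catching it.

```python
def apply_discount(price: float, percent: float) -> float:
    """Intended behavior: reduce price by `percent` percent."""
    # Bug: subtracts percent as a flat amount instead of a percentage.
    return price - percent


def test_apply_discount():
    # A test derived from the observed output, not the spec:
    # it asserts the buggy result (50 - 20 = 30), though the
    # intended answer is 40.0. The bug is now "verified" by CI.
    assert apply_discount(50, 20) == 30


test_apply_discount()  # passes silently, entrenching the defect
```

A spec-driven review would reject this test; a throughput-driven review that only checks “tests pass” would merge it.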
Learning, deskilling, and developer growth
- Repeated fear that juniors using AI for anything non-trivial will grow slower and become “AI-reliant” rather than “clueful.”
- Others argue it’s analogous to Stack Overflow: motivated people still research and learn; AI can accelerate understanding by pointing to tools and patterns.
- Several use AI explicitly as a teaching aid for infra, Linux, SQL, etc., while double-checking everything.
Study design and metrics skepticism
- Multiple commenters critique the paper’s metrics (PRs, commits, builds) as poor proxies for real productivity or quality.
- High variance, weak statistical significance, and Microsoft’s involvement are flagged as concerns.
- Missing from the study: long-term effects on tech debt, maintainability, and developer skill.
Organizational and cultural factors
- Many note that culture, reviews, and process determine whether AI use is beneficial or harmful.
- Broader frustration surfaces over “deliver at all costs” incentives, weak documentation, and already-janky software quality that AI may amplify.