2024-09-06

Effects of Gen AI on High Skilled Work: Experiments with Software Developers

Productivity gains and where AI helps most

Many report perceived productivity boosts of 20–40%; some claim 2–4x on simple tasks, while others see only 5–10%.
Biggest gains: boilerplate, CRUD, tests, bash scripts, CI configs, glue code, new or rarely-used languages/frameworks.
AI is praised for “interactive documentation”: surfacing APIs, idioms, jargon, and narrowing search before going to official docs.
Several devs say AI reduces procrastination and “toil,” making it easier to start tasks and keep momentum.

Juniors vs. seniors

Strong consensus that less-experienced devs see larger speedups and adopt AI more.
Seniors often find AI a distraction on hard or novel problems, where training data is thin and hallucinations are common.
Pattern described: juniors can ship more, but often don’t understand generated code, struggle to debug, and lean on “Copilot told me” in reviews.
Some seniors use AI mainly as an autocomplete or librarian; others see little net benefit and disable it.

Technical debt, code quality, and long‑term risk

Many worry that the short-term "more PRs" effect hides long-term costs: duplication, fragile patterns, subtle bugs, and bad tests that encode wrong behavior.
Several report rejecting AI-heavy PRs where authors can’t explain changes; some orgs now block such PRs.
Concern that AI pushes everyone into maintaining poorly understood, “legacy-like” code and erodes shared mental models of systems.
Others counter that humans already produced terrible code; AI output is “no worse than entry-level” and at least has predictable failure modes.

Learning, deskilling, and developer growth

Repeated fear that juniors using AI for anything non-trivial will grow slower and become “AI-reliant” rather than “clueful.”
Others argue it’s analogous to Stack Overflow: motivated people still research and learn; AI can accelerate understanding by pointing to tools and patterns.
Several use AI explicitly as a teaching aid for infra, Linux, SQL, etc., while double-checking everything.

Study design and metrics skepticism

Multiple commenters critique the paper’s metrics (PRs, commits, builds) as poor proxies for real productivity or quality.
High variance, weak statistical significance, and Microsoft’s involvement are flagged as concerns.
Missing from the study: long-term effects on tech debt, maintainability, and developer skill.

Organizational and cultural factors

Many note that culture, reviews, and process determine whether AI use is beneficial or harmful.
There is also broader frustration with "deliver at all costs" incentives, weak documentation, and already-janky software quality, all of which AI may amplify.
