2024-06-06

AI in software engineering at Google: Progress and the path ahead

Shift from Authoring to Reviewing

Several commenters echo Google’s observation: with AI suggestions, developers increasingly review and edit rather than write from scratch.
Some find this empowering, especially when working outside their specialty (e.g., backend devs producing React UIs).
Others argue reviewers rarely achieve the same depth of understanding as original authors, risking shallow comprehension of complex systems.

Learning, Expertise, and Gatekeeping

Commenters debate sharply whether AI-assisted coding harms or helps learning.
One side: deep understanding comes from struggling through solutions; AI short‑circuits this and can feed Dunning–Kruger dynamics.
The other side: copying from LLMs is analogous to learning from Stack Overflow or tutorials, and people rely on it less as they gain skill.
Some push back on “gatekeeping” attitudes that demand low‑level knowledge (e.g., transistors, CPU internals) for everyday coding.

Code Quality, Correctness, and Maintainability

Concern that syntactically correct but logically wrong or edge‑case‑fragile code will proliferate.
Review fatigue and “looks fine” acceptance are seen as risks, especially late in the day or among inexperienced reviewers.
Boilerplate generation is widely seen as a good fit, but there’s worry it may encourage bloated, repetitive code and weaker abstractions.
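
A small illustration of the first concern in this section, code that is syntactically valid and passes a casual review but breaks on an edge case. This is a made-up example, not one taken from the discussion; the function names and the choice to return None for empty input are assumptions for illustration.

```python
def average(values):
    """Looks correct and works for typical, non-empty input."""
    return sum(values) / len(values)


def safe_average(values):
    """Same computation with the empty-input case handled explicitly."""
    if not values:
        # Assumption: callers treat None as "no data available".
        return None
    return sum(values) / len(values)


print(average([2, 4]))        # typical input works
print(safe_average([]))       # empty input is handled
try:
    average([])               # empty input divides by zero
except ZeroDivisionError as exc:
    print("average([]) failed:", exc)
```

A reviewer skimming `average` late in the day would likely approve it, which is the kind of “looks fine” acceptance commenters describe.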

Metrics and Productivity Claims

Google’s “fraction of characters written by AI” (~50% of new code) and similar Copilot stats draw skepticism.
Critics say character share is a poor proxy for productivity or quality and fails to distinguish trivial boilerplate from hard logic.
Some note that even “accepted” suggestions may require heavy modification.
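
The critics’ point can be sketched with a little arithmetic. The snippet below assumes the metric is computed as accepted AI characters divided by accepted AI characters plus manually typed characters; the `EditSession` type, the labels, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class EditSession:
    """Character counts for one hypothetical editing session."""
    label: str
    ai_accepted_chars: int  # characters inserted by accepting AI suggestions
    typed_chars: int        # characters typed manually


def ai_char_fraction(sessions):
    """Share of characters that came from accepted AI suggestions."""
    ai = sum(s.ai_accepted_chars for s in sessions)
    total = ai + sum(s.typed_chars for s in sessions)
    return ai / total if total else 0.0


# Made-up numbers: AI writes almost all the boilerplate, almost none of the logic.
boilerplate_heavy = [
    EditSession("generated getters and setters", 900, 100),
    EditSession("hand-written scheduling logic", 100, 900),
]
# Made-up numbers: AI contributes evenly to both kinds of work.
evenly_assisted = [
    EditSession("getters and setters", 500, 500),
    EditSession("scheduling logic", 500, 500),
]

print(ai_char_fraction(boilerplate_heavy))  # 0.5
print(ai_char_fraction(evenly_assisted))    # 0.5
```

Both scenarios report the same 50% share even though the AI’s contribution to the hard logic differs widely, and neither count reflects how much an accepted suggestion was edited afterwards.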

Google Internal Tools and Culture

Multiple Googlers/ex‑Googlers describe internal AI tools as powerful but uneven (good autocomplete, weak review suggestions).
Disagreement over whether AI usage is “force‑fed” or optional; some complain certain AI affordances can’t be fully disabled.
There is internal concern about overemphasizing AI metrics, but also acknowledgement that pre‑LLM ML autocomplete already existed.

Use Cases, UX, and Limits

The most positive experiences involve code completion, boilerplate, schema/unit‑test generation, refactors, and “design sounding board” chats.
Poor experiences include constant low‑quality suggestions in IDEs, hallucinated patterns, and lack of domain‑specific preferences.
Many see future gains coming more from better IDE integration, context awareness, and workflow design than from raw model gains.

Broader Concerns

Fears about IP contamination (e.g., AGPL snippets), privacy leaks, and over‑reliance on non‑deterministic tools.
Long‑term speculation ranges from “bulldozer‑style productivity boost” to potential job displacement and even autonomous corporations.
