There is an AI code review bubble
Scope of the “AI code review bubble”
- Many commenters agree there is a bubble: “everyone is shipping a code review agent,” often with thin differentiation.
- Several see code review as a feature that will be bundled into existing platforms (GitHub, GitLab, IDEs) rather than a standalone product category.
- Some argue most “AI code review startups” are just wrappers over the same few frontier models and are easy for model providers or platforms to subsume.
Greptile’s positioning and skepticism
- The article’s claims of “independence” (separate review agent from generator) and “autonomy” (fully automated validation) draw strong criticism:
  - Models are trained on similar data, so “independence” is seen as mostly illusory.
  - If review becomes truly autonomous, many believe it will just be a capability inside coding agents, not a separate product.
- Several readers say the post spends more time on philosophy than on concrete differentiation or benchmarks; some call it pure content marketing.
Effectiveness vs linters and humans
- Mixed but detailed anecdotes:
  - Pro: Tools like Copilot, Bugbot, Claude, CodeRabbit, Unblocked, and Cubic are reported to catch real bugs (race conditions, repeated logic across call boundaries, missing DB indexes, security issues) that linters and static analyzers missed.
  - Contra: Others find them “pure noise,” catching trivial or impossible issues, misunderstanding language/library context, or arguing for pointless refactors.
- Recurrent theme: signal-to-noise is the central problem. Tools tend to:
  - Overproduce speculative or nitpicky comments.
  - Miss architectural or business-context issues while focusing on micro-level style or minor inefficiencies.
- Some commenters note that good prompting and per-codebase customization can dramatically improve usefulness (see the sketch below).
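
A minimal sketch, in Python, of what “per-codebase customization” can look like in practice: fold repo-local review guidance into the prompt before it reaches the model. The file name REVIEW_GUIDELINES.md and the build_review_prompt helper are hypothetical illustrations, not any vendor’s API.

```python
from pathlib import Path


def build_review_prompt(diff: str, repo_root: str = ".") -> str:
    """Assemble a review prompt that folds in repo-specific guidance.

    REVIEW_GUIDELINES.md is a hypothetical repo-local file where a team
    might record conventions ("long functions are fine in parsers",
    "always flag missing DB indexes", ...).
    """
    guidelines_path = Path(repo_root) / "REVIEW_GUIDELINES.md"
    guidelines = (
        guidelines_path.read_text()
        if guidelines_path.exists()
        else "No repo-specific guidelines found."
    )
    return (
        "You are reviewing a pull request.\n"
        "Only report issues you are confident about; skip style nits.\n\n"
        f"Repository conventions:\n{guidelines}\n\n"
        f"Diff under review:\n{diff}\n"
    )
```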
Role and purpose of code review
- Many insist review is primarily about:
  - Knowledge sharing, architecture, design, and maintainability.
  - Spreading understanding of system evolution among teammates.
- Several argue: if you’re relying on AI review to “catch bugs,” you’re misusing PRs; tests, linters, and design should handle most defects.
- Others counter that AI review is a useful extra safety net, especially for solo devs or small teams, and is better than no review at all.
Autonomy, human-in-the-loop, and culture
- Strong pushback against visions of “vanishingly little human participation”:
  - Concern that AI-generated and AI-reviewed code leads to large, poorly understood codebases and loss of engineering literacy.
  - Emphasis that tests can’t catch everything; humans are still needed for fitness-for-purpose, missed requirements, and long-term maintainability.
- Some describe desired tools as “assistants” or “wizards” that:
  - Highlight areas humans should inspect.
  - Minimize verbosity and nits, focusing on high-severity issues.
Economics, integration, and DIY
- Several note it’s trivial to:
  - Pipe `git diff` into a frontier model via CLI, GitHub Actions, or custom pipelines (sketched after this list).
  - Integrate review directly into IDEs or internal tooling using raw APIs.
- This leads to questions about what vendors really add beyond:
  - Distribution/integration polish.
  - Context management (e.g., cross-repo, DB schemas).
  - Tuning for lower noise.
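
To ground the DIY point above: a minimal sketch, assuming the OpenAI Python SDK with an API key in the environment, that pipes a branch diff to a hosted model and prints whatever review comes back. The base branch, model name, and prompt wording are illustrative assumptions, and a real setup would add error handling and noise filtering.

```python
import subprocess

from openai import OpenAI  # assumes the OpenAI Python SDK and an OPENAI_API_KEY env var


def review_current_branch(base: str = "origin/main", model: str = "gpt-4o") -> str:
    """Send the diff against `base` to a frontier model and return its review text.

    `base` and `model` are placeholder defaults, not recommendations.
    """
    diff = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    if not diff.strip():
        return "No changes to review."

    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a code reviewer. Report only high-severity issues."},
            {"role": "user", "content": f"Review this diff:\n\n{diff}"},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(review_current_branch())
```

Run the same script in a GitHub Actions step that posts its output as a PR comment and you have roughly the pipeline commenters describe; the distribution polish, context management, and noise tuning listed above are what vendors layer on top.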
Trust, evaluation, and metrics
- Debate over what counts as “evidence” of effectiveness:
  - Simple counts of “great catch” replies are criticized as insufficient without false-positive rates or comparisons vs. baselines.
  - Some propose more rigorous evaluation (ROC-style analysis, controlled comparisons with expert reviewers and linters), along the lines of the sketch below.
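
A minimal sketch of the kind of baseline comparison being asked for, assuming someone hand-labels each reviewer comment as a true or false positive and counts the known defects in the evaluated PRs; the reviewer names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewOutcome:
    """Hand-labeled results for one reviewer (AI tool, linter, or human) over a set of PRs."""

    name: str
    true_positives: int   # comments that pointed at a real defect
    false_positives: int  # comments that were noise
    total_defects: int    # defects known to exist in the evaluated PRs


def summarize(outcome: ReviewOutcome) -> str:
    precision = outcome.true_positives / max(outcome.true_positives + outcome.false_positives, 1)
    recall = outcome.true_positives / max(outcome.total_defects, 1)
    return f"{outcome.name}: precision={precision:.2f}, recall={recall:.2f}"


# Hypothetical numbers, purely to show the comparison format.
for outcome in [
    ReviewOutcome("ai_reviewer", true_positives=12, false_positives=30, total_defects=20),
    ReviewOutcome("linter", true_positives=5, false_positives=2, total_defects=20),
    ReviewOutcome("human_expert", true_positives=15, false_positives=3, total_defects=20),
]:
    print(summarize(outcome))
```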
Human vs AI review friction
- Several report practical frustrations:
  - AI overwriting PR descriptions, arguing with itself, or producing long, vague comments.
  - Review fatigue from endless variable-name suggestions and hypothetical edge cases.
- Others say they now treat AI review like a powerful linter:
  - Run it on demand, skim the top-ranked issues, and ignore the rest (see the sketch below).
  - Never a replacement, only a complement to human review and tests.
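
A small sketch of that “powerful linter” workflow, assuming the tool emits structured comments with a severity field (the comment schema and severity ordering here are illustrative assumptions): keep only the substantive findings and cap how many a human is asked to skim.

```python
from typing import TypedDict


class ReviewComment(TypedDict):
    file: str
    line: int
    severity: str  # assumed values: "high", "medium", "low", "nit"
    message: str


SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2, "nit": 3}


def triage(comments: list[ReviewComment], max_items: int = 5) -> list[ReviewComment]:
    """Keep only the most severe comments, capped so review fatigue stays bounded."""
    ranked = sorted(comments, key=lambda c: SEVERITY_RANK.get(c["severity"], 99))
    return [c for c in ranked if c["severity"] in ("high", "medium")][:max_items]


# Example: three comments in, only the substantive ones surface.
comments: list[ReviewComment] = [
    {"file": "db.py", "line": 42, "severity": "high", "message": "Query misses an index on user_id."},
    {"file": "api.py", "line": 10, "severity": "nit", "message": "Consider renaming `tmp` to `buffer`."},
    {"file": "jobs.py", "line": 7, "severity": "medium", "message": "Possible race when two workers claim the same job."},
]
for c in triage(comments):
    print(f"{c['file']}:{c['line']} [{c['severity']}] {c['message']}")
```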