When I reject AI code even if it works

When “working” code is still rejected

  • Many argue that “it runs and tests are green” is only a minimum bar.
  • Common rejection reasons: the author can’t explain the approach, the diff is larger than the problem, abstractions are premature, the code is harder to reason about or maintain, or the author trusts AI output more than their own understanding.
  • Several note this is identical to how good reviewers should treat human-written code (e.g., from juniors or contractors).

Risk, accountability, and domains

  • Higher scrutiny is demanded for safety‑critical or high‑value systems (payments, infrastructure, core backends) than for internal tools, hobby projects, or simple static sites.
  • On‑call and support expectations matter: if you can’t debug or explain the code at 3 a.m., you shouldn’t ship it.
  • Some find it unethical to merge AI code into critical systems without truly reviewing it.

Quality of AI-generated code

  • Frequent complaints: excessive abstractions, over‑engineering, duplicated utilities, unnecessary columns/fields, front‑end data munging instead of proper backend/DB use, and “code salad” in complex domains.
  • LLMs are described as increasing entropy: without strong constraints, their output diverges rather than converges.
  • Subtle bugs (e.g., ML data leakage, accounting corner cases) are easy for AI to miss and hard for novices to detect.
  • Others counter that average agents already outperform many “enterprise” programmers on basic structure and cleanliness.
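
To make the "subtle bugs" point concrete, here is a minimal sketch of one form of ML data leakage: scaling features with statistics computed over the full dataset instead of the training split only. The toy data, the split, and the `standardize` helper are all hypothetical, chosen for illustration; the thread does not describe a specific case. Both variants run without error, which is why this kind of bug survives a "tests are green" review.

```python
from statistics import mean, pstdev


def standardize(values, fit_on):
    """Scale `values` using the mean and std computed from `fit_on`."""
    mu = mean(fit_on)
    sigma = pstdev(fit_on)
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data: the last two points are the held-out test set.
data = [1.0, 2.0, 3.0, 4.0, 100.0, 120.0]
train, test = data[:4], data[4:]

# Leaky: the scaling statistics include the test points, so information
# about the held-out set flows into the training features.
leaky_train = standardize(train, fit_on=data)

# Correct: fit the statistics on the training split only, then reuse
# them unchanged for the test split.
clean_train = standardize(train, fit_on=train)
clean_test = standardize(test, fit_on=train)

print("leaky train:", [round(v, 3) for v in leaky_train])
print("clean train:", [round(v, 3) for v in clean_train])
```

Nothing fails in the leaky version; the training features are simply different from what they should be, and a reviewer has to know to look for where the statistics were fitted.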

Workflows and guardrails

  • Some build elaborate harnesses: custom linters for “AI-isms,” strict pre‑commit checks, TDD enforced by scripts, multi‑model cross‑review, and periodic AI‑assisted audits.
  • Others use AI mainly as a pair‑programmer: plans, small snippets, syntax help, porting between languages/frameworks, or expanding comment-level pseudocode.
  • A few openly merge AI code they don’t fully understand, relying on tests and rapid reaction to failures; others label this as risky or irresponsible.
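
As a rough illustration of what a custom linter for “AI-isms” might look like, the sketch below uses Python’s standard `ast` module to flag broad exception handlers whose body is only `pass`. Which patterns count as “AI-isms” is an assumption here; the thread does not list specific rules, and the function name `find_swallowed_exceptions` is hypothetical.

```python
import ast
from typing import List


def find_swallowed_exceptions(source: str) -> List[int]:
    """Return line numbers of broad `except` handlers whose body is only `pass`."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.ExceptHandler):
            continue
        # Bare `except:` or `except Exception:` / `except BaseException:`.
        broad = node.type is None or (
            isinstance(node.type, ast.Name)
            and node.type.id in {"Exception", "BaseException"}
        )
        only_pass = len(node.body) == 1 and isinstance(node.body[0], ast.Pass)
        if broad and only_pass:
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical input: the first handler is flagged, the second is not
# because it catches a specific exception type.
sample = '''
try:
    risky()
except Exception:
    pass

try:
    risky()
except ValueError:
    pass
'''

flagged = find_swallowed_exceptions(sample)
print("flagged lines:", flagged)
```

A check like this could be run from a pre-commit hook over staged files, failing the commit when the returned list is non-empty; the hook wiring is left out because it depends on the team’s tooling.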

Organizational dynamics and incentives

  • Existing bad incentives (rewarding big, fast commits and cowboy behavior) are seen as amplified by AI, leading to faster accumulation of tech debt.
  • Some foresee “software bankruptcy”–style failures, or at least slower delivery, high senior churn, and rewrite attempts that may also fail.

Middle ground and long-term concerns

  • Debate over whether a stable “middle ground” exists between no AI and full agentic development.
  • Worries about deskilling, comprehension debt, and future systems no one understands.
  • Others see AI as a “faster keyboard” and argue the real constraint remains human understanding and architecture, which must not be outsourced.