Harness engineering: Leveraging Codex in an agent-first world

Scale and Throughput Claims

  • Many commenters are struck by the headline claims: ~1M LOC, ~1,500 PRs, a small team, and “orders of magnitude” faster velocity.
  • Several compare this to large, mature projects (Firefox, Python stdlib) and find the LOC number implausible or at least suspicious as a bragging metric.
  • Some see it as a useful demo that agents can operate in a ~1M LOC codebase, not proof of product quality.

Code Quality, Maintainability, and Technical Debt

  • Strong skepticism that such rapid, agent-driven output can be clean or maintainable; fears of “spaghetti” and long‑term decay.
  • Others note that the article describes systematic cleanup and debt paydown, but remain unconvinced without access to the repo.
  • There’s concern that large, verbose, agent‑oriented codebases will be hard or pointless for humans to understand, pushing toward AI‑only maintenance.

Architecture, Harnesses, and Guardrails

  • Many focus on “harness engineering” as the real innovation: strict layering, import rules, CI checks, and deterministic tools (linters, tests, dependency rules).
  • Several commenters report similar setups: keeping all plans/docs/logs in‑repo, having agents update docs, heavy automated validation, and using domain‑driven or layered architectures.
  • Small files, low per‑file LOC, and good modularity are reported to significantly improve agent performance and reduce context usage.
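To make the “import rules” and “small files” guardrails concrete, here is a minimal sketch of the kind of deterministic CI check commenters describe. It assumes a Python codebase whose top‑level packages map one‑to‑one onto layers; the layer names, their order, and the 300‑line file limit are hypothetical choices for illustration, not taken from the article.

```python
"""Hypothetical CI check: enforce layered imports and a per-file size limit."""
import ast

# Hypothetical layer order, lowest first. A module may import from its own
# layer or any layer below it, never from a layer above it.
LAYERS = ["core", "config", "repo", "service", "runtime", "ui"]
RANK = {name: i for i, name in enumerate(LAYERS)}

# Hypothetical threshold reflecting the "small files" preference.
MAX_FILE_LINES = 300


def imported_top_packages(source: str) -> list[str]:
    """Return top-level package names from absolute imports in the source."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) stay inside the current package.
            names.append(node.module.split(".")[0])
    return names


def layering_violations(layer: str, source: str) -> list[str]:
    """List imports that reach upward from `layer` into a higher layer."""
    own = RANK[layer]
    return [
        f"{layer} -> {dep}"
        for dep in imported_top_packages(source)
        if dep in RANK and RANK[dep] > own  # third-party/stdlib imports are ignored
    ]


def too_long(source: str, limit: int = MAX_FILE_LINES) -> bool:
    """True if the file exceeds the hypothetical per-file line limit."""
    return len(source.splitlines()) > limit


if __name__ == "__main__":
    sample = "import repo.users\nfrom ui.views import render\nimport json\n"
    for violation in layering_violations("service", sample):
        print("layering violation:", violation)
```

Run against the sample, the check flags only the upward `service -> ui` import; the downward `repo` import and the stdlib `json` import pass. Because it is deterministic, an agent (or a human) gets the same verdict on every run, which is the property commenters value in linters and dependency rules compared with review by another model.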

Metrics and What LOC Really Means

  • Strong consensus that LOC is a poor productivity metric and encourages reward hacking.
  • Some argue it’s still a simple, communicable proxy to show “a lot was built,” especially for non‑technical audiences.
  • Others emphasize that good engineering should optimize for fewer, denser, more coherent lines and suggest better metrics (tests, reliability, feature correctness).

Economic and Labor Concerns

  • Some see this as an implicit signal that fewer engineers can ship more, threatening both junior and senior roles.
  • Others argue senior engineers remain valuable for architecture, harness design, and domain understanding; juniors may be hit hardest.
  • A minority dismisses the article as marketing and questions the real reliability, cost, and ROI of such agentic systems.

Adoption, Limits, and Cost

  • Multiple practitioners say they’ve tried similar “agent‑first” workflows; results vary from impressive to messy “vibe coding.”
  • Cost and token limits are cited as major blockers for fully autonomous approaches outside well‑funded environments.