Harness engineering: Leveraging Codex in an agent-first world
Scale and Throughput Claims
- Many commenters are struck by the headline claims: ~1M LOC, ~1,500 PRs, a small team, and “orders of magnitude” faster velocity.
- Several compare this to large, mature projects (Firefox, Python stdlib) and find the LOC number implausible or at least suspicious as a bragging metric.
- Some see it as a useful demo that agents can operate in a ~1M LOC codebase, not proof of product quality.
Code Quality, Maintainability, and Technical Debt
- Strong skepticism that such rapid, agent-driven output can be clean or maintainable; fears of “spaghetti” and long‑term decay.
- Others note the article claims systematic cleanup and debt paydown but remain unconvinced without repo access.
- There’s concern that large, verbose, agent‑oriented codebases will be hard for humans to understand, or not worth the effort, pushing toward AI‑only maintenance.
Architecture, Harnesses, and Guardrails
- Many focus on “harness engineering” as the real innovation: strict layering, import rules, CI checks, and deterministic tools (linters, tests, dependency rules).
- Several commenters report similar setups: keeping all plans/docs/logs in‑repo, having agents update docs, heavy automated validation, and using domain‑driven or layered architectures.
- Small files, low LOC, and good modularity are reported to significantly help agent performance and context usage.
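To make the "strict layering, import rules, CI checks" idea concrete, here is a minimal sketch of a deterministic import-rule check that could run in CI. It is not taken from the article or the thread: the layer names in `LAYERS` and the helper names `layer_of` and `find_violations` are hypothetical, and the rule assumed here is simply that a module may import from its own layer or a lower one, never from a higher one.

```python
import ast

# Hypothetical layer order, lowest first. This is an assumption for
# illustration; a real project would define its own layers.
LAYERS = ["domain", "services", "api"]


def layer_of(module_name):
    """Return the layer index for a dotted module name, or None if unlayered."""
    top = module_name.split(".")[0]
    return LAYERS.index(top) if top in LAYERS else None


def find_violations(module_name, source):
    """Return (importer, imported) pairs where the import reaches a higher layer."""
    own = layer_of(module_name)
    violations = []
    if own is None:
        return violations
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Only absolute "from x import y" statements are checked here.
            targets = [node.module]
        else:
            continue
        for target in targets:
            other = layer_of(target)
            if other is not None and other > own:
                violations.append((module_name, target))
    return violations


if __name__ == "__main__":
    bad = "from api.routes import router\nimport os\n"
    for importer, imported in find_violations("domain.user", bad):
        print(f"layer violation: {importer} imports {imported}")
```

Because a check like this is deterministic, it gives an agent the same pass/fail signal on every run, which is the property commenters highlight in linters, tests, and dependency rules.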
Metrics and What LOC Really Means
- Strong consensus that LOC is a poor productivity metric and encourages reward hacking.
- Some argue it’s still a simple, communicable proxy to show “a lot was built,” especially for non‑technical audiences.
- Others emphasize that good engineering should optimize for fewer, denser, more coherent lines and suggest better metrics (tests, reliability, feature correctness).
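A small sketch of why commenters consider LOC easy to game: the two snippets below behave identically but differ in line count. The counter `count_loc`, the helper `run_total`, and both sample snippets are made up for illustration and use one naive LOC definition (non-blank, non-comment lines); they are not from the article.

```python
def count_loc(source):
    """Count non-blank, non-comment lines (one naive definition of LOC)."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


# Two hypothetical implementations of the same function.
COMPACT = "def total(xs):\n    return sum(xs)\n"

PADDED = (
    "def total(xs):\n"
    "    result = 0\n"
    "    for x in xs:\n"
    "        result = result + x\n"
    "    return result\n"
)


def run_total(source, xs):
    """Execute a snippet in a fresh namespace and call its total() function."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](xs)


if __name__ == "__main__":
    for name, src in [("compact", COMPACT), ("padded", PADDED)]:
        print(name, "LOC:", count_loc(src), "result:", run_total(src, [1, 2, 3]))
```

Both versions return the same result, so a LOC-based score rewards the longer one without any gain in functionality, which is the reward-hacking risk raised in the thread.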
Economic and Labor Concerns
- Some see this as an implicit signal that fewer engineers can ship more, threatening both junior and senior roles.
- Others argue senior engineers remain valuable for architecture, harness design, and domain understanding; juniors may be hit hardest.
- A minority dismisses the article as marketing and questions the real reliability, cost, and ROI of such agentic systems.
Adoption, Limits, and Cost
- Multiple practitioners say they’ve tried similar “agent‑first” workflows; results vary from impressive to messy “vibe coding.”
- Cost and token limits are cited as major blockers for fully autonomous approaches outside well‑funded environments.