2026-06-05

Harness engineering: Leveraging Codex in an agent-first world

Scale and Throughput Claims

Many commenters are struck by the headline claims: ~1M LOC, ~1,500 PRs, a small team, and “orders of magnitude” faster velocity.
Several compare this to large, mature projects (Firefox, Python stdlib) and find the LOC number implausible or at least suspicious as a bragging metric.
Some see it as a useful demo that agents can operate in a ~1M LOC codebase, not proof of product quality.

Code Quality, Maintainability, and Technical Debt

Commenters are strongly skeptical that such rapid, agent-driven output can be clean or maintainable, and fear “spaghetti” code and long‑term decay.
Others note the article claims systematic cleanup and debt paydown but remain unconvinced without repo access.
There’s concern that large, verbose, agent‑oriented codebases will be hard or pointless for humans to understand, pushing toward AI‑only maintenance.

Architecture, Harnesses, and Guardrails

Many focus on “harness engineering” as the real innovation: strict layering, import rules, CI checks, and deterministic tools (linters, tests, dependency rules).
Several commenters report similar setups: keeping all plans/docs/logs in‑repo, having agents update docs, heavy automated validation, and using domain‑driven or layered architectures.
Small files, low LOC, and good modularity are reported to significantly help agent performance and context usage.
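
To make the "import rules" idea concrete, here is a minimal sketch of a deterministic layering check of the kind a CI job could run. It is not taken from the article or from any commenter's setup: the layer names (`domain`, `services`, `api`), their ordering, and the function names are all hypothetical, and a real harness would likely walk the repository and read its rules from configuration.

```python
import ast

# Hypothetical layer order, lowest first. Assumed rule: a module may import
# from its own layer or a lower one, never from a higher one.
LAYERS = ["domain", "services", "api"]


def layer_of(module_name):
    """Return the layer index for a dotted module name, or None if unlayered."""
    top = module_name.split(".")[0]
    return LAYERS.index(top) if top in LAYERS else None


def find_violations(module_name, source):
    """List (importer, imported) pairs where `source` imports a higher layer."""
    own = layer_of(module_name)
    if own is None:
        return []  # modules outside the layering are not checked
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            targets = [node.module]  # absolute "from x import y" only
        else:
            continue
        for target in targets:
            target_layer = layer_of(target)
            if target_layer is not None and target_layer > own:
                violations.append((module_name, target))
    return violations


if __name__ == "__main__":
    sample = "import api.routes\nfrom services import billing\nimport os\n"
    for importer, imported in find_violations("domain.orders", sample):
        print(f"{importer} must not import {imported}")
```

Because the check is plain static analysis, it returns the same answer on every run, which is the property commenters single out when they describe linters and dependency rules as guardrails for agent output.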

Metrics and What LOC Really Means

There is strong consensus that LOC is a poor productivity metric and encourages reward hacking.
Some argue it’s still a simple, communicable proxy to show “a lot was built,” especially for non‑technical audiences.
Others emphasize that good engineering should optimize for fewer, denser, more coherent lines and suggest better metrics (tests, reliability, feature correctness).

Economic and Labor Concerns

Some see this as an implicit signal that fewer engineers can ship more, threatening both junior and senior roles.
Others argue senior engineers remain valuable for architecture, harness design, and domain understanding; juniors may be hit hardest.
A minority dismisses the article as marketing and questions the real reliability, cost, and ROI of such agentic systems.

Adoption, Limits, and Cost

Multiple practitioners say they’ve tried similar “agent‑first” workflows; results vary from impressive to messy “vibe coding.”
Cost and token limits are cited as major blockers for fully autonomous approaches outside well‑funded environments.
