2026-03-10

Levels of Agentic Engineering

Framing the “levels” model

Several commenters dislike the ladder framing; it implies “higher = better” and encourages gatekeeping and toxicity.
Some see the “levels” more as historical stages in the AI tooling ecosystem than as a personal skill ladder.
Alternative taxonomies (e.g., car-autonomy-inspired, simpler 2–5 level schemes) are mentioned as cleaner for communication.
A minimalist view holds that there are only two real modes, human-with-AI-assist and AI-with-human-assist; some joke about a third, “AI with AI assist.”

Autonomous agents and “dark factories”

Commenters express both curiosity and skepticism about fully autonomous “software factories” that generate large codebases with minimal human input.
Key challenge raised: if software can be fully delegated, why not sell the factory itself? Others reply that we’re not there yet, and that sales, marketing, and market fit remain unsolved by LLMs.
Some expect such factories to disrupt or “kill” much of traditional enterprise software; others argue internal enterprise software and regulatory checks will still demand human oversight.

Validation, quality, and context limits

Multiple comments argue that the real bottleneck is validation, not orchestration: producing 100× more code without 100× more validation harms quality.
Flaky tests, regulatory constraints, and subtle bugs (e.g., data persistence, crypto correctness) are cited as current blockers to full autonomy.
Long-running agents hit “context rot” and re-discover work; file-based persistent state and specs are proposed as pragmatic mitigations.
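The file-based persistent state idea can be sketched as follows. This is a minimal illustration, not a setup any commenter described: the state layout and the names `load_state`, `save_state`, and `run_task` are all hypothetical. Each finished task is recorded on disk, so a later session with a fresh context can skip work instead of re-discovering it.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the saved state, or a fresh one if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed": [], "notes": {}}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
    os.replace(tmp_name, path)


def run_task(path: Path, task_id: str, work) -> bool:
    """Run `work` only if `task_id` is not already recorded as done.

    `work` is any zero-argument callable returning a JSON-serializable note.
    Returns True if the task ran, False if it was skipped.
    """
    state = load_state(path)
    if task_id in state["completed"]:
        return False  # finished in an earlier session; do not redo it
    state["notes"][task_id] = work()
    state["completed"].append(task_id)
    save_state(path, state)
    return True
```

The state is re-read from disk on every call rather than cached in memory, which keeps the file as the single source of truth across restarts.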

Capturing project knowledge and context

Strong focus on “context engineering”: CLAUDE.md-style rules, skills, ADRs, design docs, and structured commit messages.
A big gap is identified between recording what was done and recording why it was done; several patterns are suggested to close it (ADRs, contextual commits, typed prompt blocks).
Consensus that structured constraints and schemas significantly improve reliability over free-form instructions.
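One way to picture the structured-constraints point is a check that rejects agent output unless it matches a small schema. The sketch below is an assumption for illustration only: the review fields, the severity values, and the name `validate_review` are hypothetical and do not come from any tool mentioned in the thread.

```python
import json

# Hypothetical schema: field name -> required Python type.
REVIEW_SCHEMA = {"file": str, "severity": str, "rationale": str}
ALLOWED_SEVERITIES = {"info", "warning", "error"}


def validate_review(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    for field, expected in REVIEW_SCHEMA.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"wrong type for {field}")

    # Constrain free text to a closed set of values where possible.
    severity = data.get("severity")
    if isinstance(severity, str) and severity not in ALLOWED_SEVERITIES:
        problems.append(f"unknown severity: {severity}")
    return problems
```

Requiring a “rationale” field is one way to capture why a change was made alongside what was changed, and the returned problem list can be fed back to the agent as a retry prompt.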

Real-world usage patterns and ergonomics

Reported successful setups: CI-based code review agents, microbenchmarking/performance agents, background harnesses, and manual triggering of “factories” for specific processes.
Multi-agent teams are powerful for some, but criticized for poor dev experience, high token burn, and fragile permission management.
Many developers still operate at the “copy-paste into chat” level or at simple chat, IDE, or CLI levels, and find that effective and safer.

Human bottlenecks, communication, and hype

As agents get stronger, the bottleneck shifts from “how to build” to “what to build,” sequencing, and articulating requirements.
Some see voice as a useful way to dump rich context; others strongly prefer deliberate writing.
There is substantial skepticism about hype, money-making claims, and very high “levels”; several commenters report that LLMs are often “just” a much better search/autocomplete rather than a true dark factory today.