Just talk to it – A way of agentic engineering

Cognitive load & workflow with many agents

  • Some expect multiple concurrent agents to be exhausting; others report the opposite: parallel agents keep them in flow by eliminating waiting time.
  • The limiting factor is the “human context window”: tracking many threads and reviewing large diffs is harder than typing code.
  • Several describe agents as “maliciously compliant”: they wander off into tangents, take minutes for trivial tasks, and need frequent course correction.

Code quality, “AI slop,” and refactoring

  • Many report mediocre output: overly verbose code, 100-line tests that could be 20 lines, 30-line implementations that could be 5, and frequent reintroduction of previously fixed bugs.
  • Some claim high-quality output is possible and that “slop” is a sign of poor prompts or weak review; others strongly disagree, saying they almost always have to simplify AI code heavily.
  • There’s skepticism about letting the same agents that produced messy code “refactor” it; defenders note this is analogous to humans debugging their own bugs, provided tests and constraints exist.
  • LLMs are said to be good at pursuing a single, clearly defined objective (e.g., “pass these tests, keep the diff under N characters”; see the sketch after this list) but bad at balancing multiple competing goals or evolving a design.
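
The “single objective” framing can be made mechanical. Below is a minimal sketch of a gate script an agent loop could be pointed at: it passes only if the test suite succeeds and the uncommitted diff stays under a size budget. The pytest command and the 4,000-character budget are illustrative assumptions, not anything prescribed in the discussion.

    #!/usr/bin/env python3
    """Single-objective gate: tests must pass AND the diff must stay small."""
    import subprocess
    import sys

    MAX_DIFF_CHARS = 4_000  # hypothetical budget for "keep diff under N characters"

    # Objective 1: the test suite must pass.
    tests = subprocess.run(["pytest", "-q"])
    if tests.returncode != 0:
        sys.exit("FAIL: tests are failing")

    # Objective 2: the uncommitted diff must stay under the character budget.
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True)
    if len(diff.stdout) > MAX_DIFF_CHARS:
        sys.exit(f"FAIL: diff is {len(diff.stdout)} chars, budget is {MAX_DIFF_CHARS}")

    print("PASS: tests green and diff within budget")

Collapsing success to a single PASS/FAIL signal is exactly what makes the objective tractable for an agent; the trade-off noted above is that anything not encoded in the gate (readability, design) goes unoptimized.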

Scale, maintainability, and project realism

  • The cited 300k LOC AI-maintained codebase divides opinion: some see it as impressive; others call it “cowboy coding” and suspect it could be a fraction of that size if written thoughtfully.
  • Without full access to the closed-source project, commenters find it “unclear” how robust the system is (DB schema changes, migrations, auth/RBAC, performance).
  • Inspections of related public repos show heavy scaffolding, logging, and questionable tests, reinforcing doubts about maintainability at that size.

Tools, hooks, and guardrails

  • Claude Code “hooks” and similar features in other tools are praised for encoding process and policy: whitelisting dependencies, auto-approving/denying actions, and providing structured guidance beyond raw context (a sketch follows this list).
  • Too many plugins/tools at once is seen as harmful: it burns context and confuses agents; the recommended practice is to enable only the tools a given task needs.
  • Suggested guardrails: strong test suites, linters, benchmarks, explicit migration tools, diff-size limits, and treating agents as junior devs whose work always needs review.
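
To make the hook idea concrete, here is a minimal sketch of a pre-tool-use hook in Python that enforces a dependency whitelist on npm installs. It assumes the tool delivers the proposed action as JSON on stdin with the shell command under tool_input.command, and that exit code 2 denies the action (this matches Claude Code’s documented hook contract, but verify against your tool’s docs); the ALLOWED set is a hypothetical project whitelist.

    #!/usr/bin/env python3
    """Pre-tool-use hook sketch: deny npm installs outside a whitelist."""
    import json
    import sys

    ALLOWED = {"react", "zod", "vitest"}  # hypothetical project whitelist

    payload = json.load(sys.stdin)  # hook payload arrives as JSON on stdin
    command = payload.get("tool_input", {}).get("command", "")

    if command.startswith("npm install"):
        # Tokens after "npm install" that aren't flags are the requested packages.
        requested = {tok for tok in command.split()[2:] if not tok.startswith("-")}
        blocked = requested - ALLOWED
        if blocked:
            print(f"denied: {', '.join(sorted(blocked))} not on the dependency whitelist",
                  file=sys.stderr)
            sys.exit(2)  # exit code 2 blocks the action; stderr is fed back to the agent

    sys.exit(0)  # allow everything else

Because the denial message is returned to the agent, the hook doubles as the “structured guidance” commenters describe: the model learns why the action was refused, not just that it failed.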

Adoption gap, cost, and hype

  • Several readers feel inadequate compared to “AI writes 50–100% of my code” claims; others argue this is mostly marketing hyperbole and selective success stories.
  • Reported token spend (~$1,000/month) is debated: some note that on raw hourly cost this undercuts humans (see the rough arithmetic after this list); others point out that a human is still needed for prompting, review, and decision-making.
  • There’s broad skepticism about “no-BS” narratives that rely on magical incantations, lots of Twitter links, and little concrete evidence of long-term, production-quality outcomes.
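
For rough orientation, the hourly comparison behind the first claim (assuming a ~160-hour working month, an assumption not stated in the thread) works out to:

    $1,000 per month ÷ 160 hours per month ≈ $6.25 per hour of agent time

That is far below typical developer rates, which is the basis of the “undercuts humans” argument; the counterpoint stands that the hours a human spends prompting and reviewing come on top of that figure.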

Changing developer roles & personal fit

  • With multiple agents, the role shifts toward managing synthetic teammates: planning, setting constraints, and reviewing PRs instead of writing every line.
  • Some experienced developers love this “agentic engineering,” saying it amplifies their architectural and design work; others find it stressful and fear a future of supervising untrusted code.
  • People report strong “personality” preferences: some click with Claude and clash with GPT-based tools, or vice versa, suggesting the UX and “disposition” of models matter as much as raw capability.