Just talk to it – A way of agentic engineering
Cognitive load & workflow with many agents
- Some expect multiple concurrent agents to be exhausting; others report the opposite: parallel agents keep them in flow by eliminating waiting time.
- The limiting factor is the “human context window”: tracking many threads and reviewing large diffs is harder than typing the code yourself.
- Several describe agents as “maliciously compliant”: they wander off on tangents, spend minutes on trivial tasks, and need frequent course correction.
Code quality, “AI slop,” and refactoring
- Many report mediocre output: overly verbose code, 100-line tests that could be 20 lines, 30-line implementations that could be 5, and frequent reintroduction of bugs.
- Some claim high-quality output is possible and that “slop” is a sign of poor prompts or weak review; others strongly disagree, saying they almost always have to simplify AI code heavily.
- There’s skepticism about letting the same agents that produced messy code “refactor” it; defenders note this is analogous to humans debugging their own bugs, provided tests and constraints exist.
- LLMs are said to be good at pursuing a single, clearly defined objective (e.g., “pass these tests, keep the diff under N characters”) but bad at balancing multiple competing goals or evolving a design; a sketch of such a check follows this list.
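As a rough illustration of that single-objective framing, the sketch below turns “pass these tests, keep the diff under N characters” into one machine-checkable script an agent can iterate against. The script name, the character budget, the `main` base branch, and the use of pytest are illustrative assumptions, not details from the discussion.

```python
#!/usr/bin/env python3
"""Single-objective check: the test suite must pass AND the diff against
main must stay under a character budget. Budget, base branch, and test
runner are illustrative assumptions."""
import subprocess
import sys

MAX_DIFF_CHARS = 10_000  # hypothetical budget; tune per task


def main() -> int:
    # Objective part 1: the test suite passes.
    if subprocess.run(["pytest", "-q"]).returncode != 0:
        print("FAIL: test suite did not pass")
        return 1

    # Objective part 2: the change stays small, measured against main.
    diff = subprocess.run(
        ["git", "diff", "main"], capture_output=True, text=True, check=True
    )
    if len(diff.stdout) > MAX_DIFF_CHARS:
        print(f"FAIL: diff is {len(diff.stdout)} chars; budget is {MAX_DIFF_CHARS}")
        return 1

    print("OK: tests pass and diff is within budget")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Telling the agent to loop until this exits 0 gives it one verifiable target instead of a vague quality goal.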
Scale, maintainability, and project realism
- The cited 300k LOC AI-maintained codebase divides opinion: some see it as impressive; others call it “cowboy coding” and suspect the same functionality could be a small fraction of that size if written more deliberately.
- Without full access to the closed-source project, commenters find it “unclear” how robust the system is (DB schema changes, migrations, auth/RBAC, performance).
- Inspections of related public repos show heavy scaffolding, logging, and questionable tests, reinforcing doubts about maintainability at that size.
Tools, hooks, and guardrails
- Claude Code “hooks” and similar features in other tools are praised for encoding process and policy: whitelisting dependencies, auto-approving/denying actions, and providing structured guidance beyond raw context (a hedged hook sketch follows this list).
- Enabling too many plugins/tools at once is seen as harmful: it burns context and confuses agents; the recommended practice is to enable only task-specific tools.
- Suggested guardrails: strong test suites, linters, benchmarks, explicit migration tools, diff-size limits, and treating agents as junior devs whose work always needs review.
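To make the dependency-whitelisting idea concrete, here is a minimal sketch of a pre-tool-use hook script. It assumes the hook receives the proposed tool call as JSON on stdin (with `tool_name` and `tool_input.command` fields) and that a non-zero exit blocks the action with the stderr message fed back to the agent; those protocol details and the whitelist itself are assumptions, so check your tool's hook documentation before relying on them.

```python
#!/usr/bin/env python3
"""Pre-tool-use hook sketch: deny `pip install`/`npm install` of packages
that are not on a whitelist. The stdin JSON shape and the blocking exit
code are assumptions about the hook protocol, not a confirmed API."""
import json
import re
import sys

ALLOWED_PACKAGES = {"requests", "pytest", "ruff"}  # hypothetical whitelist


def main() -> int:
    event = json.load(sys.stdin)                 # proposed tool call
    if event.get("tool_name") != "Bash":
        return 0                                 # only inspect shell commands

    command = event.get("tool_input", {}).get("command", "")
    match = re.search(r"\b(?:pip|npm)\s+install\s+(.+)", command)
    if not match:
        return 0                                 # not a dependency install

    requested = {a for a in match.group(1).split() if not a.startswith("-")}
    blocked = requested - ALLOWED_PACKAGES
    if blocked:
        # Assumed convention: non-zero exit denies the action; the stderr
        # message is surfaced to the agent as the reason.
        print(f"Denied: {sorted(blocked)} not on the dependency whitelist",
              file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The same structure extends to the other policies mentioned above, such as auto-approving known-safe commands or denying edits outside an allowed directory.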
Adoption gap, cost, and hype
- Several readers feel inadequate compared to “AI writes 50–100% of my code” claims; others argue such claims are mostly marketing hyperbole and selective success stories.
- Reported token spend (~$1,000/month) is debated: some note that on raw hourly cost this undercuts humans (a back-of-envelope comparison follows this list); others point out that a human is still needed for prompting, review, and decision-making.
- There’s broad skepticism about “no-BS” narratives that lean on magical incantations and piles of Twitter links while offering little concrete evidence of long-term, production-quality outcomes.
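For the hourly-cost argument, a quick back-of-envelope comparison is sketched below. Only the ~$1,000/month token spend comes from the discussion; the working hours and the loaded human rate are illustrative assumptions.

```python
# Back-of-envelope for the "raw hourly cost" argument. Only the reported
# ~$1,000/month token spend comes from the thread; the other figures are
# illustrative assumptions.
token_spend_per_month = 1_000            # USD, reported token spend
working_hours_per_month = 160            # assumed full-time month
agent_cost_per_hour = token_spend_per_month / working_hours_per_month
print(f"Agent tokens: ~${agent_cost_per_hour:.2f}/hour")        # ~$6.25/hour

loaded_dev_cost_per_hour = 75            # assumed fully loaded human rate, USD
print(f"Developer (loaded): ~${loaded_dev_cost_per_hour:.2f}/hour")
# Counterargument from the thread: the human hours spent prompting,
# reviewing, and deciding still sit on top of the token bill.
```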
Changing developer roles & personal fit
- With multiple agents, the role shifts toward managing synthetic teammates: planning, setting constraints, and reviewing PRs instead of writing every line.
- Some experienced developers love this “agentic engineering,” saying it amplifies their architectural and design work; others find it stressful and fear a future of supervising untrusted code.
- People report strong “personality” preferences: some click with Claude and clash with GPT-based tools, or vice versa, suggesting the UX and “disposition” of models matter as much as raw capability.