Just talk to it – A way of agentic engineering

Cognitive load & workflow with many agents

  • Some expect multiple concurrent agents to be exhausting; others report the opposite: parallel agents keep them in flow by eliminating waiting time.
  • The limiting factor is the “human context window”: tracking many threads and reviewing large diffs is harder than typing code.
  • Several describe agents as “maliciously compliant”: they wander off into tangents, take minutes for trivial tasks, and need frequent course correction.

Code quality, “AI slop,” and refactoring

  • Many report mediocre output: overly verbose code, 100-line tests that could be 20 lines, 30-line implementations that could be 5, and frequent reintroduction of previously fixed bugs.
  • Some claim high-quality output is possible and that “slop” is a sign of poor prompts or weak review; others strongly disagree, saying they almost always have to simplify AI code heavily.
  • There’s skepticism about letting the same agents that produced messy code “refactor” it; defenders note this is analogous to humans debugging their own bugs, provided tests and constraints exist.
  • LLMs are said to be good at pursuing a single, clearly defined objective (e.g., “pass these tests, keep the diff under N characters”; see the sketch after this list) but bad at balancing multiple competing goals or evolving a design.
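
The “single objective” framing can be made mechanical. Below is a minimal sketch of a gate script an agent loop could be pointed at: it passes only if the test suite succeeds and the uncommitted diff stays under a size budget. The pytest command and the 4,000-character budget are illustrative assumptions, not anything prescribed in the discussion.

    #!/usr/bin/env python3
    """Single-objective gate: tests must pass AND the diff must stay small."""
    import subprocess
    import sys

    MAX_DIFF_CHARS = 4_000  # hypothetical budget for "keep diff under N characters"

    # Objective 1: the test suite must pass.
    tests = subprocess.run(["pytest", "-q"])
    if tests.returncode != 0:
        sys.exit("FAIL: tests are failing")

    # Objective 2: the uncommitted diff must stay under the character budget.
    diff = subprocess.run(["git", "diff"], capture_output=True, text=True)
    if len(diff.stdout) > MAX_DIFF_CHARS:
        sys.exit(f"FAIL: diff is {len(diff.stdout)} chars, budget is {MAX_DIFF_CHARS}")

    print("PASS: tests green and diff within budget")

Collapsing success to a single PASS/FAIL signal is exactly what makes the objective tractable for an agent; the trade-off noted above is that anything not encoded in the gate (readability, design) goes unoptimized.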

Scale, maintainability, and project realism

  • The cited 300k LOC AI-maintained codebase divides opinion: some see it as impressive; others call it “cowboy coding” and suspect it could be a fraction of that size if written thoughtfully.
  • Without full access to the closed-source project, commenters find it “unclear” how robust the system is (DB schema changes, migrations, auth/RBAC, performance).
  • Inspections of related public repos show heavy scaffolding, logging, and questionable tests, reinforcing doubts about maintainability at that size.

Tools, hooks, and guardrails

  • Claude Code “hooks” and similar features in other tools are praised for encoding process and policy: whitelisting dependencies, auto-approving/denying actions, and providing structured guidance beyond raw context (a sketch follows this list).
  • Too many plugins/tools at once is seen as harmful: it burns context and confuses agents; the recommended practice is to enable only the tools a given task needs.
  • Suggested guardrails: strong test suites, linters, benchmarks, explicit migration tools, diff-size limits, and treating agents as junior devs whose work always needs review.
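
To make the hook idea concrete, here is a minimal sketch of a pre-tool-use hook in Python that enforces a dependency whitelist on npm installs. It assumes the tool delivers the proposed action as JSON on stdin with the shell command under tool_input.command, and that exit code 2 denies the action (this matches Claude Code’s documented hook contract, but verify against your tool’s docs); the ALLOWED set is a hypothetical project whitelist.

    #!/usr/bin/env python3
    """Pre-tool-use hook sketch: deny npm installs outside a whitelist."""
    import json
    import sys

    ALLOWED = {"react", "zod", "vitest"}  # hypothetical project whitelist

    payload = json.load(sys.stdin)  # hook payload arrives as JSON on stdin
    command = payload.get("tool_input", {}).get("command", "")

    if command.startswith("npm install"):
        # Tokens after "npm install" that aren't flags are the requested packages.
        requested = {tok for tok in command.split()[2:] if not tok.startswith("-")}
        blocked = requested - ALLOWED
        if blocked:
            print(f"denied: {', '.join(sorted(blocked))} not on the dependency whitelist",
                  file=sys.stderr)
            sys.exit(2)  # exit code 2 blocks the action; stderr is fed back to the agent

    sys.exit(0)  # allow everything else

Because the denial message is returned to the agent, the hook doubles as the “structured guidance” commenters describe: the model learns why the action was refused, not just that it failed.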

Adoption gap, cost, and hype

  • Several readers feel inadequate compared to “AI writes 50–100% of my code” claims; others argue this is mostly marketing hyperbole and selective success stories.
  • Reported token spend (~$1,000/month) is debated: some note that on raw hourly cost this undercuts humans (see the rough arithmetic after this list); others point out that a human is still needed for prompting, review, and decision-making.
  • There’s broad skepticism about “no-BS” narratives that rely on magical incantations, lots of Twitter links, and little concrete evidence of long-term, production-quality outcomes.
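
For rough orientation, the hourly comparison behind the first claim (assuming a ~160-hour working month, an assumption not stated in the thread) works out to:

    $1,000 per month ÷ 160 hours per month ≈ $6.25 per hour of agent time

That is far below typical developer rates, which is the basis of the “undercuts humans” argument; the counterpoint stands that the hours a human spends prompting and reviewing come on top of that figure.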

Changing developer roles & personal fit

  • With multiple agents, the role shifts toward managing synthetic teammates: planning, setting constraints, and reviewing PRs instead of writing every line.
  • Some experienced developers love this “agentic engineering,” saying it amplifies their architectural and design work; others find it stressful and fear a future of supervising untrusted code.
  • People report strong “personality” preferences: some click with Claude and clash with GPT-based tools, or vice versa, suggesting the UX and “disposition” of models matter as much as raw capability.