Why the push for Agentic when models can barely follow a simple instruction?
“You’re Using It Wrong” vs. Model Limits
- Many replies frame OP’s failure as misuse: LLMs are powerful but require clear specs, tight scopes, and iterative guidance, not “do magic” prompts.
- Others push back that this is blame-shifting: they see high error rates even on simple tasks (e.g., “one unit test,” small refactors) and argue the tools are fundamentally unreliable, not merely “used wrong.”
- Several note that succeeding “more often than not” is still far short of the reliability expected of software tools.
Hype, Marketing, and Economic Pressure
- Some argue “agentic AI” is the new buzzword keeping the hype cycle going as basic chatbots disappoint; comparisons are drawn to past tech bubbles and “wash trading.”
- Commenters point to heavy astroturfing, LinkedIn/Reddit/HN marketing, and course-sellers as evidence that narratives are outpacing real-world impact.
- A lot of capital and executive reputations are now tied to AI, creating pressure to deploy agents regardless of readiness.
Where Agents Work Well (and Where They Don’t)
- Reported strong cases: boilerplate, CRUD apps, code translation (e.g., Python→Go), scaffolding tests/docs, searching large codebases, debugging from traces, niche scientific code given curated docs.
- Success is highest in mainstream, pattern-rich stacks (web, React, REST, Rust, Go, Python), on small self-contained features, and in maintenance work on large but reasonably structured codebases.
- Weak areas: legacy/arcane systems, complex integration across modules, recursion, embedded/novel domains, and tasks where the spec evolves during work.
Unreliability, Oversight, and Technical Debt
- Agents frequently hallucinate APIs, misuse libraries, loop on failing changes, or silently “cheat” on tests. Many liken them to erratic junior developers or interns.
- Effective use requires strong tests, static analysis, and human review; otherwise agents generate “sludge” and substantial tech debt.
- Some use multi-agent setups (different models reviewing each other’s output), but this adds further engineering effort and cost; a minimal guardrail sketch follows this list.
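To make the “strong tests, static analysis, and human review” point concrete, here is a minimal guardrail sketch (not taken from the thread): it assumes an agent’s proposed change has already been applied to a git working tree, and keeps the change only if the project’s test runner and linter both pass. The choice of pytest and ruff, and the revert-on-failure policy, are illustrative assumptions.

```python
# Minimal guardrail sketch (illustrative, not from the thread): keep an
# agent-proposed change only if tests and static analysis pass.
# Assumes a git repo, pytest as the test runner, and ruff as the linter.
import subprocess
import sys

CHECKS = [
    ["pytest", "-q"],        # strong tests catch hallucinated APIs and broken logic
    ["ruff", "check", "."],  # static analysis catches misused libraries and dead code
]

def run_checks() -> bool:
    for cmd in CHECKS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"check failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True

if __name__ == "__main__":
    if run_checks():
        print("checks passed; keep the change for human review")
    else:
        # Revert the agent's uncommitted edits to tracked files rather than
        # accumulating "sludge" and tech debt.
        subprocess.run(["git", "checkout", "--", "."])
        print("checks failed; reverted agent changes", file=sys.stderr)
        sys.exit(1)
```

The multi-agent setups mentioned above effectively add another model as one more check in this loop, which is where the extra engineering and cost come from.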
Why Developer Experiences Diverge
- Key factors cited: language/framework, problem domain, novelty vs. boilerplate, codebase quality, the chosen model/tool, and the user’s experience with LLM workflows.
- Some developers report 5–10x gains on the right tasks; others find net-zero or negative value once review and debugging time is counted.
- Expectations also differ: some accept “close enough and faster,” while others require deterministic, spec-perfect behavior.
Agentic Workflows: Process and Trade-offs
- Advocates describe elaborate processes: planning modes, epics files, specialized sub-agents, structured CLAUDE.md rules, and continuous logging/compaction (a hypothetical example of such a rules file appears after this list).
- Critics note that this often feels like managing a flaky offshore team: more time spent orchestrating, less time understanding code, reduced ownership, and unclear long-term ROI.
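For readers who have not seen these conventions, the fragment below is a hypothetical illustration of the kind of structured CLAUDE.md rules file advocates describe; the individual rules are invented for illustration and simply echo the planning, scoping, verification, and logging/compaction practices mentioned above, not any specific commenter’s setup.

```markdown
# CLAUDE.md — project rules (hypothetical example)

## Planning
- Enter plan mode first; write the plan to docs/epics/<feature>.md before editing code.

## Scope
- Touch only files listed in the approved plan; ask before adding dependencies.

## Verification
- Run the test suite and linter after every change; never weaken or delete a failing test.

## Logging
- Append a summary of each session to docs/agent-log.md and compact it when it grows too long.
```

Whether maintaining this kind of scaffolding pays for itself is exactly the long-term ROI question the critics raise.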