Why the push for Agentic when models can barely follow a simple instruction?
“You’re Using It Wrong” vs. Model Limits
- Many replies frame OP’s failure as misuse: LLMs are powerful but require clear specs, tight scopes, and iterative guidance, not “do magic” prompts.
- Others push back that this is blame-shifting: they see high error rates even on simple tasks (e.g., “one unit test,” small refactors) and argue the tools are fundamentally unreliable, not merely “used wrong.”
- Several note that succeeding “more often than not” is still far short of the reliability expected of software tools.
Hype, Marketing, and Economic Pressure
- Some argue “agentic AI” is the new buzzword keeping the hype cycle going as basic chatbots disappoint; comparisons are drawn to past tech bubbles and “wash trading.”
- Commenters point to heavy astroturfing, LinkedIn/Reddit/HN marketing, and course-sellers as evidence that narratives are outpacing real-world impact.
- A lot of capital and executive reputations are now tied to AI, creating pressure to deploy agents regardless of readiness.
Where Agents Work Well (and Where They Don’t)
- Reported strong cases: boilerplate, CRUD apps, code translation (e.g., Python→Go), scaffolding tests/docs, searching large codebases, debugging from traces, niche scientific code given curated docs.
- Success is highest in mainstream, pattern-rich stacks (web, React, REST, Rust, Go, Python), on small self-contained features, and in maintenance work on large but reasonably structured codebases.
- Weak areas: legacy/arcane systems, complex integration across modules, recursion, embedded/novel domains, and tasks where the spec evolves during work.
Unreliability, Oversight, and Technical Debt
- Agents frequently hallucinate APIs, misuse libraries, loop on failing changes, or silently “cheat” on tests. Many liken them to erratic junior developers or interns.
- Effective use requires strong tests, static analysis, and human review; otherwise agents generate “sludge” and substantial tech debt.
- Some use multi-agent setups (different models reviewing each other’s output), but this adds further engineering effort and cost; a minimal guardrail sketch follows this list.
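To make the “strong tests, static analysis, and human review” point concrete, here is a minimal guardrail sketch (not taken from the thread): it assumes an agent’s proposed change has already been applied to a git working tree, and keeps the change only if the project’s test runner and linter both pass. The choice of pytest and ruff, and the revert-on-failure policy, are illustrative assumptions.

```python
# Minimal guardrail sketch (illustrative, not from the thread): keep an
# agent-proposed change only if tests and static analysis pass.
# Assumes a git repo, pytest as the test runner, and ruff as the linter.
import subprocess
import sys

CHECKS = [
    ["pytest", "-q"],        # strong tests catch hallucinated APIs and broken logic
    ["ruff", "check", "."],  # static analysis catches misused libraries and dead code
]

def run_checks() -> bool:
    for cmd in CHECKS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"check failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True

if __name__ == "__main__":
    if run_checks():
        print("checks passed; keep the change for human review")
    else:
        # Revert the agent's uncommitted edits to tracked files rather than
        # accumulating "sludge" and tech debt.
        subprocess.run(["git", "checkout", "--", "."])
        print("checks failed; reverted agent changes", file=sys.stderr)
        sys.exit(1)
```

The multi-agent setups mentioned above effectively add another model as one more check in this loop, which is where the extra engineering and cost come from.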
Why Developer Experiences Diverge
- Key factors cited: language/framework, problem domain, novelty vs. boilerplate, codebase quality, the chosen model/tool, and the user’s experience with LLM workflows.
- Some developers report 5–10x gains on the right tasks; others find net-zero or negative value once review and debugging time is counted.
- Expectations also differ: some accept “close enough and faster,” while others require deterministic, spec-perfect behavior.
Agentic Workflows: Process and Trade-offs
- Advocates describe elaborate processes: planning modes, epics files, specialized sub-agents, structured CLAUDE.md rules, and continuous logging/compaction (a hypothetical example of such a rules file appears after this list).
- Critics note that this often feels like managing a flaky offshore team: more time spent orchestrating, less time understanding code, reduced ownership, and unclear long-term ROI.
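For readers who have not seen these conventions, the fragment below is a hypothetical illustration of the kind of structured CLAUDE.md rules file advocates describe; the individual rules are invented for illustration and simply echo the planning, scoping, verification, and logging/compaction practices mentioned above, not any specific commenter’s setup.

```markdown
# CLAUDE.md — project rules (hypothetical example)

## Planning
- Enter plan mode first; write the plan to docs/epics/<feature>.md before editing code.

## Scope
- Touch only files listed in the approved plan; ask before adding dependencies.

## Verification
- Run the test suite and linter after every change; never weaken or delete a failing test.

## Logging
- Append a summary of each session to docs/agent-log.md and compact it when it grows too long.
```

Whether maintaining this kind of scaffolding pays for itself is exactly the long-term ROI question the critics raise.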