What makes 5% of AI agents work in production?
Validity of the “5% of agents work” claim
- Several commenters dispute the MIT study behind the “5% succeed” number, criticizing its reliance on perceived success rather than measured impact.
- Some argue the paper and the blog treat agent capabilities naïvely (e.g., “self-improvement” via APIs) and conflate lack of integrations with model limitations.
- Others note that if the study itself is weak, debating the exact percentage is meaningless.
LLMs vs decision trees and expert systems
- Many production “agent” use cases (especially support) collapse into decision trees; LLMs are seen as poor replacements for deterministic logic.
- Long prompts and “guardrails” are viewed as a reinvention of expert systems/decision trees with extra fragility and hallucination risk.
- Some say once you’ve built strict parsers, validators, and post-processors, you’ve essentially implemented the business logic and could drop the LLM.
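A minimal sketch of the point commenters make here, assuming a made-up support policy: the names `Ticket`, `route`, `validate_llm_action`, and the refund thresholds are all hypothetical and not taken from the discussion. It shows that a guardrail strict enough to catch a wrong LLM answer has to compute the right answer itself, so the decision tree already contains the business logic.

```python
from dataclasses import dataclass

# Hypothetical policy values, invented purely for illustration.
REFUND_WINDOW_DAYS = 30
AUTO_REFUND_LIMIT = 100.0


@dataclass(frozen=True)
class Ticket:
    category: str              # e.g. "refund", "shipping", anything else
    days_since_purchase: int
    order_total: float


def route(ticket: Ticket) -> str:
    """Deterministic decision tree: the same ticket always gets the same action."""
    if ticket.category == "refund":
        if ticket.days_since_purchase > REFUND_WINDOW_DAYS:
            return "deny_refund"
        if ticket.order_total <= AUTO_REFUND_LIMIT:
            return "auto_refund"
        return "manager_approval"
    if ticket.category == "shipping":
        return "send_tracking_link"
    return "escalate_to_human"


def validate_llm_action(ticket: Ticket, proposed: str) -> str:
    """Guardrail around an action proposed by a hypothetical LLM.

    To decide whether the proposal is acceptable, the validator must work
    out the correct action on its own, which makes the LLM call redundant
    for this kind of rule-bound decision.
    """
    expected = route(ticket)
    return proposed if proposed == expected else expected
```

In this sketch, `validate_llm_action` never returns anything other than what `route` would return, which is the commenters' argument in miniature. It only applies to decisions that really are rule-bound; free-text understanding of the ticket is a separate problem that this example does not address.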
Scaffolding and context engineering
- There is broad agreement that the hard part is not the model but the scaffolding: context selection, semantic layers, memory, governance, security.
- One analogy: good “context engineering” resembles good management—providing intent and background so an agent (human or machine) can act effectively.
- Some see this as simply “understanding the problem and engineering a solution,” not a new discipline.
Critique of the article and AI-written prose
- Many readers feel the blog post itself was heavily AI-assisted and exhibits common “GPTisms” (tone, structure, clichés).
- This triggers a larger debate about pride in work, quantity vs quality, and whether AI-assisted writing produces hollow, SEO-style content.
- The author acknowledges using AI to polish a draft; some accept this as a productivity aid, while others see it as undermining authenticity.
Text-to-SQL, semantic layers, and determinism
- Text-to-SQL is repeatedly cited as a “hello world” for agents that looks simple but is very hard in practice.
- Successful teams reportedly add business glossaries, constrained templates, and validation layers.
- Some argue better UX and predefined, verified metrics (“semantic business logic layers”) may be more robust than free-form SQL generation.
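One way to picture the “predefined, verified metrics” idea is the sketch below. It is an assumption-laden illustration, not a description of any team's system: the `orders` table, the `METRICS` and `GLOSSARY` dictionaries, and the function names are all hypothetical. Instead of generating SQL freely, the question is mapped through a business glossary to a vetted query, and anything that does not match is rejected.

```python
import sqlite3

# Hypothetical verified metrics: each name maps to SQL that a human has reviewed.
METRICS = {
    "total_revenue": "SELECT SUM(amount) FROM orders WHERE status = 'paid'",
    "order_count": "SELECT COUNT(*) FROM orders",
}

# Hypothetical business glossary: phrases users type, mapped to metric names.
GLOSSARY = {
    "revenue": "total_revenue",
    "sales": "total_revenue",
    "number of orders": "order_count",
    "orders": "order_count",
}


def resolve_metric(question: str) -> str:
    """Map a question to a known metric name, or raise if nothing matches."""
    text = question.lower()
    # Check longer phrases first so "number of orders" wins over "orders".
    for term in sorted(GLOSSARY, key=len, reverse=True):
        if term in text:
            return GLOSSARY[term]
    raise ValueError(f"No verified metric matches: {question!r}")


def run_metric(conn: sqlite3.Connection, question: str):
    """Run only pre-approved SQL; no query text is ever generated."""
    sql = METRICS[resolve_metric(question)]
    return conn.execute(sql).fetchone()[0]


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory database with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 20.0, "paid"), (2, 5.0, "refunded"), (3, 30.0, "paid")],
    )
    return conn
```

The trade-off in this sketch is coverage for predictability: questions outside the glossary fail loudly with a `ValueError` instead of producing plausible but wrong SQL. The substring match stands in for whatever matching step a real system would use, and it ignores filters such as date ranges entirely.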
Conversational UIs, expectations, and “AI magic”
- Conversational interfaces can flatten the learning curve, but users often get frustrated when making fine adjustments or handling edge cases, and then want traditional controls back.
- Commenters note that AI is marketed as “magic,” leading non-technical stakeholders to expect effortless automation and insight.
- There is speculation that in a few years, teams will optimize costs by replacing many agent workloads with simpler, non-AI systems.