2025-10-02

What makes 5% of AI agents work in production?

Validity of the “5% of agents work” claim

Several commenters dispute the MIT study behind the “5% succeed” number, criticizing its reliance on perceived success rather than measured impact.
Some argue the paper and the blog post treat agent capabilities naïvely (e.g., “self-improvement” via APIs) and conflate a lack of integrations with model limitations.
Others note that if the study itself is weak, debating the exact percentage is meaningless.

LLMs vs decision trees and expert systems

Many production “agent” use cases (especially customer support) collapse into decision trees; LLMs are seen as poor replacements for deterministic logic.
Long prompts and “guardrails” are viewed as a reinvention of expert systems/decision trees with extra fragility and hallucination risk.
Some say once you’ve built strict parsers, validators, and post-processors, you’ve essentially implemented the business logic and could drop the LLM.
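To make the decision-tree point concrete, here is a minimal Python sketch of a deterministic support-ticket router. The `Ticket` fields, the action labels, and the 30-day refund window are hypothetical and do not come from the discussion. The idea is that once the fields are extracted and validated, the branching is ordinary code, and an LLM would at most be needed to fill in those fields from free text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    # Hypothetical fields that an upstream step (a form or LLM extraction) fills in.
    order_id: Optional[str]
    days_since_delivery: Optional[int]
    mentions_refund: bool
    account_locked: bool


REFUND_WINDOW_DAYS = 30  # assumed policy value, for illustration only


def route(ticket: Ticket) -> str:
    """Walk a fixed decision tree and return a hypothetical action label."""
    if ticket.account_locked:
        return "send_password_reset"
    if ticket.mentions_refund:
        if ticket.order_id is None:
            return "ask_for_order_id"
        if (
            ticket.days_since_delivery is not None
            and ticket.days_since_delivery <= REFUND_WINDOW_DAYS
        ):
            return "issue_refund"
    # Anything the tree does not cover goes to a person rather than a guess.
    return "escalate_to_human"


print(route(Ticket("A-1001", 10, True, False)))  # issue_refund
```

Every path here is testable and auditable, which is the property commenters say gets lost when the same rules live inside a long prompt.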

Scaffolding and context engineering

There is broad agreement that the hard part is not the model but the scaffolding: context selection, semantic layers, memory, governance, security.
One analogy: good “context engineering” resembles good management—providing intent and background so an agent (human or machine) can act effectively.
Some see this as simply “understanding the problem and engineering a solution,” not a new discipline.
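As a rough sketch of the "context selection" piece of scaffolding, the function below ranks candidate snippets by keyword overlap with the task and greedily packs them into a fixed budget. Keyword overlap as the relevance score and word count as a stand-in for tokens are both simplifying assumptions (real systems more often use embeddings and a tokenizer), and `select_context` and its parameters are hypothetical names.

```python
import re


def words(text: str) -> set:
    # Lowercased alphanumeric words; a crude proxy for semantic relevance.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_context(task: str, snippets: list, budget_words: int) -> list:
    """Greedily keep the most task-relevant snippets that fit a word budget."""
    task_words = words(task)
    # Score each snippet by how many task words it shares (higher is better).
    scored = sorted(
        ((len(task_words & words(s)), s) for s in snippets),
        key=lambda pair: pair[0],
        reverse=True,
    )
    chosen, used = [], 0
    for score, snippet in scored:
        if score == 0:
            break  # everything after this shares no words with the task
        size = len(snippet.split())  # word count stands in for tokens
        if used + size <= budget_words:
            chosen.append(snippet)
            used += size
    return chosen


docs = [
    "Refund policy: refunds within 30 days of delivery.",
    "Office hours are 9 to 5.",
    "Late delivery compensation is a voucher.",
]
print(select_context("refund policy for late delivery", docs, budget_words=10))
```

With a 10-word budget only the refund-policy snippet fits; raising the budget to 20 also admits the late-delivery snippet, while the office-hours line is never selected because it shares no words with the task.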

Critique of the article and AI-written prose

Many readers feel the blog post itself was heavily AI-assisted and exhibits common “GPTisms” (tone, structure, clichés).
This triggers a larger debate about pride in work, quantity vs quality, and whether AI-assisted writing produces hollow, SEO-style content.
The author acknowledges using AI to polish a draft, which some accept as productivity, others see as undermining authenticity.

Text-to-SQL, semantic layers, and determinism

Text-to-SQL is repeatedly cited as a deceptively simple but very hard “hello world” for agents.
Successful teams reportedly add business glossaries, constrained templates, and validation layers.
Some argue better UX and predefined, verified metrics (“semantic business logic layers”) may be more robust than free-form SQL generation.
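As a rough illustration of a "semantic business logic layer", the sketch below keeps a small allow-list of predefined metrics and renders SQL only from those vetted pieces. In this hypothetical setup a model would choose just a metric name and a grouping dimension (for example via structured output), and anything outside the layer is rejected instead of executed. `Metric`, `METRICS`, `build_query`, and the `orders` table are invented for the example; SQLite is used only to make it runnable.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    # One pre-verified metric in a toy "semantic layer" (hypothetical schema).
    sql_expr: str
    table: str
    allowed_dims: frozenset


METRICS = {
    "revenue": Metric("SUM(amount)", "orders", frozenset({"region", "month"})),
    "order_count": Metric("COUNT(*)", "orders", frozenset({"region", "month"})),
}


def build_query(metric: str, dim: str) -> str:
    """Render SQL only from allow-listed pieces; reject anything else."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    m = METRICS[metric]
    if dim not in m.allowed_dims:
        raise ValueError(f"dimension {dim!r} not allowed for {metric!r}")
    # Safe to interpolate: both identifiers were checked against the allow-list.
    return (
        f"SELECT {dim}, {m.sql_expr} AS {metric} "
        f"FROM {m.table} GROUP BY {dim} ORDER BY {dim}"
    )


# Tiny in-memory table so the example runs end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("EU", "2025-01", 100.0), ("EU", "2025-02", 50.0), ("US", "2025-01", 70.0)],
)

# In the hypothetical flow, ("revenue", "region") would come from the model.
rows = conn.execute(build_query("revenue", "region")).fetchall()
print(rows)  # [('EU', 150.0), ('US', 70.0)]
```

The trade-off mirrors the one raised in the thread: the model loses the freedom to write arbitrary SQL, but every query it can trigger has been reviewed in advance.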

Conversational UIs, expectations, and “AI magic”

Conversational interfaces can flatten the learning curve, but users often get frustrated when making fine-grained adjustments or hitting edge cases, and then want traditional controls back.
Commenters note that AI is marketed as “magic,” leading non-technical stakeholders to expect effortless automation and insight.
There is speculation that in a few years, teams will optimize costs by replacing many agent workloads with simpler, non-AI systems.
