95% of generative AI pilots at companies are failing – MIT report

Why so many AI pilots fail

  • Enterprise data is messy: documents scattered across drives, inconsistent formats, weak internal search. Getting even basic retrieval right is hard (see the search sketch after this list); adding LLMs on top often just adds hallucinations.
  • Many “solutions” are thin wrappers over ChatGPT-style models, not deeply integrated into workflows or data. Users see little benefit beyond what they already get from generic tools.
  • LLMs get teams “80% there” quickly, but the last 20% (accuracy, edge cases, compliance, metrics) is a tar pit that kills adoption.
  • Business sponsors often don’t know what they want, can’t measure productivity gains, and underestimate the cost of changing processes and behavior.
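
As a rough illustration of the retrieval problem in the first bullet: even a bare-bones internal search baseline takes real work to feed with clean text, and an LLM layered on top of weak retrieval mostly rephrases, or hallucinates around, whatever it is handed. The sketch below is a minimal TF-IDF ranker in Python; the documents and query are invented for the example, and a real deployment would struggle mainly with collecting and normalizing the scattered source files, not with this step.

```python
# Minimal sketch (illustrative only): a TF-IDF baseline for internal search.
# The corpus and query are made up; in practice the hard part is gathering
# and cleaning the scattered, inconsistently formatted documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Q3 expense policy: travel must be pre-approved by a manager.",
    "Onboarding checklist for new engineering hires.",
    "Incident postmortem: payment service outage on 2024-05-02.",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform(documents)

def search(query: str, top_k: int = 2) -> list[tuple[float, str]]:
    """Rank documents by cosine similarity between query and document vectors."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix).ravel()
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]

for score, doc in search("who approves travel expenses?"):
    print(f"{score:.2f}  {doc}")
```

If a baseline like this cannot surface the right document, generating a fluent answer on top of it only makes the failure harder to notice.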

Similarities to other tech/ERP failures

  • Commenters compare this to ERP rollouts: tech often works, but projects fail for business and social reasons (over-customization, unclear goals, budget overruns).
  • Many IT projects fail anyway; a 95% “no measurable P&L impact” rate is framed by some as normal experimentation, not unique to AI.

Limitations and appropriate domains

  • LLMs are seen as “text transformation machines” with limited real intelligence; their human-like language tricks people into overtrusting them.
  • They’re best where false positives/negatives are cheap and the work is fuzzy: summarization, classification, drafting, “good enough” support, and internal search (see the routing sketch after this list).
  • High-stakes, structured processes (accounting, HR, compliance) are far less tolerant of hallucinations; several commenters doubt claims that back-office automation is the highest-ROI area.
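
One way to read the “cheap errors” point above is as a routing decision: let model output flow straight to users only where a wrong answer is cheap to catch and correct, and send anything high-stakes to a person. The sketch below is a hypothetical illustration of that split; the `llm_draft` stub, topic list, and confidence threshold are assumptions for the example, not anything described in the report or the thread.

```python
# Sketch of the "cheap false positives" framing: auto-use model output only
# where a mistake is low-cost; high-stakes or low-confidence items go to a human.
from dataclasses import dataclass

HIGH_STAKES_TOPICS = {"accounting", "payroll", "compliance"}  # assumed examples
CONFIDENCE_FLOOR = 0.8                                        # assumed threshold

@dataclass
class Draft:
    topic: str
    text: str
    confidence: float  # however the system estimates reliability

def llm_draft(ticket: str) -> Draft:
    """Hypothetical stand-in for a model call that classifies and drafts a reply."""
    return Draft(topic="support", text="Suggested reply: ...", confidence=0.9)

def route(ticket: str) -> str:
    draft = llm_draft(ticket)
    if draft.topic in HIGH_STAKES_TOPICS or draft.confidence < CONFIDENCE_FLOOR:
        return "human_review"   # hallucinations here are expensive
    return "send_draft"         # errors here are cheap to spot and fix

print(route("How do I reset my password?"))
```

The skepticism about back-office automation fits the first branch: accounting and compliance work does not tolerate the error rates that make the second branch acceptable.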

Misaligned incentives and hype

  • Many deployments target sales/marketing and visible chatbots to impress executives and shareholders, not to solve real user problems.
  • Some staff quietly resist projects that seem aimed at replacing them, or simply can’t find any real way the tool helps beyond trivial tasks.
  • AI is described as the latest management fad: lots of “AI Mondays,” dashboards, and PR, little sustainable value.

Where value is actually emerging

  • Individual contributors report big productivity gains in software development and some creative/operational tasks.
  • Enterprise success stories cluster around “fancy search” over internal emails/docs, niche tools (e.g., jargon explanation), and specific workflow automations.
  • Several read the implied 5% success rate as a glass-half-full signal: a small but real set of high-value use cases amid a lot of hype-driven noise.