The most underreported story in AI is that scaling has failed to produce AGI

Debate over Marcus’s Critique of Deep Learning

  • Some see him as a long‑time “deep learning can’t do real AI” voice whose goalposts keep moving and who selectively highlights failures.
  • Others argue his core criticisms (hallucinations, brittleness, hype) have largely held up and that his consistency is a strength, not a flaw.
  • He responds that his earlier “hitting a wall” predictions were mostly accurate and points to public prediction audits.
  • Several commenters wish he were less partisan in tone, but still value him as a counterweight to corporate hype.

Scaling, Plateau, and Hallucinations

  • Many agree that the payoff from simple scaling has slowed since GPT‑3.5: hallucinations persist, reliability remains limited, and “agents” work only in narrow domains.
  • Some claim hallucinations may be intrinsic to the current next‑token paradigm; others see them as reducible but not yet well controlled.
  • Counting letters in words (e.g., “strawberry”) is used as a toy example: critics see persistent failure as evidence of shallow pattern‑matching; defenders say it’s mostly an artifact of tokenization, not a fundamental limit.
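The tokenization point can be made concrete with a toy sketch. The subword split below is hypothetical (chosen only for illustration; real BPE vocabularies split words differently), but it shows why a task that is trivial on raw strings becomes opaque once the model sees only token IDs.

```python
# Hedged toy illustration of the tokenization argument. The 3-character
# chunking below is a hypothetical stand-in for a real subword tokenizer.

def toy_tokenize(word: str) -> list[str]:
    """Pretend subword tokenizer: splits a word into 3-character chunks."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]

word = "strawberry"
tokens = toy_tokenize(word)
print(tokens)            # ['str', 'awb', 'err', 'y']

# On the raw string, letter counting is trivial:
print(word.count("r"))   # 3

# But a model never sees the string -- it sees opaque vocabulary IDs.
# (hash-based IDs here are a stand-in for a real vocabulary lookup)
token_ids = [hash(t) % 50000 for t in tokens]
# At this level the letter 'r' is simply not present in the input,
# so counting it must be learned indirectly, not read off directly.
```

Defenders of the tokenization explanation argue this is why the failure looks shallow without being a deep limit; critics reply that a robust reasoner should handle the indirection anyway.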

LLMs, “Reasoning” Models, and Anthropomorphism

  • A minority reports an “inflection point” with new “thinking” models (e.g., chain‑of‑thought + RL search) that feel qualitatively different and more capable at stepwise reasoning.
  • Others insist these are still just stacked LLM calls and prompt‑search, not genuine reasoning or agency.
  • There’s recurring pushback on anthropomorphizing: chatbots are framed as fictional characters being “acted out” by a document‑completion machine.
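The “stacked LLM calls” view can be sketched minimally. The `complete` function below is a stub standing in for a real model API, and its canned replies are hypothetical; the point is only that a “reasoning” loop can be implemented as repeated document completion with the growing transcript fed back in.

```python
# Hedged sketch of chain-of-thought as prompt-search: no real model is
# called; `complete` is a stub with hypothetical canned replies.

def complete(prompt: str) -> str:
    """Stub next-token completer; a real system would call a model API here."""
    canned = {
        0: "Step 1: break the problem into parts.",
        1: "Step 2: solve each part.",
        2: "FINAL: combined answer.",
    }
    # Choose a reply based on how many steps the transcript already contains.
    return canned[min(prompt.count("Step"), 2)]

def chain_of_thought(question: str, max_steps: int = 5) -> str:
    """'Reasoning' as a loop: append each completion and re-prompt."""
    transcript = f"Question: {question}\nThink step by step.\n"
    for _ in range(max_steps):
        step = complete(transcript)
        transcript += step + "\n"
        if step.startswith("FINAL"):
            break
    return transcript

print(chain_of_thought("What limits scaling?"))
```

Whether such a loop constitutes “genuine reasoning” or just structured sampling is exactly the disagreement in the thread; the sketch is neutral on that question.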

Expectations for AGI and Theory

  • Multiple commenters note there is no solid theoretical argument that language models should yield AGI, only extrapolation and belief.
  • AGI enthusiasm is compared to quasi‑religious or crypto‑like hype; some see “building God” vibes among true believers.
  • Others argue that human intelligence itself emerged from simple systems layered at scale, so dismissing next‑token predictors as “just statistics” is premature.

Cost, Usefulness, and Limits of the Approach

  • Critics emphasize petabyte‑scale data, massive GPU and power costs, and limited reliability relative to simple human skills as signs this path is inefficient and maybe fundamentally flawed.
  • Supporters reply that even replacing 5–10% of jobs or enabling narrow but reliable agents would be historically huge.

Meta: Polarization and Skepticism

  • The thread is seen as polarized into “AI hype train” vs “AI doom/hype skeptic” camps.
  • Some argue science should default to skepticism of grand claims; others worry that entrenched partisanship (on both sides) now dominates the discourse.