The most underreported story in AI is that scaling has failed to produce AGI

Debate over Marcus’s Critique of Deep Learning

  • Some see him as a long‑time “deep learning can’t do real AI” voice whose goalposts keep moving and who selectively highlights failures.
  • Others argue his core criticisms (hallucinations, brittleness, hype) have largely held up and that his consistency is a strength, not a flaw.
  • He responds that his earlier “hitting a wall” predictions were mostly accurate and points to public prediction audits.
  • Several commenters wish he were less partisan in tone, but still value him as a counterweight to corporate hype.

Scaling, Plateau, and Hallucinations

  • Many agree that the payoff from simple scaling has slowed since GPT‑3.5: hallucinations persist, reliability remains limited, and “agents” work only in narrow domains.
  • Some claim hallucinations may be intrinsic to the current next‑token paradigm; others see them as reducible but not yet well controlled.
  • Counting letters in words (e.g., “strawberry”) is used as a toy example: critics see persistent failure as evidence of shallow pattern‑matching; defenders say it’s mostly an artifact of tokenization, not a fundamental limit.
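The tokenization point can be made concrete with a toy sketch. The subword split below is hypothetical (chosen only for illustration; real BPE vocabularies split words differently), but it shows why a task that is trivial on raw strings becomes opaque once the model sees only token IDs.

```python
# Hedged toy illustration of the tokenization argument. The 3-character
# chunking below is a hypothetical stand-in for a real subword tokenizer.

def toy_tokenize(word: str) -> list[str]:
    """Pretend subword tokenizer: splits a word into 3-character chunks."""
    return [word[i:i + 3] for i in range(0, len(word), 3)]

word = "strawberry"
tokens = toy_tokenize(word)
print(tokens)            # ['str', 'awb', 'err', 'y']

# On the raw string, letter counting is trivial:
print(word.count("r"))   # 3

# But a model never sees the string -- it sees opaque vocabulary IDs.
# (hash-based IDs here are a stand-in for a real vocabulary lookup)
token_ids = [hash(t) % 50000 for t in tokens]
# At this level the letter 'r' is simply not present in the input,
# so counting it must be learned indirectly, not read off directly.
```

Defenders of the tokenization explanation argue this is why the failure looks shallow without being a deep limit; critics reply that a robust reasoner should handle the indirection anyway.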

LLMs, “Reasoning” Models, and Anthropomorphism

  • A minority reports an “inflection point” with new “thinking” models (e.g., chain‑of‑thought + RL search) that feel qualitatively different and more capable at stepwise reasoning.
  • Others insist these are still just stacked LLM calls and prompt‑search, not genuine reasoning or agency.
  • There’s recurring pushback on anthropomorphizing: chatbots are framed as fictional characters being “acted out” by a document‑completion machine.
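The “stacked LLM calls” view can be sketched minimally. The `complete` function below is a stub standing in for a real model API, and its canned replies are hypothetical; the point is only that a “reasoning” loop can be implemented as repeated document completion with the growing transcript fed back in.

```python
# Hedged sketch of chain-of-thought as prompt-search: no real model is
# called; `complete` is a stub with hypothetical canned replies.

def complete(prompt: str) -> str:
    """Stub next-token completer; a real system would call a model API here."""
    canned = {
        0: "Step 1: break the problem into parts.",
        1: "Step 2: solve each part.",
        2: "FINAL: combined answer.",
    }
    # Choose a reply based on how many steps the transcript already contains.
    return canned[min(prompt.count("Step"), 2)]

def chain_of_thought(question: str, max_steps: int = 5) -> str:
    """'Reasoning' as a loop: append each completion and re-prompt."""
    transcript = f"Question: {question}\nThink step by step.\n"
    for _ in range(max_steps):
        step = complete(transcript)
        transcript += step + "\n"
        if step.startswith("FINAL"):
            break
    return transcript

print(chain_of_thought("What limits scaling?"))
```

Whether such a loop constitutes “genuine reasoning” or just structured sampling is exactly the disagreement in the thread; the sketch is neutral on that question.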

Expectations for AGI and Theory

  • Multiple commenters note there is no solid theoretical argument that language models should yield AGI, only extrapolation and belief.
  • AGI enthusiasm is compared to quasi‑religious or crypto‑like hype; some see “building God” vibes among true believers.
  • Others argue that human intelligence itself emerged from simple systems layered at scale, so dismissing next‑token predictors as “just statistics” is premature.

Cost, Usefulness, and Limits of the Approach

  • Critics emphasize petabyte‑scale data, massive GPU and power costs, and limited reliability relative to simple human skills as signs this path is inefficient and maybe fundamentally flawed.
  • Supporters reply that even replacing 5–10% of jobs or enabling narrow but reliable agents would be historically huge.

Meta: Polarization and Skepticism

  • The thread is seen as polarized into “AI hype train” vs “AI doom/hype skeptic” camps.
  • Some argue science should default to skepticism of grand claims; others worry that entrenched partisanship (on both sides) now dominates the discourse.