Ask HN: Any insider takes on Yann LeCun's push against current architectures?

Perceived Limits of Current LLM Architectures

  • Many comments restate LeCun’s core critique as: autoregressive, token-by-token generation with fixed weights leads to error accumulation and makes systematic self-correction and “global” constraint satisfaction hard.
  • Others respond that transformers are Turing-complete and, in theory, can implement arbitrary algorithms and error correction; in practice, current training and inference setups don’t realize this reliably and require task‑specific “whack‑a‑mole” fixes.
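The error-accumulation critique is usually stated as a back-of-envelope calculation: if each generated token has some independent chance of being wrong and the model never corrects itself, the probability a long output stays error-free decays geometrically. A toy sketch of that arithmetic (the independence assumption is the contested part; real models' errors are correlated and partially self-correcting):

```python
# Toy illustration of the error-accumulation argument. Assumes
# independent per-token errors, which real models violate; the point
# is only the geometric decay P(correct) = (1 - e) ** n.

def p_sequence_correct(per_token_error: float, n_tokens: int) -> float:
    """Probability an n-token autoregressive output contains no errors."""
    return (1.0 - per_token_error) ** n_tokens

for n in (10, 100, 1000):
    print(n, p_sequence_correct(0.01, n))
```

Even a 1% per-token error rate drives the sequence-level success probability toward zero for long outputs, which is why critics argue for mechanisms that can revise earlier tokens.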

Hallucinations, Uncertainty, and “I Don’t Know”

  • One camp claims transformers fundamentally lack a robust notion of uncertainty: they always pick a token, can’t “backtrack everything,” and don’t natively emit “I don’t know.”
  • Counter‑arguments:
    • Models do encode uncertainty internally, e.g. as flat (high‑entropy) next‑token distributions, and can be trained (via fine‑tuning or RL) to say “I don’t know” when they lack knowledge.
    • Research shows hidden states encode “not knowing,” but standard QA fine‑tuning suppresses that expression.
  • Several propose architectural hacks: backspace tokens, explicit confidence heads per layer, branching/beam‑like generation, or self‑reflection frameworks (e.g., SelfRAG) to decide when to retrieve or abstain.
  • Others argue hallucinations are partly desirable creativity; the real issue is calibrating when outputs are guesses vs grounded facts.
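One concrete version of the “flat distribution” point above: given access to the model’s next-token probabilities, a flat distribution has high entropy, and a simple threshold can gate an abstention. A minimal sketch (the threshold and the idea of abstaining on raw entropy are illustrative; calibrated abstention in practice needs training, per the SelfRAG-style work mentioned):

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_abstain(probs, threshold=1.0):
    """Abstain ("I don't know") when the distribution is too flat."""
    return entropy(probs) > threshold

confident = [0.9, 0.05, 0.03, 0.02]   # peaked distribution: answer
uncertain = [0.25, 0.25, 0.25, 0.25]  # flat distribution: abstain
print(should_abstain(confident), should_abstain(uncertain))  # False True
```

This is the mechanism the “counter-argument” camp points to: the signal exists in the model; standard QA fine-tuning just trains the model to answer anyway.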

Energy-Based Models, World Models, and LeCun’s Focus

  • Energy-based models (EBMs) are described as assigning low “energy” to globally consistent configurations, potentially enabling better uncertainty estimates and constraint satisfaction than token‑local probabilities.
  • LeCun’s broader agenda is seen as:
    • Learning world models from rich, multimodal, interactive data (especially vision), not just text.
    • Using energy minimization / JEPA‑like objectives to move away from pure memorization.
  • Practitioners note EBMs are currently far more resource‑intensive and not yet competitive at scale, though some groups are actively trying to change this.
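The core EBM framing, stripped to a caricature: score complete candidate outputs with a scalar energy and select the global minimum, instead of committing token-by-token. The energy function below is a made-up stand-in (a constraint-violation count); real EBMs learn E(x, y) from data, and doing the minimization efficiently is exactly the hard, resource-intensive part practitioners flag:

```python
# Toy energy-based selection over whole candidates. The hand-written
# energy (count of unmet constraints) is hypothetical; it only
# illustrates "low energy = globally consistent configuration".

def energy(candidate: str, required_words: set) -> float:
    """Lower energy = more of the required constraints satisfied."""
    words = set(candidate.split())
    missing = sum(1 for w in required_words if w not in words)
    return float(missing)

def pick(candidates, required_words):
    """Global selection: minimize energy over complete outputs."""
    return min(candidates, key=lambda c: energy(c, required_words))

cands = ["the cat sat", "the cat sat on the mat", "a dog ran"]
print(pick(cands, {"cat", "mat"}))  # -> "the cat sat on the mat"
```

The contrast with autoregressive decoding is that the score is assigned to the whole configuration at once, which is why EBM proponents argue it handles global constraints and uncertainty more naturally.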

Biological Plausibility, Efficiency, and Continual Learning

  • Many point to the brain’s ~25W energy use and continual, online learning as evidence current LLM training/inference is wildly inefficient and biologically implausible, implying large optimization headroom.
  • Others invoke the “bitter lesson”: biological plausibility isn’t necessarily a good design prior; compute‑heavy, simple methods often win.
  • Continual learning researchers say catastrophic forgetting is mostly solved in toy settings but hasn’t been pushed seriously at LLM scale; an architecture that can update itself in deployment without collapse is widely seen as necessary for longer‑term progress.
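The standard mitigation in those toy settings is a regularizer that anchors weights important to old tasks, e.g. elastic weight consolidation (EWC). A scalar-weight sketch with hand-set importance values (hypothetical numbers; real EWC estimates importance from the Fisher information):

```python
# Toy EWC-style penalty: when training on task B, drifting a weight
# that was important for task A is expensive; drifting an unimportant
# one is cheap. Weights and importances here are illustrative scalars.

def ewc_loss(new_w, old_w, importance, task_b_loss, lam=1.0):
    """task_b_loss + (lam/2) * sum_i F_i * (w_i - w*_i)**2"""
    penalty = 0.5 * lam * sum(
        f * (w - w0) ** 2 for w, w0, f in zip(new_w, old_w, importance)
    )
    return task_b_loss + penalty

old_weights = [1.0, -2.0]   # weights after task A
fisher = [10.0, 0.1]        # first weight matters for task A

print(ewc_loss([1.0, 0.0], old_weights, fisher, task_b_loss=0.5))   # cheap drift
print(ewc_loss([0.0, -2.0], old_weights, fisher, task_b_loss=0.5))  # costly drift
```

The open question the thread raises is whether anything like this survives at LLM scale and in open-ended deployment, rather than in fixed task sequences.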

Alternative Architectures and Experimental Directions

  • Mentioned directions include:
    • Diffusion language models (e.g., LLaDA/SEDD‑style) that sample whole sequences or blocks in parallel, potentially trading memory bandwidth for fewer sequential decoding steps.
    • Sentence‑level or “concept” models that operate on higher‑level units than tokens.
    • Recursive/branching “thought trees,” test‑time training, world‑model‑centric agents, and multi‑head predictive architectures like Hydra.
  • Several commenters think current transformers are a powerful but temporary step on an S‑curve; others suspect further scaling and better training schedules could still yield major surprises.
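The masked-diffusion decoding loop behind the LLaDA-style models above can be caricatured in a few lines: start from an all-mask sequence, let a model propose tokens for every masked position in parallel, commit the most confident ones, and repeat. The model below is a stub that already knows the target, so only the scheduling logic is real:

```python
import random

MASK = "_"

def fake_model(seq, target):
    """Stub predictor: (token, confidence) per position. Stands in
    for a real masked-diffusion LM's parallel predictions."""
    return [(tok, random.random()) for tok in target]

def diffusion_decode(target, steps=4, seed=0):
    """Iteratively unmask the most confident positions in parallel."""
    random.seed(seed)
    seq = [MASK] * len(target)
    per_step = max(1, len(target) // steps)
    while MASK in seq:
        preds = fake_model(seq, target)
        masked = [i for i, t in enumerate(seq) if t == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:          # commit top-confidence tokens
            seq[i] = preds[i][0]
    return " ".join(seq)

print(diffusion_decode("energy models score whole sequences at once".split()))
```

The claimed trade-off is that each iteration touches the whole sequence (more bandwidth per step) but far fewer sequential steps are needed than one-token-at-a-time decoding.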

Economic and Social Path Dependence

  • There is broad agreement that industry incentives create strong path dependence:
    • No major lab wants to ship something that’s weaker than current leaders on benchmarks.
    • UX and integration matter more than marginal eval gains, so many promising but non‑dominant architectures (RWKV, Mamba‑like, EBMs, diffusion LMs) struggle to gain traction.
  • Overall, the thread reflects a split: some see LLMs as a dead‑end without new architectures; others view them as a flexible substrate that still has a lot of unexplored potential, with “energy minimization” more a re‑framing than a fundamentally different paradigm.