The wall confronting large language models

Paper accessibility and author expertise

  • Many commenters find the paper hard to read: heavy prose, dense equations, few concrete examples.
  • Debate over whether the authors are “outside their core field”: some see computational physics/chemistry as relevant to ML; others view lack of LLM-building experience as a credibility issue.
  • Meta‑discussion about gatekeeping: some argue ideas should stand on merit, others stress that bold claims from non‑practitioners deserve extra skepticism.

The “wall” and scaling of LLMs

  • Several readers think core LLM quality gains have slowed despite massive spend, suggesting we may be near the top of an S‑curve.
  • Others counter with business metrics such as revenue growth, or argue the paper is about capability scaling, not value for money.
  • Some expect future improvements more from agents, tools, and hybrid systems than from monolithic model scaling.

Markov chains, formal models, and expressivity

  • One thread explores an “extensional equivalence” between LLMs and high‑order Markov chains.
  • Critics say this equivalence is either trivial (any finite computation can be embedded in a sufficiently large Markov chain; a minimal sketch follows this list) or irrelevant to practical limits.
  • Disagreement over whether such reductions actually constrain what transformers can do, or just restate that high‑dimensional probabilistic dynamics are very expressive.
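
  A minimal sketch of the reduction being debated, assuming nothing beyond the Python standard library (the function names are mine, not the paper's): any predictor that chooses the next token from a bounded context of k previous tokens defines, extensionally, a k‑th order Markov chain over token sequences. The explicit table below is also the point of the "trivial" objection: at real vocabulary sizes and context lengths the state space is astronomically large, so the equivalence says little about practical limits.

      from collections import defaultdict, Counter
      import random

      def build_kth_order_chain(tokens, k):
          """Count next-token frequencies conditioned on the previous k tokens.

          Extensionally, any bounded-context next-token predictor defines such a
          conditional distribution; the difference is that this table is stored
          explicitly, which is infeasible at LLM scale.
          """
          table = defaultdict(Counter)
          for i in range(len(tokens) - k):
              state = tuple(tokens[i:i + k])
              table[state][tokens[i + k]] += 1
          return table

      def sample_next(table, state):
          """Sample the next token given the current k-token state."""
          counts = table.get(tuple(state))
          if not counts:
              return None
          toks, weights = zip(*counts.items())
          return random.choices(toks, weights=weights)[0]

      corpus = "the cat sat on the mat and the cat slept".split()
      chain = build_kth_order_chain(corpus, k=2)
      print(sample_next(chain, ["the", "cat"]))  # 'sat' or 'slept'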

Symbolic reasoning, backtracking, and Prolog

  • A long subthread argues that probabilistic sequence models fundamentally lack capabilities like logical backtracking and Prolog‑style search.
  • Others respond that backtracking can be simulated either inside the token stream or via an external loop or tool (sketched after this list); the bottleneck is practicality, not theoretical impossibility.
  • Sudoku and Prolog interpreters are used as test cases; debate centers on whether “LLM + scaffolding” counts as the model doing the reasoning.
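
  One way to make the "external loop" position concrete, under the assumption that a model call can be abstracted as a candidate proposer: a classic depth‑first scaffold asks the proposer for next moves, checks each extension with a deterministic verifier, and backtracks on failure. `propose_candidates` below is a hypothetical stand‑in for an LLM call (here a fixed list so the example stays self‑contained); whether the resulting search counts as "the model reasoning" is exactly what the subthread disputes.

      def propose_candidates(partial):
          """Placeholder for an LLM call that proposes next symbols.

          In the scaffolded setup debated in the thread this would be a model
          prompted with the partial solution; here it is a fixed candidate set
          so the example stays deterministic and self-contained.
          """
          return [1, 2, 3, 4]

      def violates(partial):
          """Deterministic verifier: toy 'mini-Sudoku row' constraint
          (all placed digits must be distinct)."""
          return len(set(partial)) != len(partial)

      def solve(partial, length=4):
          """Classic depth-first backtracking driven by the proposer.

          The loop, not the proposer, performs the backtracking: it undoes a
          choice whenever the verifier rejects the extended partial solution.
          """
          if violates(partial):
              return None
          if len(partial) == length:
              return partial
          for candidate in propose_candidates(partial):
              result = solve(partial + [candidate], length)
              if result is not None:
                  return result
          return None

      print(solve([]))  # [1, 2, 3, 4]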

Turing completeness and “reasoning”

  • Some argue that once an LLM is embedded in a simple loop it becomes Turing complete, so there is no principled barrier to any computable reasoning (a minimal loop is sketched after this list).
  • Opponents say this conflates mere computability with human‑like logical reasoning, invoking the Chinese Room and stressing that what matters is reliability and traceability, not bare possibility.
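
  The "simple loop" argument usually amounts to the wrapper below: keep feeding the model its own transcript until it emits a halt marker, so the system as a whole can iterate indefinitely. `step` is a hypothetical stand‑in for one model call (here a toy rule), which is also where the objection bites: computability of the wrapper says nothing about the reliability of the step function.

      def step(state: str) -> str:
          """Stand-in for one model call: reads the full state, returns a new state.

          A toy 'program': append '+' until ten have accumulated, then halt.
          In the argument summarized above, this would be an LLM conditioned
          on the transcript so far.
          """
          if state.count("+") >= 10:
              return state + " HALT"
          return state + "+"

      def run(initial: str, max_steps: int = 1000) -> str:
          """The outer loop: unbounded iteration over an unbounded transcript is
          what the Turing-completeness claim leans on; the cap here only keeps
          the demo finite."""
          state = initial
          for _ in range(max_steps):
              state = step(state)
              if state.endswith("HALT"):
                  break
          return state

      print(run("start:"))  # start:++++++++++ HALT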

Empirical limitations: math, logic, and hallucinations

  • Multiple anecdotes show state‑of‑the‑art models still failing at basic arithmetic or producing correct answers via incorrect intermediate steps.
  • This is taken by skeptics as evidence that “reasoning” is shallow pattern-matching; boosters reply that failures are mostly quantitative (error rates) and improvable.
  • Some note that as long as outputs must be checked by humans or tools, applicability stays constrained, much like perpetually supervised self‑driving cars; a minimal tool‑check is sketched after this list.
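
  A minimal sketch of the "check it with a tool" workflow, with `check_claim` and the deliberately wrong claimed value invented for illustration: the model's arithmetic answer is accepted only if an exact evaluator reproduces it.

      import ast
      import operator

      # Exact evaluator for simple arithmetic expressions (a minimal, safe 'tool').
      _OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
              ast.Mult: operator.mul, ast.Div: operator.truediv}

      def evaluate(expr: str):
          """Evaluate +, -, *, / on numeric literals exactly, without eval()."""
          def walk(node):
              if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
                  return _OPS[type(node.op)](walk(node.left), walk(node.right))
              if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                  return node.value
              raise ValueError("unsupported expression")
          return walk(ast.parse(expr, mode="eval").body)

      def check_claim(expr: str, model_claim):
          """Compare a (hypothetical) model answer against the tool's exact result."""
          truth = evaluate(expr)
          return model_claim == truth, truth

      ok, truth = check_claim("1234 * 5678", 7006552)  # claimed value is wrong on purpose
      print(ok, truth)                                 # False 7006652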

Brain comparisons and energy use

  • The paper’s brain–LLM comparisons (synapses vs parameters, 20 W vs gigawatts) are criticized as superficial: no human could ingest an LLM‑scale training corpus, and inference energy per user is far lower than the headline training figure (a back‑of‑envelope sketch follows this list).
  • Others emphasize that, despite lower data and energy, humans still vastly outperform LLMs in flexible, grounded reasoning.
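
  A back‑of‑envelope version of the amortization argument. Every number below is an illustrative placeholder, not a figure from the paper or the thread; the only cited quantity is the ~20 W brain figure, used to convert the result into brain‑hours.

      # All numbers are illustrative placeholders; they only show how the
      # amortization argument is structured, not what the real values are.
      E_TRAIN_J   = 1e14    # assumed one-off training energy (joules)
      N_USERS     = 1e8     # assumed number of users served by the trained model
      QUERIES     = 1e3     # assumed queries per user over the model's lifetime
      E_QUERY_J   = 1e4     # assumed inference energy per query (joules)
      BRAIN_WATTS = 20      # the ~20 W brain figure cited in the comparison

      per_user_j = E_TRAIN_J / N_USERS + QUERIES * E_QUERY_J
      print(f"amortized energy per user: {per_user_j:.2e} J")
      print(f"equivalent hours of a 20 W brain: {per_user_j / (BRAIN_WATTS * 3600):.1f} h")

  The structure, not the placeholder numbers, carries the point: a one‑off training cost amortized over many users and queries is a different quantity from the gigawatt headline figure.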

Critique of specific technical analogies

  • The paper’s focus on floating‑point precision and discrete derivatives is questioned: commenters argue that high‑dimensional optimization behaves differently than the paper suggests and that it underappreciates how well SGD works in such spaces (see the sketch after this list).
  • Repeated references to nuclear reactors and numerical analysis strike some readers as forced or only loosely connected to real LLM training dynamics.
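
  To make the precision point concrete, the sketch below (assuming only NumPy) shows the standard trade‑off for a forward finite difference in float32: truncation error shrinks with the step size while rounding error grows, so accuracy bottoms out well short of what analytic gradients give. This is a generic numerical‑analysis illustration of the kind of discrete‑derivative effect being argued over, not a reconstruction of the paper's argument.

      import numpy as np

      def f(x):
          """A smooth test function; its exact derivative is cos(x)."""
          return np.sin(x)

      x = np.float32(1.0)
      exact = np.cos(np.float64(x))  # analytic derivative, computed in float64

      print("step h     | finite-difference error (float32)")
      for h in [1e-2, 1e-3, 1e-4, 1e-5, 1e-6]:
          h32 = np.float32(h)
          fd = (f(x + h32) - f(x)) / h32   # forward difference in float32
          print(f"{h:8.0e}   | {abs(float(fd) - exact):.2e}")
      # Truncation error shrinks with h, but float32 rounding error grows as h
      # shrinks, so the total error bottoms out and then climbs again.

  Training frameworks compute gradients analytically via backpropagation rather than by finite differences, which is part of why several commenters find the numerical‑analysis framing only loosely connected to real LLM training dynamics.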

Alternative directions and ML theory

  • Some participants see the paper as broadly right in spirit—LLMs will hit walls on deeper reasoning—and are exploring symbolic, Bayesian, or neuro‑symbolic systems as complements.
  • Others highlight a large but less visible body of ML theory and limits work; they worry hype around LLMs is crowding out more rigorous, long‑term lines of research.