The wall confronting large language models
Paper accessibility and author expertise
- Many commenters find the paper hard to read: heavy prose, dense equations, few concrete examples.
- Debate over whether the authors are “outside their core field”: some see computational physics/chemistry as relevant to ML; others view lack of LLM-building experience as a credibility issue.
- Meta‑discussion about gatekeeping: some argue ideas should stand on their own merits, while others stress that bold claims from non‑practitioners deserve extra skepticism.
The “wall” and scaling of LLMs
- Several readers think core LLM quality gains have slowed despite massive spend, suggesting we may be near the top of an S‑curve.
- Others counter with business metrics (revenue growth) and argue the paper is about capability scaling, not value-for-money.
- Some expect future improvements more from agents, tools, and hybrid systems than from monolithic model scaling.
Markov chains, formal models, and expressivity
- One thread explores an “extensional equivalence” between LLMs and high‑order Markov chains.
- Critics say this equivalence is either trivial (any finite computation can be embedded in a huge Markov chain) or irrelevant to practical limits; a toy illustration of the point follows this list.
- Disagreement over whether such reductions actually constrain what transformers can do, or just restate that high‑dimensional probabilistic dynamics are very expressive.
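A minimal sketch of the equivalence under debate, assuming nothing beyond a fixed context window: any model whose next‑token distribution depends only on its last k tokens defines the transition kernel of an order‑k Markov chain. The toy vocabulary, the order K, and the `toy_next_token_dist` stub below are hypothetical stand‑ins, and the explicit table of size |V|^k is the reason critics call the reduction trivial but uninformative.

```python
# Toy order-K Markov chain over a tiny vocabulary. Any model whose next-token
# distribution depends only on its last K tokens defines exactly such a kernel;
# the explicit table has |VOCAB|**K rows, which is why critics call the
# reduction trivial but uninformative about real models.
import itertools
import random

VOCAB = ["a", "b", "c"]   # hypothetical toy vocabulary; a real tokenizer has ~10^5 entries
K = 2                     # hypothetical toy order; a real context window is vastly larger

def toy_next_token_dist(history):
    """Hypothetical stand-in for model(history) -> P(next token); it reads only
    the last K tokens, which is all the equivalence argument requires."""
    idx = VOCAB.index(history[-1])
    probs = [0.1] * len(VOCAB)
    probs[(idx + 1) % len(VOCAB)] = 0.8
    return probs

# Materialize the order-K transition table: |VOCAB|**K histories, one row each.
table = {hist: toy_next_token_dist(hist) for hist in itertools.product(VOCAB, repeat=K)}

def sample(history, steps, rng=random.Random(0)):
    out = list(history)
    for _ in range(steps):
        probs = table[tuple(out[-K:])]
        out.append(rng.choices(VOCAB, weights=probs)[0])
    return out

print(sample(("a", "b"), 10))
```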
Symbolic reasoning, backtracking, and Prolog
- A long subthread argues that probabilistic sequence models fundamentally lack capabilities like logical backtracking and Prolog‑style search.
- Others respond that backtracking can be simulated either inside the token stream or via external loops/tools (a sketch of such a loop appears after this list); the bottleneck is practicality, not theoretical impossibility.
- Sudoku and Prolog interpreters are used as test cases; debate centers on whether “LLM + scaffolding” counts as the model doing the reasoning.
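A minimal sketch of the “external loop” position, with `propose` as a hypothetical stand‑in for a model call: the sequence model only suggests candidate values, while an ordinary depth‑first search around it does the pruning and the backtracking. A 4×4 Latin square stands in for the Sudoku‑style test cases mentioned above.

```python
# The search scaffold does the backtracking; the "model" only proposes candidates.
# `propose` is a hypothetical stand-in for an LLM call; here it just enumerates
# digits for a 4x4 Latin square (a stand-in for the Sudoku examples in the thread).
def propose(grid, cell):
    """Hypothetical stand-in for 'ask the model for candidate values at this cell'."""
    return [1, 2, 3, 4]

def consistent(grid, r, c, v):
    # Ordinary symbolic check: value v must not repeat in row r or column c.
    return all(grid[r][j] != v for j in range(4)) and all(grid[i][c] != v for i in range(4))

def solve(grid, cells):
    if not cells:
        return grid
    (r, c), rest = cells[0], cells[1:]
    for v in propose(grid, (r, c)):      # the model proposes...
        if consistent(grid, r, c, v):    # ...an external checker prunes...
            grid[r][c] = v
            if solve(grid, rest):
                return grid
            grid[r][c] = 0               # ...and the backtracking happens out here.
    return None

empty = [[0] * 4 for _ in range(4)]
print(solve(empty, [(r, c) for r in range(4) for c in range(4)]))
```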
Turing completeness and “reasoning”
- Some argue that an LLM embedded in a simple loop with an unbounded scratchpad is Turing complete (sketched below), so there is no principled barrier to any computable reasoning.
- Opponents say this conflates mere computability with human‑like logical reasoning, invoking Chinese Room analogies and stressing that what matters is reliability and traceability, not bare possibility.
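A minimal sketch of the “LLM plus loop” argument, with `model_step` as a hypothetical stand‑in for a single model call: the bounded step function only rewrites a scratchpad, and it is the outer loop together with the unbounded scratchpad that supplies iteration and memory. The Collatz‑style update is purely illustrative.

```python
# A bounded "model step" rewrites a scratchpad; the outer loop supplies iteration
# and the growing scratchpad supplies memory. `model_step` is a hypothetical
# stand-in for a model call; the Collatz-style update is purely illustrative.
def model_step(scratchpad: str) -> str:
    """Hypothetical stand-in for one model call: read the state so far, append the next state."""
    n = int(scratchpad.split()[-1])
    if n == 1:
        return scratchpad + " HALT"
    nxt = n // 2 if n % 2 == 0 else 3 * n + 1
    return scratchpad + f" {nxt}"

def run(initial: str, max_steps: int = 1000) -> str:
    pad = initial
    for _ in range(max_steps):
        pad = model_step(pad)        # no single call does the work...
        if pad.endswith("HALT"):
            break
    return pad                       # ...the loop over the growing pad does.

print(run("27"))
```

As the opponents note, a construction like this only establishes computability in principle; it says nothing about the reliability or traceability of the individual steps.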
Empirical limitations: math, logic, and hallucinations
- Multiple anecdotes show state‑of‑the‑art models still failing at basic arithmetic or producing correct answers via incorrect intermediate steps.
- This is taken by skeptics as evidence that “reasoning” is shallow pattern-matching; boosters reply that failures are mostly quantitative (error rates) and improvable.
- Some note that as long as outputs must be checked by humans or tools (see the sketch below), applicability remains constrained, much like perpetually supervised self‑driving cars.
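A minimal sketch of that supervision loop, with `ask_model` as a hypothetical stub returning a possibly wrong answer string: an external tool (here, Python’s own integer arithmetic) verifies the claim before it is accepted, which is exactly the checking that skeptics argue cannot yet be removed.

```python
# An external tool (here, Python's own integer arithmetic) checks the model's
# arithmetic claim before it is accepted. `ask_model` is a hypothetical stub
# that returns a possibly wrong answer string.
def ask_model(question: str) -> str:
    """Hypothetical stand-in for an LLM call."""
    return "56088" if question == "What is 123 * 456?" else "unknown"

def checked_multiply(a: int, b: int) -> tuple[int, bool]:
    claimed = ask_model(f"What is {a} * {b}?")
    try:
        value = int(claimed)
    except ValueError:
        return a * b, False           # unparseable answer: fall back to the tool entirely
    return value, value == a * b      # accept the model's answer only if the tool agrees

print(checked_multiply(123, 456))     # (56088, True): 123 * 456 is indeed 56088
```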
Brain comparisons and energy use
- The paper’s brain–LLM comparisons (synapses vs parameters, 20 W vs gigawatts) are criticized as superficial: humans could never ingest LLM-scale training corpora, and per‑user inference energy is far lower than the headline training figure (a back‑of‑the‑envelope sketch follows below).
- Others emphasize that, despite lower data and energy, humans still vastly outperform LLMs in flexible, grounded reasoning.
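A back‑of‑the‑envelope sketch of the amortization argument. Every number below is an illustrative placeholder rather than a figure from the paper or the thread, apart from the 20 W brain figure quoted above; the point is only the shape of the calculation, a one‑off training cost spread over many queries versus a continuously running 20 W brain.

```python
# All numbers below are illustrative placeholders, not measurements; only the
# 20 W brain figure comes from the discussion. The point is the shape of the
# calculation: a one-off training cost amortized over many queries.
TRAIN_ENERGY_KWH = 10_000_000       # placeholder: energy of one large training run, kWh
QUERIES_SERVED   = 10_000_000_000   # placeholder: lifetime queries that run is amortized over
INFER_KWH_PER_Q  = 0.001            # placeholder: direct inference energy per query, kWh

amortized_train = TRAIN_ENERGY_KWH / QUERIES_SERVED   # training kWh attributed to one query
total_per_query = amortized_train + INFER_KWH_PER_Q   # all-in kWh per query

BRAIN_WATTS = 20                                      # the paper's human-brain figure
brain_kwh_per_minute = BRAIN_WATTS / 1000 / 60        # ~0.0003 kWh per minute of thinking

print(f"per-query energy (amortized training + inference): {total_per_query:.4f} kWh")
print(f"one minute of brain:                                {brain_kwh_per_minute:.5f} kWh")
```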
Critique of specific technical analogies
- The focus on floating‑point precision and discrete derivatives is questioned: commenters argue that high‑dimensional optimization behaves differently than the paper suggests, and that SGD’s success in such spaces is underappreciated by the paper (a small sketch of the objection follows this list).
- Repeated references to nuclear reactors and numerical analysis strike some readers as forced or only loosely connected to real LLM training dynamics.
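A minimal sketch of the commenters’ objection about discrete derivatives, assuming only NumPy: training relies on backpropagated analytic gradients rather than on finite‑differencing a low‑precision loss, and even this toy one‑dimensional example shows how badly a float32 finite difference degrades relative to the exact gradient.

```python
# Backpropagation uses analytic gradients; training does not numerically
# difference a low-precision loss. This toy 1-D example shows how badly a
# float32 finite difference degrades relative to the exact gradient.
import numpy as np

def loss(w):
    return 0.5 * w * w              # toy quadratic loss; the exact gradient is w

w = np.float32(1.0)
h = np.float32(1e-7)                # comparable to float32 spacing near 1.0, so rounding dominates

analytic = w                                    # exact gradient of the quadratic at w
finite = (loss(w + h) - loss(w)) / h            # the "discrete derivative" in float32

print(f"analytic gradient:         {float(analytic):.6f}")
print(f"float32 finite difference: {float(finite):.6f}")   # noticeably off due to rounding
```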
Alternative directions and ML theory
- Some participants see the paper as broadly right in spirit—LLMs will hit walls on deeper reasoning—and are exploring symbolic, Bayesian, or neuro‑symbolic systems as complements.
- Others highlight a large but less visible body of work on ML theory and fundamental limits; they worry that hype around LLMs is crowding out more rigorous, long‑term lines of research.