Experts Have World Models. LLMs Have Word Models
Language Models vs World Models
- Many commenters agree LLMs are fundamentally trained on text/tokens, not reality itself, so they inherit both the strengths and distortions of language.
- One camp argues: LLMs model “patterns in data that reflect the world,” so they do have (imperfect) world models, much like humans learn physics from textbooks.
- The opposing camp insists: LLMs only ever see human-produced, lossy, biased representations; they therefore model "talk about the world," not the world itself, and lack the grounding and verification loops humans get from acting on reality.
Human Cognition, Embodiment, and Consciousness
- Several argue humans have “privileged access” via consciousness and rich multimodal embodiment; we learn through action, feedback, and tacit skills not reducible to language.
- Examples used: riding a bike, cooking, lab work, trash sorting, and advanced craftsmanship—domains where procedural, sensory, and tacit knowledge dominate.
- Others respond that much abstract knowledge (math, physics) is already symbolic and not “felt,” questioning how strong this embodiment advantage really is.
Multimodality and Model Architecture
- Some note modern systems are better described as large token or multimodal models (images, audio, video), not purely language models.
- Critics counter that current multimodality is shallow and mostly one-way: text is used to label/interpret images, but visual/spatial structure rarely drives linguistic reasoning.
- There is debate over whether internal “latent space” constitutes a real world model, or just higher-order token statistics.
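To make the "just token statistics" end of that debate concrete, here is a deliberately minimal sketch: a bigram counter that predicts the next word purely from pair frequencies in its training text. This is a toy (real LLMs learn far richer latent structure, which is exactly what is contested), but it shows the lowest-order version of a pure "word model": it can continue "boils" with "at" without any notion of what boiling is.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str):
    """Count word-pair frequencies: a pure 'word model' of the text."""
    words = corpus.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(counts, word: str) -> str:
    """Return the most frequent continuation of `word` seen in training."""
    return counts[word].most_common(1)[0][0]

# Toy corpus: the model only ever sees sentences *about* the world.
corpus = "water boils at 100 degrees . water freezes at 0 degrees ."
model = train_bigram(corpus)
print(predict(model, "boils"))  # -> "at"
```

The open question in the thread is whether scaling this idea up (to transformers over trillions of tokens) yields internal representations that deserve the name "world model," or just ever-higher-order versions of these counts.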
Capabilities and Limits: Reasoning, Coding, Games
- Supporters highlight LLM performance on physics problems, proofs (with tools), code debugging, and some chess/poker benchmarks as evidence of emergent modeling, not mere mimicry.
- Skeptics stress persistent failures: weak spatial reasoning, poor real-world cooking advice, limited poker performance, and inability to autonomously run labs or handle evolving software requirements.
- Programming is framed as “chess-like in the technical core but poker-like in the operational context”; LLMs may handle the former but struggle with shifting incentives and long-term consequences.
AGI, Efficiency, and Training Data
- Some argue no “serious researchers” think pure LLM scaling leads to AGI; others cite researchers who do, noting lack of consensus.
- There is broad agreement that next-token prediction is an inefficient route to rich world models, but disagreement over how inefficient it is relative to brains.
- Many see future systems as agents with sub-models, tools, RL, and richer data (video, 3D, interaction), not standalone text predictors.
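For readers unfamiliar with the term, the "next-token prediction" objective the thread keeps invoking is just average negative log-likelihood of each token given its predecessors. A minimal sketch, using hypothetical hand-written probability distributions rather than a real model:

```python
import math

def next_token_loss(probs, tokens):
    """Average negative log-likelihood of a token sequence.

    probs[i] is the model's predicted distribution (token -> probability)
    for position i, conditioned on tokens[:i]; tokens[i] is the token
    that actually occurred there.
    """
    nll = [-math.log(probs[i][tok]) for i, tok in enumerate(tokens)]
    return sum(nll) / len(nll)

# Hypothetical two-step continuation where the model is fairly confident:
probs = [{"the": 0.5, "a": 0.5}, {"cat": 0.9, "dog": 0.1}]
tokens = ["the", "cat"]
print(round(next_token_loss(probs, tokens), 3))  # -> 0.399
```

Training drives this loss down across enormous corpora; the efficiency debate above is about how much world structure a model must internalize, and at what data cost, to keep lowering it.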
Alignment, Censorship, and Knowledge
- A side thread discusses how alignment creates “subjective regulation of reality” and “variable access to facts,” especially on politically sensitive or identity-related topics.
- Some see this as an inevitable collision between free inquiry and harm minimization; others worry about opaque, corporate-controlled gatekeeping of scientific and social knowledge.