Experts Have World Models. LLMs Have Word Models
Language Models vs World Models
- Many commenters agree LLMs are fundamentally trained on text/tokens, not reality itself, so they inherit both the strengths and distortions of language.
- One camp argues: LLMs model “patterns in data that reflect the world,” so they do have (imperfect) world models, much like humans learn physics from textbooks.
- The opposing camp insists: LLMs only ever see human-produced, lossy, biased representations; they therefore model "talk about the world," not the world itself, and lack the grounding and verification loops humans get from acting on reality.
Human Cognition, Embodiment, and Consciousness
- Several argue humans have “privileged access” via consciousness and rich multimodal embodiment; we learn through action, feedback, and tacit skills not reducible to language.
- Examples used: riding a bike, cooking, lab work, trash sorting, and advanced craftsmanship—domains where procedural, sensory, and tacit knowledge dominate.
- Others respond that much abstract knowledge (math, physics) is already symbolic and not “felt,” questioning how strong this embodiment advantage really is.
Multimodality and Model Architecture
- Some note modern systems are better described as large token or multimodal models (images, audio, video), not purely language models.
- Critics counter that current multimodality is shallow and mostly one-way: text is used to label/interpret images, but visual/spatial structure rarely drives linguistic reasoning.
- There is debate over whether internal “latent space” constitutes a real world model, or just higher-order token statistics.
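To make the "just token statistics" end of that debate concrete, here is a deliberately minimal sketch: a bigram counter that predicts the next word purely from pair frequencies in its training text. This is a toy (real LLMs learn far richer latent structure, which is exactly what is contested), but it shows the lowest-order version of a pure "word model": it can continue "boils" with "at" without any notion of what boiling is.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str):
    """Count word-pair frequencies: a pure 'word model' of the text."""
    words = corpus.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict(counts, word: str) -> str:
    """Return the most frequent continuation of `word` seen in training."""
    return counts[word].most_common(1)[0][0]

# Toy corpus: the model only ever sees sentences *about* the world.
corpus = "water boils at 100 degrees . water freezes at 0 degrees ."
model = train_bigram(corpus)
print(predict(model, "boils"))  # -> "at"
```

The open question in the thread is whether scaling this idea up (to transformers over trillions of tokens) yields internal representations that deserve the name "world model," or just ever-higher-order versions of these counts.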
Capabilities and Limits: Reasoning, Coding, Games
- Supporters highlight LLM performance on physics problems, proofs (with tools), code debugging, and some chess/poker benchmarks as evidence of emergent modeling, not mere mimicry.
- Skeptics stress persistent failures: weak spatial reasoning, poor real-world cooking advice, limited poker performance, and inability to autonomously run labs or handle evolving software requirements.
- Programming is framed as “chess-like in the technical core but poker-like in the operational context”; LLMs may handle the former but struggle with shifting incentives and long-term consequences.
AGI, Efficiency, and Training Data
- Some argue no “serious researchers” think pure LLM scaling leads to AGI; others cite researchers who do, noting lack of consensus.
- There is broad agreement that next-token prediction is an inefficient route to rich world models, but disagreement over how inefficient it is relative to brains.
- Many see future systems as agents with sub-models, tools, RL, and richer data (video, 3D, interaction), not standalone text predictors.
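For readers unfamiliar with the term, the "next-token prediction" objective the thread keeps invoking is just average negative log-likelihood of each token given its predecessors. A minimal sketch, using hypothetical hand-written probability distributions rather than a real model:

```python
import math

def next_token_loss(probs, tokens):
    """Average negative log-likelihood of a token sequence.

    probs[i] is the model's predicted distribution (token -> probability)
    for position i, conditioned on tokens[:i]; tokens[i] is the token
    that actually occurred there.
    """
    nll = [-math.log(probs[i][tok]) for i, tok in enumerate(tokens)]
    return sum(nll) / len(nll)

# Hypothetical two-step continuation where the model is fairly confident:
probs = [{"the": 0.5, "a": 0.5}, {"cat": 0.9, "dog": 0.1}]
tokens = ["the", "cat"]
print(round(next_token_loss(probs, tokens), 3))  # -> 0.399
```

Training drives this loss down across enormous corpora; the efficiency debate above is about how much world structure a model must internalize, and at what data cost, to keep lowering it.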
Alignment, Censorship, and Knowledge
- A side thread discusses how alignment creates “subjective regulation of reality” and “variable access to facts,” especially on politically sensitive or identity-related topics.
- Some see this as an inevitable collision between free inquiry and harm minimization; others worry about opaque, corporate-controlled gatekeeping of scientific and social knowledge.