Experts Have World Models. LLMs Have Word Models

Language Models vs World Models

  • Many commenters agree that LLMs are trained on text/tokens rather than on reality itself, so they inherit both the strengths and the distortions of language.
  • One camp argues: LLMs model “patterns in data that reflect the world,” so they do have (imperfect) world models, much like humans learn physics from textbooks.
  • The opposing camp insists LLMs only ever see human-produced, lossy, biased representations; they therefore model “talk about the world,” not the world itself, and lack the grounding and verification loops humans get from interacting with reality.

Human Cognition, Embodiment, and Consciousness

  • Several argue humans have “privileged access” to reality via consciousness and rich multimodal embodiment; we learn through action, feedback, and tacit skills that are not reducible to language.
  • Examples used: riding a bike, cooking, lab work, trash sorting, and advanced craftsmanship—domains where procedural, sensory, and tacit knowledge dominate.
  • Others respond that much abstract knowledge (math, physics) is already symbolic and not “felt,” questioning how strong this embodiment advantage really is.

Multimodality and Model Architecture

  • Some note modern systems are better described as large token or multimodal models (images, audio, video), not purely language models.
  • Critics counter that current multimodality is shallow and mostly one-way: text is used to label/interpret images, but visual/spatial structure rarely drives linguistic reasoning.
  • There is debate over whether internal “latent space” constitutes a real world model, or just higher-order token statistics.

Capabilities and Limits: Reasoning, Coding, Games

  • Supporters highlight LLM performance on physics problems, proofs (with tools), code debugging, and some chess/poker benchmarks as evidence of emergent modeling, not mere mimicry.
  • Skeptics stress persistent failures: weak spatial reasoning, poor real-world cooking advice, limited poker performance, and inability to autonomously run labs or handle evolving software requirements.
  • Programming is framed as “chess-like in the technical core but poker-like in the operational context”; LLMs may handle the former but struggle with shifting incentives and long-term consequences.

AGI, Efficiency, and Training Data

  • Some argue no “serious researchers” think pure LLM scaling leads to AGI; others cite researchers who do, noting lack of consensus.
  • There is broad agreement that next-token prediction is an inefficient route to rich world models, but disagreement about how inefficient it is relative to the brain.
  • Many see future systems as agents with sub-models, tools, RL, and richer data (video, 3D, interaction), not standalone text predictors.
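The next-token objective at the center of this debate can be illustrated with a deliberately minimal sketch: a bigram model that predicts the next token purely from co-occurrence counts. This is illustrative only, not how modern LLMs are trained (they use neural networks optimizing cross-entropy over vast corpora), but it makes the skeptics' point concrete: the model captures statistics of talk about the world, with no access to the world itself.

```python
from collections import Counter, defaultdict

# Toy corpus of "talk about the world"; the model never sees the sun itself,
# only sentences about it.
corpus = "the sun rises in the east . the sun sets in the west .".split()

# Count, for each token, how often each other token follows it.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token):
    """Return the token most often observed after `token` in the corpus."""
    counts = bigrams[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # whichever token most often followed "the"
```

Scaled up by many orders of magnitude and with learned representations instead of raw counts, this objective demonstrably yields far richer behavior; the disagreement summarized above is over whether it ever yields a world model rather than an increasingly refined word model.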

Alignment, Censorship, and Knowledge

  • A side thread discusses how alignment creates “subjective regulation of reality” and “variable access to facts,” especially on politically sensitive or identity-related topics.
  • Some see this as an inevitable collision between free inquiry and harm minimization; others worry about opaque, corporate-controlled gatekeeping of scientific and social knowledge.