Diffusion for World Modeling

Overall reaction

  • Many find the demo striking and “dreamlike,” with some saying it’s the first paper in a while that makes them want new GPUs.
  • Others see it as mostly a cool proof‑of‑concept with limited direct usefulness in its current form.

Use in games and graphics

  • Some predict most game graphics will move to diffusion‑based rendering within a few years, enabling photorealism and “limitless physics.”
  • Skeptics argue entire games won’t be run by ML: engines need stable, debuggable rules, not “dream logic.”
  • More moderate views: ML is more likely to be used for subsystems (rendering, upscaling, animation, NPC behavior) than to run the full game state.
  • Several see near‑term value as a “skin” or remaster layer over existing low‑fidelity games, similar in spirit to DLSS/RTX Remix.

World models, RL, and robotics

  • Commenters stress the real target is general world models for autonomous agents, not recreating Counter‑Strike.
  • Video game environments are used as cheap, controllable testbeds; the same methods could be trained on real‑world video + sensor data.
  • In RL, such models let agents “imagine” consequences instead of acting directly in the world.
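
The "imagine consequences" idea can be sketched as planning inside a model instead of the real environment. This is a minimal illustration, not the paper's method: `toy_model` is a hypothetical stand-in for a learned world model, and the random-shooting planner, horizon, and candidate count are all assumptions chosen for brevity.

```python
import random


def toy_model(state, action):
    """Hypothetical stand-in for a learned world model.

    Predicts the next state and a reward; here the reward is higher
    the closer the state is to 10.
    """
    next_state = state + action
    reward = -abs(next_state - 10)
    return next_state, reward


def imagine_return(model, state, actions):
    """Roll an action sequence forward inside the model only."""
    total = 0.0
    for action in actions:
        state, reward = model(state, action)
        total += reward
    return total


def plan(model, state, horizon=5, candidates=200, seed=0):
    """Random-shooting planner: sample action sequences, score each
    one in imagination, and return the first action of the best one."""
    rng = random.Random(seed)
    best_actions, best_return = None, float("-inf")
    for _ in range(candidates):
        actions = [rng.choice([-1, 0, 1]) for _ in range(horizon)]
        ret = imagine_return(model, state, actions)
        if ret > best_return:
            best_actions, best_return = actions, ret
    return best_actions[0], best_return


first_action, predicted_return = plan(toy_model, state=0)
print(first_action, predicted_return)
```

No real environment step is taken while the candidates are scored; only the chosen first action would be executed for real, after which the agent would replan.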

Prediction vs understanding

  • A long subthread debates whether neural nets “only predict” or can “understand.”
  • One side equates scientific understanding with curve‑fitted predictive models; the other insists that human‑style abstraction and generalization differ from current ML behavior.
  • Disagreements focus on conservation laws, historical scientific discovery, and whether future models could reach human‑level insight.

Limitations and technical concerns

  • The current model has poor long‑term consistency and almost no explicit map or state awareness; walking into walls or taking unusual actions produces plausible but wrong “gibberish.”
  • Memory is effectively just recent frames + inputs; world continuity and inventory/state tracking are weak.
  • Performance is heavy: high‑end GPUs, low resolution, and modest FPS.
  • For physics, some suggest ML approximations for complex phenomena (fluids, explosions, lighting), while others raise concerns about determinism, debuggability, and multiplayer consistency.
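
The memory limitation above (recent frames + inputs only) can be illustrated with a fixed-length buffer. This is a sketch under stated assumptions: `FrameContext` is a hypothetical name, the window of 4 frames is arbitrary, and strings stand in for image frames; none of this is taken from the model's actual code.

```python
from collections import deque


class FrameContext:
    """Hypothetical fixed-size conditioning window of (frame, action) pairs."""

    def __init__(self, max_frames=4):
        # deque with maxlen silently drops the oldest entry when full
        self.buffer = deque(maxlen=max_frames)

    def push(self, frame, action):
        self.buffer.append((frame, action))

    def visible(self):
        """Everything the model could condition on at the next step."""
        return list(self.buffer)


ctx = FrameContext(max_frames=4)
events = ["pick up item", "walk", "walk", "walk", "walk", "walk"]
for t, action in enumerate(events):
    ctx.push(f"frame{t}", action)

remembered = [action for _, action in ctx.visible()]
print(remembered)
```

Once the "pick up item" step scrolls out of the window, nothing the model conditions on records that it happened, which is one way to picture why inventory and map state drift.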

Dreamlike aesthetics and cognition parallels

  • Many note the uncanny, noisy, shifting visuals resemble dreams or psychedelic experiences.
  • Some suggest human dreams and perception might share structural similarities with diffusion‑style generative processes, though this remains speculation within the thread.