John Carmack talk at Upper Bound 2025

Scope and Setup of Carmack’s Project

  • Built an Atari-playing physical robot using camera input and joystick actuators, trained online in real time on a laptop GPU.
  • Emphasis is on generic methods, continual learning, sample efficiency, and robustness to physical issues (latency, noisy/“phantom” inputs, actuator wear), not just “solving Atari.”
  • Some see it as a useful constrained testbed for problems that appear in robotics (real-time control, catastrophic forgetting); others argue similar work in simulation and robotics (e.g., by GPU/robotics vendors, self‑driving stacks) already addresses these.
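One of the physical issues mentioned above, actuation/observation latency, has a standard generic workaround in the RL literature: keep the actions already sent (but not yet visible in any frame) in a queue, and let the policy condition on the stale frame plus that queue. The sketch below is purely illustrative of that trick under assumed names (`DelayAwareAgent`, `toy_policy`); it is not a description of Carmack's actual system.

```python
from collections import deque

class DelayAwareAgent:
    """Act under k steps of latency by treating (frame, in-flight actions)
    as the effective state. Illustrative sketch, not the talk's design."""

    def __init__(self, policy, latency_steps=2):
        self.policy = policy
        # Actions sent but not yet reflected in any observed frame.
        self.pending = deque([0] * latency_steps, maxlen=latency_steps)

    def act(self, frame):
        # Policy sees the stale frame plus the queue of pending actions.
        action = self.policy(frame, tuple(self.pending))
        self.pending.append(action)
        return action

# Toy policy: press "right" (1) unless the most recent in-flight action
# was already "right" -- just enough logic to exercise the queue.
def toy_policy(frame, pending):
    return 0 if pending[-1] == 1 else 1

agent = DelayAwareAgent(toy_policy, latency_steps=2)
actions = [agent.act(frame=None) for _ in range(4)]
print(actions)  # [1, 0, 1, 0]
```

The point of the queue is that without it, the policy would be reacting to frames that predate its own recent actions, which makes the problem non-Markovian.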

Atari, RL, and Generalization

  • Atari was historically a core RL benchmark and largely “solved” in emulators; multiple commenters argue that didn’t yield broadly useful, general algorithms.
  • A line of criticism: individual Atari games are low‑dimensional; tiny models plus hand‑crafted tricks can do well, so “progress” often reflects researcher priors rather than genuine general intelligence.
  • Counterpoint: revisiting Atari with real-time constraints, physical controllers, and multi‑game continuity remains valuable for studying transfer and catastrophic forgetting (game A performance shouldn't collapse after training on game B).
  • Several note that humans rapidly transfer game concepts and UI patterns across games; current RL systems mostly do not.
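The catastrophic-forgetting failure mode above, and the classic replay-buffer mitigation, can be shown in a few lines. The toy below stands in for "games" with two synthetic regression tasks and a linear model trained by gradient descent; everything here (task construction, learning rate, buffer size) is an illustrative assumption, not anything from the talk.

```python
import numpy as np

def make_task(seed, dim=8, n=200):
    # A synthetic "game": a random linear target to regress onto.
    r = np.random.default_rng(seed)
    w_true = r.normal(size=dim)
    X = r.normal(size=(n, dim))
    return X, X @ w_true

def train(w, X, y, lr=0.1, epochs=200):
    # Plain full-batch gradient descent on mean squared error.
    for _ in range(epochs):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(X)
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

dim = 8
Xa, ya = make_task(1, dim)   # "game A"
Xb, yb = make_task(2, dim)   # "game B"

# Sequential training: A then B. Task-A performance collapses.
w = train(np.zeros(dim), Xa, ya)
loss_a_before = mse(w, Xa, ya)
w = train(w, Xb, yb)
loss_a_after = mse(w, Xa, ya)

# Replay mitigation: mix a buffer of stored task-A samples into B training.
w2 = train(np.zeros(dim), Xa, ya)
Xmix = np.vstack([Xb, Xa[:100]])
ymix = np.concatenate([yb, ya[:100]])
w2 = train(w2, Xmix, ymix)
loss_a_replay = mse(w2, Xa, ya)

print(loss_a_before, loss_a_after, loss_a_replay)
```

With a single shared set of weights, training on B simply overwrites the solution for A; the replay buffer keeps some of A's gradient signal in the mix, so A degrades far less.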

Continuous Learning, Memory, and Human vs LLM Cognition

  • Debate over the “missing ingredient”: proposals include continuous lifelong learning, better memory systems, and richer physical environments.
  • One side stresses that humans constantly adapt, filter input, and retain key experiences over long timescales; current models largely don’t update weights online in this way.
  • Others argue most impactful human memories are sparse “surprise/arousal” events, implying that a well‑designed persistent memory + context management system might suffice for many tasks.
  • Skepticism that large context windows and vector DBs alone are enough for robust real‑world agents; issues with forgetting, retrieval, and lack of autonomous weight updates are highlighted.
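The "sparse surprise events" idea above, that memory should be written only when something violates the agent's predictions, can be sketched as a gate on prediction error. The class name, threshold, and nearest-neighbour retrieval below are all illustrative assumptions, not a proposal from the thread.

```python
import numpy as np

class SurpriseGatedMemory:
    """Toy persistent memory: store an observation only when prediction
    error ("surprise") exceeds a threshold. Illustrative sketch only."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.keys, self.values = [], []

    def maybe_store(self, obs, prediction, actual):
        surprise = float(np.linalg.norm(np.asarray(prediction) - np.asarray(actual)))
        if surprise > self.threshold:
            self.keys.append(np.asarray(obs, dtype=float))
            self.values.append(np.asarray(actual, dtype=float))
        return surprise

    def recall(self, query):
        # Nearest-neighbour retrieval over stored observation keys.
        if not self.keys:
            return None
        dists = [np.linalg.norm(k - np.asarray(query)) for k in self.keys]
        return self.values[int(np.argmin(dists))]

mem = SurpriseGatedMemory(threshold=0.5)
# Mild prediction error: gated out, nothing stored.
mem.maybe_store(obs=[0.0, 0.0], prediction=[1.0], actual=[1.1])
# Large prediction error: a "surprise", so it is stored.
mem.maybe_store(obs=[1.0, 1.0], prediction=[0.0], actual=[2.0])
print(len(mem.keys))  # 1
```

The gate is what keeps such a memory sparse; the skepticism in the thread is about whether retrieval over a store like this can substitute for actual weight updates.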

Embodied Intelligence vs LLM “Blender” Pretraining

  • Carmack explicitly contrasts learning from a stream of interactive experience with “throw‑everything‑in‑a‑blender” LLM pretraining.
  • Some agree that embodied, interactive learning is crucial for AGI or for genuine concept formation and physical competence.
  • Others note that frontier models are already multimodal (text, audio, images, video) and that massive pretraining plus RL in rich simulations may scale better than slow physical training.
  • There’s concern that because pretraining is so effective and commercially valuable, interactive‑learning research may be underfunded despite its conceptual importance.

Carmack’s Role and Prospects

  • Many express excitement and trust in his track record of doing more with less and extracting maximal performance from commodity hardware.
  • Skeptics question whether past graphics/engine brilliance translates to leading AI research in a crowded, math‑heavy, hyper‑competitive field.
  • Several suggest his biggest potential impact may be in systems, optimization, and tooling (e.g., more efficient GPU stacks) rather than novel learning theory per se.