John Carmack's talk at Upper Bound 2025
Scope and Setup of Carmack’s Project
- Built an Atari-playing physical robot using camera input and joystick actuators, trained online in real time on a laptop GPU.
- Emphasis is on generic methods, continual learning, sample efficiency, and robustness to physical issues (latency, noisy/“phantom” inputs, actuator wear), not just “solving Atari.”
- Some see it as a useful constrained testbed for problems that appear in robotics (real-time control, catastrophic forgetting); others argue similar work in simulation and robotics (e.g., by GPU/robotics vendors, self‑driving stacks) already addresses these.
Atari, RL, and Generalization
- Atari was historically a core RL benchmark and largely “solved” in emulators; multiple commenters argue that didn’t yield broadly useful, general algorithms.
- A line of criticism: individual Atari games are low‑dimensional; tiny models plus hand‑crafted tricks can do well, so “progress” often reflects researcher priors rather than genuine general intelligence.
- Counterpoint: revisiting Atari with real-time constraints, physical controllers, and multi‑game continuity remains valuable for studying transfer and catastrophic forgetting (game A performance shouldn't collapse after training on game B).
- Several note that humans rapidly transfer game concepts and UI patterns across games; current RL systems mostly do not.
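The forgetting failure mode described above is easy to reproduce in miniature. The sketch below is purely illustrative (the tasks, model, and hyperparameters are invented, not anything from the talk): a one-weight linear model is trained on task A, then trained only on task B, and its task-A loss is re-measured — the sequential training wipes out the earlier skill.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(w_true, n=200):
    """A toy regression 'game': predict y = w_true * x."""
    x = rng.normal(size=(n, 1))
    return x, w_true * x

def mse(w, x, y):
    return float(np.mean((x @ w - y) ** 2))

def sgd(w, x, y, lr=0.1, epochs=20):
    """Plain full-batch gradient descent on squared error."""
    for _ in range(epochs):
        grad = 2 * x.T @ (x @ w - y) / len(x)
        w = w - lr * grad
    return w

task_a = make_task(2.0)    # "game A"
task_b = make_task(-2.0)   # "game B", conflicting target

w = np.zeros((1, 1))
w = sgd(w, *task_a)
loss_a_before = mse(w, *task_a)   # small: A is learned

w = sgd(w, *task_b)               # continue training on B only
loss_a_after = mse(w, *task_a)    # large: A has been forgotten
```

With a single shared weight and conflicting targets, the collapse is total by construction; real networks fail less starkly, but the evaluation protocol (train A, train B, re-test A) is the same one the multi-game Atari setting stresses.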
Continuous Learning, Memory, and Human vs LLM Cognition
- Debate over the “missing ingredient”: proposals include continuous lifelong learning, better memory systems, and richer physical environments.
- One side stresses that humans constantly adapt, filter input, and retain key experiences over long timescales; current models largely don’t update weights online in this way.
- Others argue most impactful human memories are sparse “surprise/arousal” events, implying that a well‑designed persistent memory + context management system might suffice for many tasks.
- Skepticism that large context windows and vector DBs alone are enough for robust real‑world agents; issues with forgetting, retrieval, and lack of autonomous weight updates are highlighted.
Embodied Intelligence vs LLM “Blender” Pretraining
- Carmack explicitly contrasts learning from a stream of interactive experience with “throw‑everything‑in‑a‑blender” LLM pretraining.
- Some agree that embodied, interactive learning is crucial for AGI or for genuine concept formation and physical competence.
- Others note that frontier models are already multimodal (text, audio, images, video) and that massive pretraining plus RL in rich simulations may scale better than slow physical training.
- There’s concern that because pretraining is so effective and commercially valuable, interactive‑learning research may be underfunded despite its conceptual importance.
Carmack’s Role and Prospects
- Many express excitement, citing his track record of doing more with less and extracting maximal performance from commodity hardware.
- Skeptics question whether past graphics/engine brilliance translates to leading AI research in a crowded, math‑heavy, hyper‑competitive field.
- Several suggest his biggest potential impact may be in systems, optimization, and tooling (e.g., more efficient GPU stacks) rather than novel learning theory per se.