DeepMind program finds diamonds in Minecraft without being taught

Publication, setup, and demos

  • Some readers initially took this to be the older DreamerV3 work resurfacing, noting the gap between the 2023 arXiv preprint and the 2025 Nature publication.
  • The demo videos confused some viewers at first (e.g., one clip appears to show the agent simply digging down and then falling into lava), but others pointed out the moments where diamonds are actually acquired and noted that the tools are hard to make out because of the timelapse speed and low resolution.

World models and interpretability

  • Central interest: Dreamer builds a learned “world model,” then imagines future trajectories to decide actions.
  • Several comments ask whether this world model is inspectable the way an autonomous-vehicle (AV) stack's internal state is, or whether it exists only as opaque weights.
  • Replies describe it as a latent state representation whose imagined futures can be decoded back into low-res video (shown in the paper), not a human-readable symbolic state machine; a toy sketch of this imagination loop follows this list.
  • Broader debate: whether such internal structures justify borrowing cognitive/neuroscience terms, and whether interpretability work truly demonstrates reasoning or merely sophisticated pattern matching.
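
To give the "imagined trajectories" idea some concrete shape, here is a deliberately toy PyTorch sketch of a latent world model rolled forward without touching the environment. The module names (ToyWorldModel, imagine, reward_head), the MLP architecture, and the sizes are all illustrative assumptions; DreamerV3's actual model is a recurrent state-space model with convolutional encoders/decoders and is trained very differently. The sketch only shows the loop discussed above: encode pixels into a latent, step the latent forward under candidate actions, decode back to low-res frames, and score the imagined rollout by predicted reward.

```python
import torch
import torch.nn as nn

LATENT, ACTION, OBS = 32, 4, 64 * 64 * 3  # toy sizes, not the paper's

class ToyWorldModel(nn.Module):
    """Minimal latent world model: encode pixels -> latent, predict the next
    latent from (latent, action), decode latent -> pixels, predict reward."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(OBS, 256), nn.ELU(), nn.Linear(256, LATENT))
        self.dynamics = nn.Sequential(nn.Linear(LATENT + ACTION, 256), nn.ELU(), nn.Linear(256, LATENT))
        self.decoder = nn.Sequential(nn.Linear(LATENT, 256), nn.ELU(), nn.Linear(256, OBS))
        self.reward_head = nn.Linear(LATENT, 1)

    def imagine(self, obs, policy, horizon=15):
        """Roll the model forward purely in latent space; return the imagined
        latents, decoded low-res frames, and predicted rewards."""
        z = self.encoder(obs)
        latents, frames, rewards = [], [], []
        for _ in range(horizon):
            a = policy(z)                       # action proposed from the latent state
            z = self.dynamics(torch.cat([z, a], dim=-1))
            latents.append(z)
            frames.append(self.decoder(z))      # reconstruction for human inspection
            rewards.append(self.reward_head(z))
        return latents, frames, rewards

# Usage: score an imagined rollout by its predicted return.
model = ToyWorldModel()
policy = nn.Sequential(nn.Linear(LATENT, ACTION), nn.Tanh())  # stand-in policy
obs = torch.rand(1, OBS)                                      # one flattened frame
_, frames, rewards = model.imagine(obs, policy)
print(sum(r.item() for r in rewards))                         # imagined return estimate
```

The decoded frames are what make the model partially inspectable: you can watch what the agent "expects" to happen, even though the latent itself is not a symbolic state.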

Reward design, “teaching,” and caveats

  • Dreamer gets a +1 reward for reaching each of 12 milestone items (log → plank → stick → … → iron pickaxe → diamond); a rough sketch of this reward shape appears after this list.
  • Some argue this is still “being taught” via a handcrafted curriculum, making the article’s “without being told what to do” framing and headline somewhat misleading.
  • Others counter that curriculum and reward engineering are intrinsic to RL, and humans also benefit from shaped feedback and prior knowledge.
  • An important implementation caveat: block-breaking is accelerated so the agent doesn’t have to learn to hold a button for hundreds of steps. Opinions differ on whether this is a minor engineering tweak or evidence of algorithmic weakness.
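
To make the "still being taught" argument concrete, here is a minimal, purely illustrative sketch of a first-time milestone reward. The bullet above elides the middle of the item chain, so the intermediate item names below are plausible placeholders rather than quotes from the paper, and the inventory-dict interface is an assumption; only the "+1 once per milestone per episode" shape reflects the setup described in the discussion.

```python
# Illustrative milestone list; the items between "stick" and "iron_pickaxe"
# are assumed placeholders, not copied from the paper.
MILESTONES = [
    "log", "plank", "stick", "crafting_table", "wooden_pickaxe",
    "cobblestone", "stone_pickaxe", "iron_ore", "furnace",
    "iron_ingot", "iron_pickaxe", "diamond",
]

def milestone_reward(inventory, collected):
    """Return the shaped reward for one step.

    inventory: dict mapping item name -> count held this step (assumed format).
    collected: set of milestones already rewarded this episode (mutated in place).
    """
    reward = 0.0
    for item in MILESTONES:
        if item not in collected and inventory.get(item, 0) > 0:
            collected.add(item)   # each milestone pays out exactly once per episode
            reward += 1.0
    return reward

# Toy usage: the agent picks up a log, then crafts planks and a stick.
collected = set()
print(milestone_reward({"log": 1}, collected))                            # 1.0
print(milestone_reward({"log": 1, "plank": 4, "stick": 2}, collected))    # 2.0
```

The curriculum argument is essentially about this list: the agent discovers how to reach each milestone on its own, but the choice of which twelve items count as progress is handcrafted.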

Significance of the Minecraft result

  • Supportive voices emphasize Minecraft’s large, open-ended state space; learning a multi-step, long-horizon plan from sparse rewards and pixels alone is seen as a substantial RL/world-model advance.
  • Skeptics argue that “finding diamonds” is a very limited slice of the game and far from “mastery,” suggesting more human-like goals (bases, farms, complex builds) as more meaningful benchmarks.

RL, real-world applicability, and inputs

  • Recurring theme: RL successes in games hinge on clear, dense, or well-shaped rewards; real-world tasks have fuzzier goals and delayed feedback, which makes direct transfer hard.
  • Some note promising robotics work but question why past “breakthrough” RL demos have not translated into robust, widely deployed systems.
  • There’s disagreement over training from pixels: critics suggest using the game’s structured internal state, while defenders argue that pixel-based learning is closer to the vision-first constraint real-world agents face, even if biology likely relies on intermediate compressed representations.