AI solves International Math Olympiad problems at silver medal level
System design & methodology
- Problems were manually translated into Lean 4; the core system (AlphaProof) couples a pre‑trained language model with AlphaZero‑style reinforcement learning and Monte Carlo Tree Search over Lean proof states.
- A separate geometry system (AlphaGeometry 2) combines an LLM with a symbolic deduction engine and solved the geometry problem in ~19 seconds.
- During training and even during the contest, the system generated synthetic variants of problems and trained on proofs/disproofs of these, using the proof checker as a reward signal.
- Formal proofs are verified by Lean, so end results are mechanically checkable and non‑hallucinated.
Performance vs IMO rules
- System solved 4/6 problems for 28/42 points, described as “silver medal level”.
- Humans get 2 × 4.5‑hour sessions; the system had up to ~3 days per problem and massive compute. One problem was solved “within minutes”.
- Many commenters argue that on an apples‑to‑apples timing and tooling basis it would not medal; others say the core result is that these problems are now solvable at all by machines.
Role and difficulty of formalization
- Input statements were hand‑formalized; answers (e.g., which numbers/sets satisfy conditions) were discovered by the system, then proved.
- There is disagreement on whether formalization is “much easier” than solving; some say any trained student can do it quickly, others report specific problems (like P5) are quite hard to formalize.
- An autoformalizer based on Gemini exists and was used to generate large training corpora, but is not yet reliable enough for high‑stakes input; there is no general way to automatically verify that a formalization matches the informal problem.
Search, reasoning, and “intelligence”
- Many see this as another victory for search + learning (AlphaZero‑style) over huge spaces, not pure brute force.
- Debate centers on whether this is “just” guided search vs genuine reasoning; some equate this with how human mathematical research feels (groping in the dark, many failed paths).
- Combinatorics problems remained unsolved; several speculate these are harder for current methods, more akin to general reasoning tasks.
Implications for math and software
- Strong optimism that such systems will become powerful mathematical assistants: checking arguments, exploring variations, automating tedious inequalities, and eventually tackling some open problems.
- Similar techniques are seen as promising for verified software (concurrency protocols, compiler correctness, program synthesis from specs).
Concerns, limitations, and skepticism
- Multiple commenters criticize marketing: “silver medal” framing ignores time/computational asymmetry and heavy manual formalization.
- Worries about AI hype, media overclaiming (“move over mathematicians”), and DeepMind’s closed, non‑reproducible systems.
- Energy and CO₂ cost is raised; some want transparent reporting of training and inference footprints.
- Others counter that inefficiency and long runtimes are typical for first breakthroughs and will likely improve dramatically.