Can AI do maths yet? Thoughts from a mathematician
What “doing math” means in this context
- Many commenters distinguish “AI” in the strong sense (a system that can prove new theorems, reason geometrically, and understand quantities) from today’s LLMs, which they regard as text predictors.
- Several argue current LLMs can assist with math but are far from independently doing research-level mathematics or generating genuinely novel insights.
- Others counter that prediction plus enough scale and tooling (code, solvers, proof assistants) may effectively amount to doing math in practice.
Current strengths and weaknesses
- Reported strengths:
  - Explaining concepts, suggesting terminology, and pointing to relevant methods or literature.
  - Translating between informal math and formal systems (e.g., Lean) or between natural language and code.
  - Acting as a 24/7 “enthusiastic grad student” for brainstorming or checking steps.
- Reported weaknesses:
  - Frequent basic errors: mis-multiplying, mismatched matrix dimensions, invalid inferences, failing simple counting/visual tasks.
  - Poor quantitative and geometric “common sense” relative to their apparent skill at formal or symbolic manipulations.
  - No reliable self-assessment of confidence; often confidently wrong.
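The informal-to-formal translation mentioned among the strengths can be made concrete with a minimal Lean 4 sketch. The theorem name and proof here are illustrative choices, not taken from the discussion:

```lean
-- Informal claim: "addition of natural numbers is commutative".
-- Formal Lean 4 statement, with the proof delegated to the
-- standard library lemma Nat.add_comm.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The value of this pipeline is that even if an LLM produces the translation, the Lean kernel independently checks the proof, so a confident-but-wrong step cannot slip through.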
Benchmarks, secrecy, and contamination
- FrontierMath and similar “secret” datasets are central to claims about progress.
- Commenters worry:
  - Test sets can leak into training via the public internet or through direct API use.
  - Closed models and undisclosed training corpora make honest evaluation hard.
- Some expect future models to “magically” improve on these benchmarks once the problems leak into training data, which undermines the benchmarks’ value as indicators of general reasoning.
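The leakage worry can be made concrete. Below is a minimal sketch of one common contamination check, exact word-level n-gram overlap between a benchmark item and a training corpus; the function names, the n-gram length, and the example texts are all hypothetical choices for illustration:

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of word-level n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_score(problem: str, corpus: str, n: int = 8) -> float:
    """Fraction of the problem's n-grams found verbatim in the corpus.

    A score near 1.0 suggests the benchmark item may have leaked
    into the training data; 0.0 means no long verbatim overlap.
    """
    problem_grams = ngrams(problem, n)
    if not problem_grams:
        return 0.0
    corpus_grams = ngrams(corpus, n)
    return len(problem_grams & corpus_grams) / len(problem_grams)

# Hypothetical example: the problem text appears verbatim in the corpus.
problem = "prove that the sum of the first n odd numbers equals n squared"
corpus = "old forum post: prove that the sum of the first n odd numbers equals n squared etc"
print(contamination_score(problem, corpus))  # 1.0 when fully contained
```

Real contamination audits are harder (paraphrases and translations evade exact matching), which is part of why commenters remain skeptical of closed training corpora.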
Hybrid approaches and tooling
- Strong interest in combining LLMs with:
  - Formal logic/verifier systems (Lean, Coq, automated reasoning frameworks).
  - External code execution (Python, CAS, theorem provers) as “calculators” and proof checkers.
- View: LLMs as front-ends that translate human intent into formal objects that stricter tools can verify.
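The propose-then-verify pattern above can be sketched in stdlib-only Python. The “claimed” closed form stands in for an LLM's output, and the exact-arithmetic checker stands in for the stricter external tool; all names here are hypothetical:

```python
from fractions import Fraction

def claimed_closed_form(n: int) -> Fraction:
    """Hypothetical LLM output: closed form for 1^3 + 2^3 + ... + n^3."""
    return Fraction(n * (n + 1), 2) ** 2

def brute_force_sum(n: int) -> Fraction:
    """Direct computation of 1^3 + 2^3 + ... + n^3."""
    return Fraction(sum(k ** 3 for k in range(1, n + 1)))

def check_claim(max_n: int = 200) -> bool:
    """Exact spot-check of the claim for n = 0..max_n.

    Not a proof, but exact rational arithmetic catches exactly the kind
    of basic slip (off-by-one, wrong coefficient) reported as a weakness.
    """
    return all(claimed_closed_form(n) == brute_force_sum(n)
               for n in range(max_n + 1))

print(check_claim())  # True: the claim survives the check
```

A spot-check like this is weaker than a Lean proof but cheap to run, which is why commenters see code execution as the natural first line of verification behind an LLM front-end.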
Foundational and theoretical questions
- Discussion of limits from logic and complexity: undecidability of statements in ZFC, double‑exponential lower bounds on proof length, parity/counting limitations of current architectures, and whether LLMs can truly “reason” beyond their training data.
- Debate over whether neural networks merely imitate existing abstractions or can originate new ones.
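The parity limitation mentioned above has a concrete combinatorial core: flipping any single input bit flips the output, so parity has maximal sensitivity, a property that has been linked to the difficulty bounded-depth models have with it. A small self-contained check (the example bit string is arbitrary):

```python
from functools import reduce
from operator import xor

def parity(bits: list[int]) -> int:
    """XOR of all bits: 1 if an odd number of ones, else 0."""
    return reduce(xor, bits, 0)

def sensitivity(bits: list[int]) -> int:
    """Count how many single-bit flips change parity(bits)."""
    base = parity(bits)
    return sum(parity(bits[:i] + [1 - b] + bits[i + 1:]) != base
               for i, b in enumerate(bits))

x = [1, 0, 1, 1, 0, 0, 1, 0]
print(parity(x), sensitivity(x))  # sensitivity == len(x): every flip matters
```

No single bit can be ignored, so any model that summarizes the input loosely (rather than tracking an exact running count) will get parity wrong on some inputs, which is the intuition behind the counting-limitation arguments in the thread.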
Impact on mathematicians and work
- Some fear erosion of the “art” and joy of hand-crafted proofs; others see AI as analogous to chess engines—changing practice but not eliminating human creativity.
- Widespread uncertainty about long‑term job impacts, but many expect AI to be a powerful assistant rather than a full replacement in the near term.