Can AI do maths yet? Thoughts from a mathematician

What “doing math” means in this context

  • Many commenters distinguish “AI” in the aspirational sense (a system that can prove new theorems, reason geometrically, and understand quantities) from today’s LLMs, which they regard as text predictors.
  • Several argue that current LLMs can assist with mathematics but remain far from independently doing research-level work or generating genuinely novel insights.
  • Others counter that prediction plus enough scale and tooling (code, solvers, proof assistants) may effectively amount to doing math in practice.

Current strengths and weaknesses

  • Reported strengths:
    • Explaining concepts, suggesting terminology, and pointing to relevant methods or literature.
    • Translating between informal math and formal systems (e.g., Lean) or between natural language and code.
    • Acting as a 24/7 “enthusiastic grad student” for brainstorming or checking steps.
  • Reported weaknesses:
    • Frequent basic errors: mis-multiplying numbers, mismatching matrix dimensions, drawing invalid inferences, and failing simple counting or visual tasks.
    • Poor quantitative and geometric “common sense” relative to their apparent skill at formal or symbolic manipulations.
    • No reliable self-assessment of confidence; often confidently wrong.
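As an illustration of the informal-to-formal translation listed among the strengths, here is a minimal sketch of what a Lean 4 rendering of an informal claim looks like (using Mathlib’s `Even`; the exact proof steps are one plausible version, not the only one):

```lean
import Mathlib.Algebra.Group.Even

-- Informal claim: "the sum of two even integers is even."
theorem even_add_even {a b : ℤ} (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  obtain ⟨m, hm⟩ := ha   -- a = m + m
  obtain ⟨n, hn⟩ := hb   -- b = n + n
  exact ⟨m + n, by rw [hm, hn]; ring⟩
```

The point is the division of labor: an LLM can draft a statement and proof like this, and Lean’s kernel, not the model, decides whether it actually checks.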

Benchmarks, secrecy, and contamination

  • FrontierMath and similar “secret” datasets are central to claims about progress.
  • Commenters worry:
    • Test sets can leak into training via the public internet or through direct API use.
    • Closed models and undisclosed training corpora make honest evaluation hard.
  • Some expect future models to improve “magically” on these benchmarks once the problems leak, which casts doubt on the benchmarks’ value as indicators of general reasoning.

Hybrid approaches and tooling

  • Strong interest in combining LLMs with:
    • Formal logic/verifier systems (Lean, Coq, automated reasoning frameworks).
    • External code execution (Python, CAS, theorem provers) as “calculators” and proof checkers.
  • A recurring view: LLMs work best as front-ends that translate human intent into formal objects that stricter tools can then verify.
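The “LLM proposes, tool verifies” loop above can be sketched in a few lines of plain Python. The function names and the claimed answers here are hypothetical, chosen to mirror the matrix-multiplication errors mentioned earlier; in a real pipeline the “claim” would come from a model:

```python
def matmul(A, B):
    """Multiply two matrices (lists of rows), checking inner dimensions."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("inner dimensions do not match")
    # zip(*B) iterates over the columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def verify_claim(A, B, claimed):
    """Recompute A @ B independently and compare against a claimed result."""
    try:
        return matmul(A, B) == claimed
    except ValueError:
        return False

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

verify_claim(A, B, [[19, 22], [43, 50]])  # → True
# A plausible-looking slip of the kind LLMs make (43*... bottom-right off by one):
verify_claim(A, B, [[19, 22], [43, 49]])  # → False
```

The design point is that the checker never trusts the model’s arithmetic; it re-derives the answer with a deterministic tool, which is exactly the role a CAS or proof checker plays at larger scale.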

Foundational and theoretical questions

  • Discussion of limits from logic and complexity: statements undecidable in ZFC, double‑exponential lower bounds on proof length, parity/counting limitations of transformers, and whether LLMs can truly “reason” beyond their training data.
  • Debate over whether neural networks merely imitate existing abstractions or can originate new ones.

Impact on mathematicians and work

  • Some fear erosion of the “art” and joy of hand-crafted proofs; others see AI as analogous to chess engines, which changed practice without eliminating human creativity.
  • Widespread uncertainty about long‑term job impacts, but many expect AI to be a powerful assistant rather than a full replacement in the near term.