Can AI do maths yet? Thoughts from a mathematician

What “doing math” means in this context

  • Many commenters distinguish “AI” in the aspirational sense (a system that can prove new theorems, reason geometrically, and understand quantities) from today’s LLMs, which they regard as text predictors.
  • Several argue that current LLMs can assist with mathematics but remain far from independently doing research-level work or generating genuinely novel insights.
  • Others counter that prediction plus enough scale and tooling (code, solvers, proof assistants) may effectively amount to doing math in practice.

Current strengths and weaknesses

  • Reported strengths:
    • Explaining concepts, suggesting terminology, and pointing to relevant methods or literature.
    • Translating between informal math and formal systems (e.g., Lean) or between natural language and code.
    • Acting as a 24/7 “enthusiastic grad student” for brainstorming or checking steps.
  • Reported weaknesses:
    • Frequent basic errors: mis-multiplying numbers, mismatching matrix dimensions, drawing invalid inferences, and failing simple counting or visual tasks.
    • Poor quantitative and geometric “common sense” relative to their apparent skill at formal or symbolic manipulations.
    • No reliable self-assessment of confidence; often confidently wrong.
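As an illustration of the informal-to-formal translation listed among the strengths, here is a minimal sketch of what a Lean 4 rendering of an informal claim looks like (using Mathlib’s `Even`; the exact proof steps are one plausible version, not the only one):

```lean
import Mathlib.Algebra.Group.Even

-- Informal claim: "the sum of two even integers is even."
theorem even_add_even {a b : ℤ} (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  obtain ⟨m, hm⟩ := ha   -- a = m + m
  obtain ⟨n, hn⟩ := hb   -- b = n + n
  exact ⟨m + n, by rw [hm, hn]; ring⟩
```

The point is the division of labor: an LLM can draft a statement and proof like this, and Lean’s kernel, not the model, decides whether it actually checks.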

Benchmarks, secrecy, and contamination

  • FrontierMath and similar “secret” datasets are central to claims about progress.
  • Commenters worry:
    • Test sets can leak into training via the public internet or through direct API use.
    • Closed models and undisclosed training corpora make honest evaluation hard.
  • Some expect future models to improve “magically” on these benchmarks once the problems leak, which casts doubt on the benchmarks’ value as indicators of general reasoning.

Hybrid approaches and tooling

  • Strong interest in combining LLMs with:
    • Formal logic/verifier systems (Lean, Coq, automated reasoning frameworks).
    • External code execution (Python, CAS, theorem provers) as “calculators” and proof checkers.
  • A recurring view: LLMs work best as front-ends that translate human intent into formal objects that stricter tools can then verify.
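The “LLM proposes, tool verifies” loop above can be sketched in a few lines of plain Python. The function names and the claimed answers here are hypothetical, chosen to mirror the matrix-multiplication errors mentioned earlier; in a real pipeline the “claim” would come from a model:

```python
def matmul(A, B):
    """Multiply two matrices (lists of rows), checking inner dimensions."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("inner dimensions do not match")
    # zip(*B) iterates over the columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def verify_claim(A, B, claimed):
    """Recompute A @ B independently and compare against a claimed result."""
    try:
        return matmul(A, B) == claimed
    except ValueError:
        return False

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

verify_claim(A, B, [[19, 22], [43, 50]])  # → True
# A plausible-looking slip of the kind LLMs make (43*... bottom-right off by one):
verify_claim(A, B, [[19, 22], [43, 49]])  # → False
```

The design point is that the checker never trusts the model’s arithmetic; it re-derives the answer with a deterministic tool, which is exactly the role a CAS or proof checker plays at larger scale.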

Foundational and theoretical questions

  • Discussion of limits from logic and complexity: statements undecidable in ZFC, double‑exponential lower bounds on proof length, parity/counting limitations of transformers, and whether LLMs can truly “reason” beyond their training data.
  • Debate over whether neural networks merely imitate existing abstractions or can originate new ones.

Impact on mathematicians and work

  • Some fear erosion of the “art” and joy of hand-crafted proofs; others see AI as analogous to chess engines, which changed practice without eliminating human creativity.
  • Widespread uncertainty about long‑term job impacts, but many expect AI to be a powerful assistant rather than a full replacement in the near term.