The maths you need to start understanding LLMs

Embeddings, RAG, and scope of the article

  • Several comments note the article’s math is essentially what you need for embeddings and RAG: turn text into vectors, use cosine distance to find relevant chunks, and optionally rerank (a minimal retrieval sketch follows this list).
  • Others point out this is only the input stage; it doesn’t cover the full transformer/LLM, which has trillions of parameters and far more complexity.
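
  A minimal sketch of the retrieval step described above, assuming nothing beyond numpy: embed() is a hypothetical stand-in for a real embedding model, faked here with deterministic random vectors, and ranking uses cosine similarity (equivalent to cosine distance for ordering purposes).

      import numpy as np

      def embed(text: str) -> np.ndarray:
          # Stand-in embedding: a deterministic pseudo-random vector per string.
          # A real RAG system would call an embedding model here instead.
          rng = np.random.default_rng(abs(hash(text)) % (2**32))
          return rng.standard_normal(384)

      def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      chunks = ["chunk about billing", "chunk about refunds", "chunk about shipping"]
      chunk_vecs = [embed(c) for c in chunks]

      query_vec = embed("how do I get my money back?")
      scored = sorted(
          ((cosine_similarity(query_vec, v), c) for v, c in zip(chunk_vecs, chunks)),
          reverse=True,
      )
      # An optional reranker (e.g. a cross-encoder) would reorder the top few here.
      print(scored[:2])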

What math you “need”

  • Common list: basic linear algebra, basic probability, some analysis (exp/softmax), and gradients (a small softmax example follows this list).
  • Some argue this is enough to start understanding LLMs (“necessary but not sufficient”), but not to fully understand training, optimization, or architecture design.
  • A few mention missing pieces like vector calculus, Hessians, and optimization theory.
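
  As a worked example of the exp/softmax item above, a numerically stable softmax in plain numpy (the shift by the maximum is the standard trick and does not change the result):

      import numpy as np

      def softmax(logits: np.ndarray) -> np.ndarray:
          # Subtracting the max changes nothing mathematically but keeps
          # exp() from overflowing for large logits.
          shifted = logits - np.max(logits)
          exps = np.exp(shifted)
          return exps / exps.sum()

      probs = softmax(np.array([2.0, 1.0, 0.1]))
      print(probs, probs.sum())  # a probability distribution summing to 1.0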

Does doing the math equal understanding?

  • Debate over whether being able to write out the formulas or implement them in PyTorch implies real understanding.
  • One view: formula use is the first step; deeper understanding comes from abstractions and analogies, and is effectively unbounded.
  • Others contrast ML with fields like elliptic-curve crypto, where derivations feel more “principled.”

Are LLMs just next-token predictors? World models vs parrots

  • One camp leans on “next-token predictor / stochastic parrot” as a useful high-level explanation for non-technical audiences (a toy sampling sketch follows this list).
  • Another camp argues modern LLMs implicitly build internal models of the world and concepts, going beyond simple statistics.
  • There is pushback: LLMs only see text, not direct interaction with the world, so whatever “world model” they have is indirect and impoverished.
  • Some see “world model” claims as overblown; others see them as obvious, since language itself models the world.
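
  A toy illustration of the “next-token predictor” framing, with a random linear model standing in for a real transformer (only the interface matters: context in, distribution over next tokens out, one token sampled per step):

      import numpy as np

      rng = np.random.default_rng(0)
      vocab = ["the", "cat", "sat", "on", "mat", "."]
      V, d = len(vocab), 8

      # Toy "model": a random embedding table and output projection.
      # Nothing here resembles a trained LLM beyond the interface.
      emb = rng.standard_normal((V, d))
      out_proj = rng.standard_normal((d, V))

      def next_token_distribution(context_ids):
          h = emb[context_ids].mean(axis=0)   # crude summary of the context
          logits = h @ out_proj
          e = np.exp(logits - logits.max())
          return e / e.sum()                  # softmax over the vocabulary

      ids = [vocab.index("the")]
      for _ in range(5):
          p = next_token_distribution(ids)
          ids.append(rng.choice(V, p=p))      # sample the next token
      print(" ".join(vocab[i] for i in ids))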

Simplicity of the math vs mystery of behavior

  • Repeated claim: at the micro-level it’s just additions, multiplications, matrix multiplies, activation functions, and gradients (see the sketch after this list).
  • The real puzzle is why these simple components, scaled up, work so well and exhibit emergent abilities; interpretability remains difficult.
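
  The micro-level claim in code: one block of nothing but matrix multiplies and an elementwise activation, with arbitrary dimensions and random weights rather than any real model:

      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.standard_normal(16)     # input vector
      W1 = rng.standard_normal((16, 64))
      W2 = rng.standard_normal((64, 16))

      h = np.maximum(0.0, x @ W1)     # matmul + ReLU activation
      y = h @ W2                      # matmul back down
      print(y.shape)                  # (16,): just additions and multiplications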

How much math matters in practice

  • Some say most AI progress and LLM research is driven by scaling, data, engineering, and trial-and-error rather than deep new math.
  • Others insist solid math is crucial for serious research and for understanding architecture trade‑offs, even if most practitioners rely on libraries.
  • One thread criticizes focusing beginners on low-level math as a derailment; another counters that knowing LLMs are “just linear algebra” prevents magical thinking.

Uncertainty, logits, and chaining models

  • Interesting aside: viewing LLMs as emitters of logits (probability distributions) highlights how uncertainty compounds when chaining multiple LLM calls or agents (a back-of-the-envelope calculation follows this list).
  • Reports of multi-step pipelines “collapsing” after a few hops motivate human-in-the-loop workflows or single-orchestrator designs.
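
  A back-of-the-envelope version of the compounding-uncertainty point: if each hop in a chain succeeds independently with probability p (a strong simplifying assumption; real errors are correlated and partial), end-to-end reliability decays geometrically with the number of hops:

      for p in (0.95, 0.90, 0.80):
          for hops in (1, 3, 5, 10):
              print(f"p={p:.2f}, hops={hops:2d}: chain success ~ {p**hops:.2f}")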

Learning resources and backgrounds

  • Many recommendations: Karpathy’s videos, “from scratch” LLM books, deep learning texts, and structured math/ML courses.
  • Several people with physics/control-theory backgrounds note their old linear algebra and calculus training suddenly became directly useful for understanding LLMs.

Meta and title criticism

  • Discussion about HN’s cultural bias toward “math for AI” vs hypothetical “leetcode for AI.”
  • Some readers find the title misleading: the article explains the math used inside LLMs, but not the still‑developing mathematics that would explain why LLMs work in a rigorous, interpretable way.