The maths you need to start understanding LLMs

Embeddings, RAG, and scope of the article

  • Several comments note the article’s math is essentially what you need for embeddings and RAG: turn text into vectors, use cosine distance to find relevant chunks, and optionally rerank (a minimal retrieval sketch follows this list).
  • Others point out this is only the input stage; it doesn’t cover the full transformer/LLM, which has trillions of parameters and far more complexity.
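
  A minimal sketch of the retrieval step described above, assuming nothing beyond numpy: embed() is a hypothetical stand-in for a real embedding model, faked here with deterministic random vectors, and ranking uses cosine similarity (equivalent to cosine distance for ordering purposes).

      import numpy as np

      def embed(text: str) -> np.ndarray:
          # Stand-in embedding: a deterministic pseudo-random vector per string.
          # A real RAG system would call an embedding model here instead.
          rng = np.random.default_rng(abs(hash(text)) % (2**32))
          return rng.standard_normal(384)

      def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

      chunks = ["chunk about billing", "chunk about refunds", "chunk about shipping"]
      chunk_vecs = [embed(c) for c in chunks]

      query_vec = embed("how do I get my money back?")
      scored = sorted(
          ((cosine_similarity(query_vec, v), c) for v, c in zip(chunk_vecs, chunks)),
          reverse=True,
      )
      # An optional reranker (e.g. a cross-encoder) would reorder the top few here.
      print(scored[:2])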

What math you “need”

  • Common list: basic linear algebra, basic probability, some analysis (exp/softmax), and gradients (a small softmax example follows this list).
  • Some argue this is enough to start understanding LLMs (“necessary but not sufficient”), but not to fully understand training, optimization, or architecture design.
  • A few mention missing pieces like vector calculus, Hessians, and optimization theory.
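
  As a worked example of the exp/softmax item above, a numerically stable softmax in plain numpy (the shift by the maximum is the standard trick and does not change the result):

      import numpy as np

      def softmax(logits: np.ndarray) -> np.ndarray:
          # Subtracting the max changes nothing mathematically but keeps
          # exp() from overflowing for large logits.
          shifted = logits - np.max(logits)
          exps = np.exp(shifted)
          return exps / exps.sum()

      probs = softmax(np.array([2.0, 1.0, 0.1]))
      print(probs, probs.sum())  # a probability distribution summing to 1.0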

Does doing the math equal understanding?

  • Debate over whether being able to write out the formulas or implement them in PyTorch implies real understanding.
  • One view: formula use is the first step; deeper understanding comes from abstractions and analogies, and is effectively unbounded.
  • Others contrast ML with fields like elliptic-curve crypto, where derivations feel more “principled.”

Are LLMs just next-token predictors? World models vs parrots

  • One camp leans on “next-token predictor / stochastic parrot” as a useful high-level explanation for non-technical audiences (a toy sampling sketch follows this list).
  • Another camp argues modern LLMs implicitly build internal models of the world and concepts, going beyond simple statistics.
  • There is pushback: LLMs only see text, not direct interaction with the world, so whatever “world model” they have is indirect and impoverished.
  • Some see “world model” claims as overblown; others see them as obvious, since language itself models the world.
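
  A toy illustration of the “next-token predictor” framing, with a random linear model standing in for a real transformer (only the interface matters: context in, distribution over next tokens out, one token sampled per step):

      import numpy as np

      rng = np.random.default_rng(0)
      vocab = ["the", "cat", "sat", "on", "mat", "."]
      V, d = len(vocab), 8

      # Toy "model": a random embedding table and output projection.
      # Nothing here resembles a trained LLM beyond the interface.
      emb = rng.standard_normal((V, d))
      out_proj = rng.standard_normal((d, V))

      def next_token_distribution(context_ids):
          h = emb[context_ids].mean(axis=0)   # crude summary of the context
          logits = h @ out_proj
          e = np.exp(logits - logits.max())
          return e / e.sum()                  # softmax over the vocabulary

      ids = [vocab.index("the")]
      for _ in range(5):
          p = next_token_distribution(ids)
          ids.append(rng.choice(V, p=p))      # sample the next token
      print(" ".join(vocab[i] for i in ids))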

Simplicity of the math vs mystery of behavior

  • Repeated claim: at the micro-level it’s just additions, multiplications, matrix multiplies, activation functions, and gradients (see the sketch after this list).
  • The real puzzle is why these simple components, scaled up, work so well and exhibit emergent abilities; interpretability remains difficult.
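
  The micro-level claim in code: one block of nothing but matrix multiplies and an elementwise activation, with arbitrary dimensions and random weights rather than any real model:

      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.standard_normal(16)     # input vector
      W1 = rng.standard_normal((16, 64))
      W2 = rng.standard_normal((64, 16))

      h = np.maximum(0.0, x @ W1)     # matmul + ReLU activation
      y = h @ W2                      # matmul back down
      print(y.shape)                  # (16,): just additions and multiplications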

How much math matters in practice

  • Some say most AI progress and LLM research is driven by scaling, data, engineering, and trial-and-error rather than deep new math.
  • Others insist solid math is crucial for serious research and for understanding architecture trade‑offs, even if most practitioners rely on libraries.
  • One thread criticizes focusing beginners on low-level math as a derailment; another counters that knowing LLMs are “just linear algebra” prevents magical thinking.

Uncertainty, logits, and chaining models

  • Interesting aside: viewing LLMs as emitters of logits (probability distributions) highlights how uncertainty compounds when chaining multiple LLM calls or agents (a back-of-the-envelope calculation follows this list).
  • Reports of multi-step pipelines “collapsing” after a few hops motivate human-in-the-loop workflows or single-orchestrator designs.
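
  A back-of-the-envelope version of the compounding-uncertainty point: if each hop in a chain succeeds independently with probability p (a strong simplifying assumption; real errors are correlated and partial), end-to-end reliability decays geometrically with the number of hops:

      for p in (0.95, 0.90, 0.80):
          for hops in (1, 3, 5, 10):
              print(f"p={p:.2f}, hops={hops:2d}: chain success ~ {p**hops:.2f}")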

Learning resources and backgrounds

  • Many recommendations: Karpathy’s videos, “from scratch” LLM books, deep learning texts, and structured math/ML courses.
  • Several people with physics/control-theory backgrounds note their old linear algebra and calculus training suddenly became directly useful for understanding LLMs.

Meta and title criticism

  • Discussion about HN’s cultural bias toward “math for AI” vs hypothetical “leetcode for AI.”
  • Some readers find the title misleading: the article explains the math used inside LLMs, but not the still‑developing mathematics that would explain why LLMs work in a rigorous, interpretable way.