The maths you need to start understanding LLMs
Embeddings, RAG, and scope of the article
- Several comments note the article’s math is essentially what you need for embeddings and RAG: turn text into vectors, use cosine distance to find relevant chunks, optionally rerank (sketched in code after this list).
- Others point out this is only the input stage; it doesn’t cover the full transformer/LLM, which has trillions of parameters and far more complexity.
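A minimal sketch of that retrieval step, assuming the query and chunk embeddings have already been produced by some embedding model (the function names and shapes here are illustrative, not from the article):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product of the vectors divided by the product of their norms.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, chunk_vecs: np.ndarray, chunks: list[str], k: int = 3) -> list[str]:
    # Score every stored chunk against the query and return the k most similar ones;
    # an optional reranker would reorder this shortlist with a stronger model.
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]
```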
What math you “need”
- Common list: basic linear algebra, basic probability, some analysis (exp/softmax), gradients; a small softmax example follows this list.
- Some argue this is enough to start understanding LLMs (“necessary but not sufficient”), but not to fully understand training, optimization, or architecture design.
- A few mention missing pieces like vector calculus, Hessians, and optimization theory.
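For the exp/softmax item, a small worked example of turning raw scores (logits) into a probability distribution; this is a generic, numerically stable softmax, not code from the article:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the max logit for numerical stability, exponentiate, then normalize to sum to 1.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ≈ [0.659, 0.242, 0.099]
```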
Does doing the math equal understanding?
- Debate over whether being able to write the formulas or implement them in PyTorch implies real understanding.
- One view: formula use is the first step; deeper understanding comes from abstractions and analogies, and is effectively unbounded.
- Others contrast ML with fields like elliptic-curve cryptography, where derivations feel more “principled.”
Are LLMs just next-token predictors? World models vs parrots
- One camp leans on “next-token predictor / stochastic parrot” as a useful high-level explanation for non-technical audiences (a toy sampling sketch follows this list).
- Another camp argues modern LLMs implicitly build internal models of the world and concepts, going beyond simple statistics.
- There is pushback: LLMs only see text, not direct interaction with the world, so whatever “world model” they have is indirect and impoverished.
- Some see “world model” claims as overblown; others see them as obvious, given that language itself models the world.
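To make the “next-token predictor” framing concrete, a toy sketch of the sampling step; the vocabulary and probabilities below are invented, and a real model would compute the distribution from the entire preceding context:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat"]        # hypothetical tiny vocabulary
probs = np.array([0.05, 0.10, 0.10, 0.15, 0.60])  # hypothetical next-token distribution

# Generation is just repeating this step: sample (or take the argmax of) a token
# from the distribution, append it to the context, and ask for the next distribution.
print(rng.choice(vocab, p=probs))
```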
Simplicity of the math vs mystery of behavior
- Repeated claim: at the micro-level it’s just additions, multiplications, matrix multiplies, activation functions, gradients (illustrated in the sketch after this list).
- The real puzzle is why these simple components, scaled up, work so well and exhibit emergent abilities; interpretability remains difficult.
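A sketch of what those micro-level operations look like for one toy layer; the random numbers stand in for learned weights, and this is illustrative rather than any particular architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal(4)        # toy 4-dimensional input vector
W = rng.standard_normal((8, 4))   # weight matrix of one small layer
b = rng.standard_normal(8)        # bias vector

h = np.maximum(0.0, W @ x + b)    # matrix multiply, addition, then a ReLU activation
print(h)
```

Training adjusts W and b using gradients of a loss with respect to them; nothing at this level is more exotic than that.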
How much math matters in practice
- Some say that most AI progress and LLM research are driven by scaling, data, engineering, and trial-and-error rather than deep new math.
- Others insist solid math is crucial for serious research and for understanding architecture trade‑offs, even if most practitioners rely on libraries.
- One thread criticizes focusing beginners on low-level math as a derailment; another counters that knowing LLMs are “just linear algebra” prevents magical thinking.
Uncertainty, logits, and chaining models
- Interesting aside: viewing LLMs as logit (distribution) emitters highlights cumulative uncertainty when chaining multiple LLM calls or agents (a back-of-the-envelope example follows this list).
- Reports of multi-step pipelines “collapsing” after a few hops motivate human-in-the-loop workflows or single-orchestrator designs.
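A back-of-the-envelope illustration of that compounding effect; the 0.9 per-step reliability is an assumed number, not one reported in the thread:

```python
# If each step of a pipeline independently succeeds with probability p,
# the whole n-step chain succeeds with probability p ** n.
p = 0.9
for n in (1, 3, 5, 10):
    print(f"{n} steps: {p ** n:.2f}")  # 0.90, 0.73, 0.59, 0.35
```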
Learning resources and backgrounds
- Many recommendations: Karpathy’s videos, “from scratch” LLM books, deep learning texts, and structured math/ML courses.
- Several people with physics/control-theory backgrounds note their old linear algebra and calculus training suddenly became directly useful for understanding LLMs.
Meta and title criticism
- Discussion about HN’s cultural bias toward “math for AI” vs hypothetical “leetcode for AI.”
- Some readers find the title misleading: the article explains the math used inside LLMs, but not the still‑developing mathematics that would explain why LLMs work in a rigorous, interpretable way.