Language models pack billions of concepts into 12k dimensions
Orthogonality, binary vectors, and quasi-orthogonality
- Thread debates what “orthogonal” should mean for binary vectors: strict orthogonality under the real dot product vs “no shared 1-bits” vs arithmetic over GF(2) (XOR).
- Several people note you can’t have more than n mutually orthogonal vectors in n dimensions, but you can have many quasi-orthogonal bitstrings (small overlaps).
- One proposal: use long sparse bit vectors (e.g. 1000 bits with 10 ones per concept) so many concepts can co-exist in a single vector with low overlap, akin to coding theory / spherical codes.
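A minimal numpy sketch of that proposal, using the 1000-bit / 10-ones figures from the comment (the 2,000-concept count and everything else is illustrative), to check that random sparse bitstrings really do stay quasi-orthogonal:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 bits / 10 ones come from the comment; 2000 concepts is an arbitrary choice.
n_bits, n_ones, n_concepts = 1000, 10, 2000

# Each "concept" gets a random 1000-bit code with exactly ten 1-bits.
codes = np.zeros((n_concepts, n_bits))
for i in range(n_concepts):
    codes[i, rng.choice(n_bits, size=n_ones, replace=False)] = 1.0

# Overlap = number of shared 1-bits, the quasi-orthogonality measure in the thread.
overlap = (codes @ codes.T).astype(int)
np.fill_diagonal(overlap, 0)

print("max shared 1-bits between any two concepts:", overlap.max())
print("mean shared 1-bits:", overlap.mean())
# Typically at most a few shared bits out of 10, so thousands of concepts
# coexist with low pairwise interference, in the spirit of a sparse code.
```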
JL lemma, superposition, and sparse autoencoders
- Commenters connect the Johnson–Lindenstrauss (JL) lemma and “near-orthogonality” to the superposition hypothesis and Sparse Autoencoders (SAEs) in mechanistic interpretability.
- SAEs try to recover sparse, nearly-orthogonal “features” from dense activations; this matches the idea of many quasi-orthogonal concepts in a high‑dimensional space.
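To ground the JL connection, here is an illustrative numpy sketch (all sizes are arbitrary, not figures from the article): a random Gaussian projection from 4096 down to 256 dimensions approximately preserves the pairwise distances of a finite point set, which is the sense of “near-orthogonality” the superposition and SAE discussion leans on.

```python
import numpy as np

rng = np.random.default_rng(0)

d_high, d_low, n_points = 4096, 256, 200   # arbitrary illustrative sizes
X = rng.normal(size=(n_points, d_high))

# JL-style random projection: a Gaussian matrix scaled by 1/sqrt(d_low).
P = rng.normal(size=(d_high, d_low)) / np.sqrt(d_low)
Y = X @ P

def pairwise_dists(Z):
    # Squared-norm trick to avoid materializing an (n, n, d) tensor.
    sq = (Z ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T
    return np.sqrt(np.clip(d2, 0, None))

mask = ~np.eye(n_points, dtype=bool)
ratios = pairwise_dists(Y)[mask] / pairwise_dists(X)[mask]
print(f"distance ratios after projection: min={ratios.min():.3f}, max={ratios.max():.3f}")
# The ratios cluster near 1: a finite point set survives a 16x dimensionality
# reduction with small metric distortion, which is what the JL lemma promises.
```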
Capacity of high-dimensional spaces and “number of concepts”
- Some intuitions are combinatorial (2^k, 3^k, factorial counts), but others push back that this confuses “possible vectors” with meaningful “concepts.”
- One camp thinks 1k–20k dimensions is more than enough for human‑scale knowledge; another says the article overestimates capacity because what matters is preserving relative distances and rankings, not just almost-orthogonality.
- A separate critique calls the “10^200 concepts in 12k dimensions” claim absurd in information-theoretic terms, arguing it conflates geometry with Shannon capacity.
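For context, the kind of counting that produces numbers like 10^200 is the standard near-orthogonality bound. The figures below are a back-of-envelope illustration with an arbitrarily chosen tolerance and constant, not the article's exact calculation.

```latex
% Random unit vectors in R^d concentrate near orthogonality:
\[
  \Pr\bigl[\,|\langle u, v\rangle| \ge \epsilon\,\bigr] \;\le\; 2\,e^{-d\epsilon^{2}/2},
\]
% so a union bound over pairs yields exponentially many quasi-orthogonal directions:
\[
  N(d,\epsilon) \;\gtrsim\; e^{d\epsilon^{2}/4},
  \qquad
  N(12288,\,0.4) \;\gtrsim\; e^{0.04 \times 12288} = e^{491.5} \;\approx\; 10^{213}.
\]
% The critique in the thread: this counts geometric packings of nearly
% orthogonal directions, not the Shannon capacity of what a model can
% reliably write into and read back out of its activations.
```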
Topological vs metric preservation and folding
- A long subthread distinguishes JL’s guarantees for finite point sets from embedding the entire underlying manifold (Takens/Whitney/Sauer–Yorke).
- Argument: with a fixed embedding dimension k, refining resolution inevitably causes “folding”: distant regions of the true manifold get mapped close together, which could explain some LLM pathologies.
- Others ask for concrete empirical examples and suggest folding may be a theoretical concern rather than a dominant practical issue.
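A toy numpy sketch of the geometric intuition only, not one of the empirical LLM examples the thread asks for: linearly projecting a spiral from the plane down to a single coordinate makes points that are far apart along the curve land almost on top of each other, which is the “folding” being described.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 1-D manifold (a spiral) embedded in R^2.
t = np.linspace(0, 4 * np.pi, 2000)
spiral = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)

# Fixed target dimension k=1: a random linear projection to one coordinate.
proj = rng.normal(size=2)
z = spiral @ proj

# Count pairs that are far apart along the manifold (in t) yet nearly
# collide after projection.
i, j = np.triu_indices(len(t), k=1)
far_on_manifold = np.abs(t[i] - t[j]) > np.pi          # intrinsically distant
close_after_projection = np.abs(z[i] - z[j]) < 1e-2    # nearly identical image
print("folded pairs:", int((far_on_manifold & close_after_projection).sum()))
# The count is large: with the target dimension fixed too low for the manifold,
# such collisions are unavoidable no matter how finely the curve is sampled.
```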
How LLMs actually store concepts
- Multiple comments stress that models don’t assign one dimension per concept or enforce orthogonality; “understanding” emerges from the whole network, non-linearities, and attention, not just raw embedding geometry.
- KV cache and many layers massively expand effective representational space beyond a single 12k‑dim vector.
- Some note that non-linearities (e.g. softmax, GELU) and normalization mean that vectors need not be orthogonal; many more items than dimensions can be disambiguated even in low-dimensional spaces.
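A small numpy sketch of that last point (all sizes are arbitrary): with an argmax-over-dot-products readout, far more items than dimensions can be reliably told apart even though their vectors are only quasi-orthogonal, not orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_items = 64, 10_000          # 10k items in only 64 dimensions (arbitrary sizes)
codebook = rng.normal(size=(n_items, d))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

# Simulate noisy retrieval: perturb an item's vector, then decode it by the
# highest dot product against the whole codebook (an argmax/softmax readout).
n_trials = 1000
idx = rng.integers(n_items, size=n_trials)
noisy = codebook[idx] + 0.05 * rng.normal(size=(n_trials, d))
decoded = np.argmax(noisy @ codebook.T, axis=1)

print("decode accuracy:", (decoded == idx).mean())
# Accuracy is essentially 1.0: merely quasi-orthogonal vectors plus a
# non-linear readout suffice to disambiguate far more items than dimensions.
```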
Peer review, blog papers, and AI-written style
- Long debate on blog-style mechanistic interpretability work: high impact and widely cited vs “sloppy,” analogy-heavy, and lacking formal peer review.
- Several argue ML conference peer review is currently dysfunctional; others say formal review would still force clearer definitions and less hand‑wavy claims.
- Distinct subthread complains the article’s tone feels like LLM-generated “AI slop”: overuse of superlatives, formulaic structure, and internal inconsistencies (e.g., misinterpreted constants, spherical-code-like arguments).
- Counterpoint: using an LLM for wording doesn’t invalidate the underlying math or experiments, though it can mask errors and erode trust.
Semantics vs syntax in LLMs
- One view: LLMs don’t contain “real-world concepts,” only syntactic token relationships; any semantics live in human interpretation.
- Others counter that models handle homonyms and category judgments in ways that align with semantic distinctions, and that syntax-only pattern matching is too weak an explanation.
- No consensus: some insist “reasoning” talk is overclaim; others see emergent semantic structure in embeddings and behavior.
Miscellaneous points and open questions
- Questions about what actually enforces (near-)orthogonality during training go unanswered; the implication is that it emerges from the loss, the architecture, and normalization (a toy sketch follows this list).
- Some argue there aren’t “billions of human concepts” in the strict philosophical sense, so capacity claims may be solving the wrong problem.
- A late comment notes tension between this theory-heavy “huge capacity” narrative and empirical work finding limited semantic capacity for some embedding uses; the reconciliation is left unclear.
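On the open question of what enforces near-orthogonality, a toy PyTorch sketch in the spirit of Anthropic's “Toy Models of Superposition” (all sizes and hyperparameters are arbitrary; this is an illustration, not the thread's or the article's experiment): nothing in the objective asks for orthogonal features, yet the learned feature directions tend to end up with small pairwise cosines.

```python
import torch

torch.manual_seed(0)

n_features, d_model, sparsity = 256, 32, 0.02   # arbitrary toy sizes
W = torch.nn.Parameter(0.1 * torch.randn(d_model, n_features))
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-3)

for step in range(10_000):
    # Sparse synthetic "features": each one active with low probability.
    x = (torch.rand(512, n_features) < sparsity).float() * torch.rand(512, n_features)
    h = x @ W.T                       # compress 256 features into 32 dimensions
    x_hat = torch.relu(h @ W + b)     # reconstruct via a tied, non-linear readout
    loss = ((x - x_hat) ** 2).mean()  # nothing here demands orthogonality
    opt.zero_grad()
    loss.backward()
    opt.step()

# Measure pairwise cosine similarity between learned feature directions.
dirs = torch.nn.functional.normalize(W.detach(), dim=0)   # one direction per feature
cos = dirs.T @ dirs
off_diag = cos[~torch.eye(n_features, dtype=torch.bool)]
print(f"mean |cos| between features: {off_diag.abs().mean().item():.3f}")
# Interference stays small for most feature pairs: quasi-orthogonality emerges
# from the reconstruction loss plus sparsity, not from an explicit constraint.
```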