Reasoning in Large Language Models: A Geometric Perspective
Geometric view of LLMs / paper takeaways
- Neural nets (incl. transformers) can be seen geometrically: non-linear layers partition input space into many regions, each with its own affine mapping.
- The number of such regions grows exponentially with the intrinsic dimension of the input, increasing approximation power without adding neurons.
- In transformers, self‑attention outputs feed MLPs; denser attention graphs correlate with higher intrinsic dimension and better performance on math word problems.
- Adding context tokens can raise intrinsic dimension, but reasoning performance improves only when the final layer’s intrinsic dimension rises, not just the first layer’s.
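A toy sketch of the region-counting idea above, not the paper's own procedure. It assumes a single random ReLU layer and feeds it inputs drawn from random subspaces of increasing dimension; each distinct on/off activation pattern marks one affine region the inputs reach. The names `count_activation_patterns` and `max_regions`, the layer sizes, and the sampling scale are all illustrative assumptions.

```python
import numpy as np
from math import comb


def count_activation_patterns(subspace_dim, n_neurons=16, ambient_dim=32,
                              n_samples=20000, seed=0):
    """Count distinct ReLU on/off patterns hit by inputs from a random subspace.

    Each distinct pattern corresponds to one affine region of the layer.
    Sampling can only find regions that the samples land in, so the result
    is a lower estimate of the true number of regions.
    """
    rng = np.random.default_rng(seed)
    # Fixed random layer (drawn first, so it is identical for every subspace_dim).
    W = rng.standard_normal((n_neurons, ambient_dim))
    b = rng.standard_normal(n_neurons)
    # Orthonormal basis of a random subspace with the requested dimension.
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, subspace_dim)))
    # Inputs whose intrinsic dimension is subspace_dim, embedded in ambient_dim.
    coeffs = 3.0 * rng.standard_normal((n_samples, subspace_dim))
    x = coeffs @ basis.T
    # A neuron is "on" where its pre-activation is positive.
    patterns = (x @ W.T + b > 0).astype(np.uint8)
    return len(np.unique(patterns, axis=0))


def max_regions(n_neurons, dim):
    """Upper bound on regions cut by n_neurons hyperplanes in a dim-dimensional space."""
    return sum(comb(n_neurons, i) for i in range(dim + 1))


if __name__ == "__main__":
    for d in (1, 2, 4):
        print(d, count_activation_patterns(d), max_regions(16, d))
```

With the number of neurons held fixed, the bound grows from n + 1 at dimension 1 to a sum of binomial terms at higher dimensions, which is the sense in which higher intrinsic dimension buys more regions without more neurons. Exact counts depend on the seed and on how many samples are drawn.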
Debate: does geometry explain “reasoning”?
- Supporters see this as a useful, concrete link between network geometry, expressivity, and observed reasoning-like behavior.
- Skeptics argue that relating “geometry” and “reasoning” is conceptually loose unless clear, specific implications are shown.
Reasoning vs pattern-matching
- One side: LLMs are sophisticated autocomplete over token embeddings; internal concepts are geometric regions; some level of reasoning naturally emerges from compressing and combining those concepts.
- Other side: models mainly reflect patterns in text, lack robust multi-step planning or scalable math, and fail sharply as problems grow; on this view the behavior is “reasoning-like” but not genuine reasoning.
Capabilities, limitations, and math
- Examples are discussed where models handle small multiplications or logic problems but break down on larger or less-seen instances.
- Some argue this shows pure language modeling is insufficient for unbounded math or algorithmic reasoning; others note that chain-of-thought, tools (e.g., code), and internal optimization dynamics blur this line.
Training data, generalization, and contamination
- A long critique stresses that the training corpora are largely unknown, so benchmarks may be contaminated with data the model has already seen or with semantically similar data.
- This makes it hard to separate true generalization/reasoning from memorization or paraphrasing, and casts doubt on strong claims about reasoning.
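To make the contamination concern concrete, here is a minimal sketch of a word-level n-gram overlap check, a common heuristic for spotting verbatim leakage. It is not something the critique itself proposes; `ngrams`, `overlap_fraction`, whitespace tokenization, and the default `n=8` are assumptions made for illustration.

```python
def ngrams(text, n):
    """Return the set of word-level n-grams in text (lowercased, whitespace-split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(benchmark_item, corpus_docs, n=8):
    """Fraction of the item's n-grams that also occur somewhere in corpus_docs."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        # Item is shorter than n tokens, so there is nothing to compare.
        return 0.0
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return len(item_grams & corpus_grams) / len(item_grams)
```

A check like this needs access to the corpus, which is exactly what is usually missing, and it only catches verbatim overlap: a paraphrased or semantically similar copy of a benchmark item scores zero, so a low score does not rule out contamination.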
What is “reasoning”?
- Recurrent theme: “reasoning” is ill-defined.
- Some equate it with any learned logical/causal mapping (which DNNs can approximate); others require properties like robust abstraction, self-knowledge, or embodiment, which current LLMs lack.
- Several suggest treating reasoning as a spectrum rather than a binary capability.