2024-07-07

Reasoning in Large Language Models: A Geometric Perspective

Geometric view of LLMs / paper takeaways

Neural networks (including transformers) can be viewed geometrically: layers with piecewise-linear activations such as ReLU partition the input space into many regions, and on each region the network computes a single affine map.
The number of such regions grows exponentially with the intrinsic dimension of the input, so the same neurons gain approximation power when the input occupies a higher-dimensional manifold, without adding neurons.
In transformers, self-attention outputs feed the MLP blocks; denser attention graphs correlate with a higher intrinsic dimension of the MLP input and with better performance on math word problems.
Adding context tokens can raise intrinsic dimension, but it improves reasoning performance only when the final layer’s intrinsic dimension rises, not merely the first layer’s.
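A toy version of the partition picture can be checked directly. The sketch below is illustrative only: it assumes a small, randomly initialized (untrained) ReLU MLP with hypothetical sizes and hypothetical helper names (`make_relu_mlp`, `activation_pattern`, `count_regions`), and it is not the paper's code. Inputs that switch on exactly the same set of ReLUs lie in one region, where the network is a single affine map, so counting distinct on/off patterns over sampled inputs gives a lower bound on how many regions those inputs touch. Comparing points on a line with points on a plane, both inside the same 4-D input space, contrasts intrinsic dimension 1 with intrinsic dimension 2.

```python
import random


def make_relu_mlp(in_dim, width, depth, seed=0):
    """Random, untrained ReLU MLP as a list of (weights, biases) per layer (toy sizes)."""
    rng = random.Random(seed)
    layers, d = [], in_dim
    for _ in range(depth):
        weights = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(width)]
        biases = [rng.gauss(0.0, 1.0) for _ in range(width)]
        layers.append((weights, biases))
        d = width
    return layers


def activation_pattern(layers, x):
    """On/off state of every ReLU at input x.

    All inputs sharing one pattern lie in the same region, where the network is one affine map.
    """
    pattern, h = [], x
    for weights, biases in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        pattern.append(tuple(p > 0 for p in pre))
        h = [max(p, 0.0) for p in pre]  # ReLU
    return tuple(pattern)


def count_regions(layers, points):
    """Number of distinct linear regions hit by the sampled points (a lower bound)."""
    return len({activation_pattern(layers, p) for p in points})


# Hypothetical setup: 4-D inputs, two hidden layers of 16 ReLUs each.
net = make_relu_mlp(in_dim=4, width=16, depth=2, seed=0)
grid = [i / 10 for i in range(-30, 31)]
# Same ambient space (4-D), different intrinsic dimension: a line vs. a plane.
line = [[s, 0.0, 0.0, 0.0] for s in grid]
plane = [[s, t, 0.0, 0.0] for s in grid for t in grid]
print("regions on a 1-D line: ", count_regions(net, line))
print("regions on a 2-D plane:", count_regions(net, plane))
```

Because the line's sample points are a subset of the plane's (the `t = 0.0` row), the plane can never report fewer regions; how many more it reports depends on the random weights, and this small comparison does not by itself demonstrate exponential scaling.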

Debate: does geometry explain “reasoning”?

Supporters see this as a useful, concrete link between network geometry, expressivity, and observed reasoning-like behavior.
Skeptics argue that relating “geometry” and “reasoning” is conceptually loose unless clear, specific implications are shown.

Reasoning vs pattern-matching

One side: LLMs are sophisticated autocomplete over token embeddings, internal concepts correspond to geometric regions, and some degree of reasoning emerges naturally from compressing and combining those concepts.
Other side: models mainly reflect patterns in text, lack robust multi-step planning or scalable math, and fail sharply as problems grow; on this view their behavior is “reasoning-like” but not genuine reasoning.

Capabilities, limitations, and math

Commenters cited examples where models handle small multiplications or simple logic but break down on larger or less familiar instances.
Some argue this shows pure language modeling is insufficient for unbounded math or algorithmic reasoning; others note that chain-of-thought, tools (e.g., code), and internal optimization dynamics blur this line.
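One way to see why problem size matters, offered purely as an illustration and not as a claim about how any model computes internally: in schoolbook multiplication written out step by step (roughly what a chain-of-thought solution does), the number of single-digit partial products equals the product of the two digit counts, so every extra digit adds more places to slip. The hypothetical helper `long_multiply_trace` below records those steps for non-negative integers, and Python's exact integer arithmetic stands in for the kind of code tool that sidesteps the problem.

```python
def long_multiply_trace(a: int, b: int):
    """Schoolbook multiplication of non-negative ints, one digit pair at a time.

    Returns the product and the list of single-digit partial products used,
    loosely mimicking a written-out, chain-of-thought style solution.
    """
    steps, total = [], 0
    for i, da in enumerate(reversed(str(a))):
        for j, db in enumerate(reversed(str(b))):
            partial = int(da) * int(db) * 10 ** (i + j)
            steps.append(f"{da} x {db} x 10^{i + j} = {partial}")
            total += partial
    return total, steps


for n_digits in (2, 4, 8):
    a = int("9" * n_digits)
    product, steps = long_multiply_trace(a, a)
    assert product == a * a  # exact built-in arithmetic plays the role of the "tool"
    print(f"{n_digits} digits -> {len(steps)} single-digit steps")
```

Running it prints 4, 16 and 64 steps for 2-, 4- and 8-digit squares; carries and the final additions add further steps on top of these.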

Training data, generalization, and contamination

A long critique stresses that we don’t really know what is in the training corpora, so benchmarks may be contaminated with seen or semantically similar data.
This makes it hard to separate true generalization/reasoning from memorization or paraphrasing, and casts doubt on strong claims about reasoning.
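Where a corpus is available, a common first check for contamination is verbatim n-gram overlap between benchmark items and training documents. The sketch below is a minimal version under stated assumptions: whitespace tokenization, lowercasing, and an n-gram length of 8 are arbitrary choices, and `ngrams` and `overlap_fraction` are hypothetical helpers. It also illustrates the critique's point: a paraphrased item scores zero even though it poses the same problem.

```python
def ngrams(text: str, n: int) -> set:
    """Set of lowercase, whitespace-tokenized n-grams in `text`."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(item: str, corpus_docs, n: int = 8) -> float:
    """Fraction of the item's n-grams that occur verbatim somewhere in the corpus."""
    item_grams = ngrams(item, n)
    if not item_grams:  # item shorter than n tokens
        return 0.0
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return len(item_grams & corpus_grams) / len(item_grams)


# Hypothetical one-document "training corpus" and two test items.
corpus = ["If a train leaves at 3 pm and travels 60 miles per hour how far does it go by 5 pm"]
verbatim = "If a train leaves at 3 pm and travels 60 miles per hour how far does it go by 5 pm"
paraphrase = "A train departs at three in the afternoon going sixty miles an hour; how far has it gone by five?"

print(overlap_fraction(verbatim, corpus))    # 1.0
print(overlap_fraction(paraphrase, corpus))  # 0.0
```

Catching the paraphrase would need a semantic comparison (for example, embedding similarity), which is much harder to make reliable.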

What is “reasoning”?

Recurrent theme: “reasoning” is ill-defined.
Some equate it with any learned logical or causal mapping (which deep neural networks can approximate); others require properties such as robust abstraction, self-knowledge, or embodiment, which they argue current LLMs lack.
Several suggest treating reasoning as a spectrum rather than a binary capability.

Related topics