Inductive or deductive? Rethinking the fundamental reasoning abilities of LLMs

Scope of “reasoning” in LLMs

  • Strong disagreement over whether LLMs “reason” at all vs. being advanced statistical text predictors.
  • Some argue they only map inputs to likely outputs from training data, with no goals, self-model, or awareness of questions/answers.
  • Others say their behavior is best seen as approximating reasoning (or even as a kind of reasoning), just limited, brittle, and unlike human cognition.
  • Several note that debates often reduce to differing definitions of “reason,” “intelligence,” and “consciousness.”

Deductive, inductive, abductive reasoning

  • Multiple commenters note the paper’s focus on inductive vs. deductive reasoning is incomplete without abduction (inference to the best explanation).
  • One view: LLM behavior seems closer to abductive/Bayesian inference over token sequences than to clean symbolic deduction.
  • Others say in practice the distinctions blur for LLMs, since they only ever see text, not real-world events.

Tokenization, “strawberry,” and failure modes

  • The “How many ‘r’s in ‘strawberry’?” example is heavily discussed.
  • Some see this as proof LLMs don’t understand letters/words, only tokens and distributions.
  • Others argue the failure is a tokenization artifact; character-level models can count letters reliably, and prompts that change tokenization can fix it.
  • Debate over whether such failures show “no reasoning” or simply current architectural/engineering limits.
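The tokenization point can be made concrete with a small sketch. The subword split below is purely illustrative (not taken from any specific model's tokenizer): a model operating on opaque token IDs never sees the letters inside each token, whereas character-level counting is trivial.

```python
# Illustrative sketch of why subword tokenization can obscure letter counts.
# The token split below is hypothetical, not any real tokenizer's output.

def char_count(word: str, letter: str) -> int:
    """Character-level counting: trivial once individual letters are visible."""
    return word.count(letter)

# A hypothetical BPE-style split; the model receives token IDs, not characters.
hypothetical_tokens = ["str", "aw", "berry"]

assert "".join(hypothetical_tokens) == "strawberry"
print(char_count("strawberry", "r"))  # character level: 3

# From the token sequence alone, the per-token letter counts
# ("str" -> 1, "aw" -> 0, "berry" -> 2) would have to be memorized,
# since the characters are never directly observed.
```

This is the crux of the tokenization-artifact argument: the failure reflects what the model's input representation hides, not necessarily an absence of reasoning.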

Memorization vs. generalization

  • Repeated concern: you cannot cleanly test reasoning without knowing what’s in the training set.
  • Arithmetic in common bases, Caesar ciphers, and many “reasoning” benchmarks are likely in-distribution, so high scores may be memorization or pattern reuse.
  • Some see base-dependent arithmetic performance as evidence of memorization rather than abstract rule learning.
  • Others note that humans also rely heavily on learned patterns; the line between memorization and reasoning is fuzzy.
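The Caesar-cipher probe mentioned above can be sketched in a few lines. The idea (an assumption about training-data frequency, not a measured result) is that ROT13 appears far more often in web text than arbitrary shifts, so uneven performance across shifts on the same deterministic rule would suggest pattern recall rather than abstract rule application.

```python
import string

def caesar(text: str, shift: int) -> str:
    """Shift each letter by `shift` positions, wrapping around the alphabet."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    k = shift % 26
    table = str.maketrans(
        lower + upper,
        lower[k:] + lower[:k] + upper[k:] + upper[:k],
    )
    return text.translate(table)

# Same rule, different shifts:
print(caesar("hello", 13))  # "uryyb" -- ROT13, ubiquitous in web text
print(caesar("hello", 5))   # "mjqqt" -- identical rule, rarer in corpora
```

A model that truly applies the rule should handle both shifts equally well; a large accuracy gap between ROT13 and, say, shift 5 is the kind of base- or variant-dependent behavior commenters read as memorization.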

Consciousness and qualia

  • Long subthread on whether LLMs are conscious or “aware” of anything, or merely manipulating symbols.
  • Competing views: consciousness as graded world-modelling vs. requiring specific physical substrates (e.g., brain waves).
  • No consensus; several stress that invoking qualia or “soul-like” properties does not help evaluate present systems.

Capabilities and current limits

  • Commenters highlight LLM strengths in pattern mapping and language fluency but weaknesses in strict rule-following, robust math, logic puzzles, text-to-SQL, and ASCII art.
  • Some argue these weaknesses show transformers are a poor architecture for genuine reasoning, which requires search and program execution; others see room for incremental improvement and hybrid systems.