Chain of Thought empowers transformers to solve inherently serial problems

Paper’s Core Claim and Theoretical Context

  • The paper claims that constant-depth transformers with chain-of-thought (CoT) can, in principle, solve any formally decidable problem if allowed arbitrarily many intermediate tokens.
  • Several comments place this in circuit-complexity terms: earlier work bounded vanilla transformers (with finite precision) to small circuit classes (e.g., TC⁰/AC⁰), which cannot express even simple functions such as parity (a toy parity sketch follows this list); CoT and/or different assumptions expand their computational power toward general polynomial-time computation.
  • Others liken this to universal approximation / Turing-completeness results for neural networks and argue it’s theoretically interesting but practically weak (like “any function can be approximated with enough neurons”).
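
A toy sketch of the parity example mentioned above, assuming nothing beyond the discussion itself: parity is the textbook function outside AC⁰ for constant-depth circuits, yet a serial scan that emits one intermediate result per bit (the spirit of CoT) computes it in n trivial steps. This illustrates the intuition, not the paper's actual transformer construction.

```python
# Toy illustration (not the paper's construction): parity of n bits.
# A constant-depth AC^0 circuit provably cannot compute parity for all n,
# but emitting one intermediate "thought" per bit solves it serially.

def parity_with_cot(bits: list[int]) -> int:
    """Scan left to right, emitting a running parity as each 'CoT token'."""
    thought = 0
    for i, b in enumerate(bits):
        thought ^= b  # one serial reasoning step
        print(f"step {i}: running parity = {thought}")  # the intermediate token
    return thought

assert parity_with_cot([1, 0, 1, 1]) == 1
```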

Enthusiasm and Potential Implications

  • Some see this as strong evidence that scaling inference and CoT could, in principle, let transformers match or exceed human problem-solving, assuming enough compute and good algorithm discovery.
  • CoT is compared to writing and multi-generational human reasoning, and to adding memory (as in going from finite-state machines to pushdown/Turing machines; a toy sketch of that upgrade follows this list).
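
A toy sketch of the memory analogy, using the classic example: no finite-state machine recognizes balanced parentheses of unbounded depth, but adding a single unbounded counter (the simplest stack) makes it trivial. This illustrates the analogy from the thread, not a claim about transformers themselves.

```python
# Classic illustration of the "adding memory" analogy: balanced parentheses
# need unbounded memory (a stack or counter), which a pure FSM lacks.

def balanced(s: str) -> bool:
    depth = 0  # the unbounded memory a finite-state machine does not have
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:  # a ')' with nothing open
                return False
    return depth == 0

assert balanced("(()())") and not balanced("())(")
```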

Skepticism and Practical Limitations

  • Many push back on the marketing phrase “solve any problem” as ignoring efficiency, thermodynamics, and physical resource limits (“not enough planet / GPUs”).
  • Commenters note that being able to represent any algorithm ≠ being able to find good algorithms; the hard part is discovery and learning, not raw expressivity.
  • The result is compared to infinite-monkey / Busy Beaver–style theorems: true but not operationally useful.
  • Some stress that transformers’ power under realistic finite-precision and fixed-depth constraints remains limited; CoT helps, but may require polynomially many tokens, which is likely prohibitive (a rough cost estimate follows this list).
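
A back-of-envelope sketch of the token-budget objection. Every number here (input size, polynomial exponent, price per token) is an assumption chosen for illustration; none come from the paper or the thread.

```python
# Hypothetical numbers only: what a quadratic-length CoT might cost.
n = 10_000                    # assumed input length
tokens_needed = n ** 2        # a CoT needing ~n^2 intermediate tokens
cost_per_token = 1e-5         # assumed price in dollars per generated token

print(f"tokens needed: {tokens_needed:,}")                       # 100,000,000
print(f"rough cost:    ${tokens_needed * cost_per_token:,.0f}")  # $1,000 per query
```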

Chain-of-Thought, Depth, and Seriality

  • CoT is framed as adding loops or increasing effective depth over time, converting a constant-depth circuit into a sequential process.
  • Discussion centers on “inherently serial” problems: some computations parallelize (e.g., summation via tree reduction); others may fundamentally require serial steps. CoT lets a mostly parallel architecture emulate serial reasoning (see the contrast sketched below).
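
A sketch of that contrast under simple assumptions: summation collapses to logarithmic depth via pairwise reduction, while iterated hashing is a standard example of a computation with no known shortcut past performing the steps in order, much as CoT tokens must be generated one after another.

```python
import hashlib

def tree_sum(xs: list[int]) -> int:
    """Pairwise reduction: O(log n) rounds, the shape of a shallow circuit."""
    while len(xs) > 1:
        pairs = [xs[i] + xs[i + 1] for i in range(0, len(xs) - 1, 2)]
        xs = pairs + (xs[-1:] if len(xs) % 2 else [])
    return xs[0]

def iterated_hash(seed: bytes, steps: int) -> bytes:
    """Each step consumes the previous output: depth grows with `steps`."""
    h = seed
    for _ in range(steps):
        h = hashlib.sha256(h).digest()  # believed to be inherently serial
    return h

assert tree_sum([1, 2, 3, 4, 5]) == 15
assert len(iterated_hash(b"seed", 1000)) == 32
```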

Formal vs Informal Problems

  • The theorem applies only to formal languages and decidable problems.
  • Several comments note that most real-world tasks start as fuzzy, informal human requests; there’s a hard, poorly understood problem of formalizing these into something a model (or any formal system) can solve.
  • Whether this translation itself can be made formal or is inherently informal is debated and left unresolved.

“Stochastic Parrot” and Intelligence Debate

  • Tangential but extensive discussion compares LLMs to “stochastic parrots,” Chinese Room, and humans as pattern-matching, partially-understanding agents.
  • One side uses these labels to demystify LLMs as sophisticated autocomplete; another argues humans are not fundamentally different and that such dismissive labels obscure genuine advances.

Methods, Tools, and Variants

  • Iterative “for-loop of thought” prompting (self-critique cycles) is discussed; anecdotal reports show mixed results, and systematic benchmarks are said to be lacking (a minimal loop sketch follows this list).
  • Comments mention related work: CoT boosting transformer power vs RNNs, transformers vs classical symbolic methods (Prolog, expert systems), and hybrid ideas combining probabilistic LLMs with explicit logical reasoning.
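
A minimal sketch of the self-critique loop, assuming a hypothetical `llm` callable (text in, text out) standing in for any model API; the loop structure is the point, and the stopping rule is a naive placeholder, not a recommended criterion.

```python
def for_loop_of_thought(llm, task: str, rounds: int = 3) -> str:
    """Draft, critique, revise: the 'for-loop of thought' pattern."""
    answer = llm(f"Solve: {task}")
    for _ in range(rounds):
        critique = llm(f"Task: {task}\nAnswer: {answer}\nList any flaws.")
        if "no flaws" in critique.lower():  # naive stopping rule (assumption)
            break
        answer = llm(f"Task: {task}\nAnswer: {answer}\n"
                     f"Critique: {critique}\nWrite an improved answer.")
    return answer
```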