Chain of Thought empowers transformers to solve inherently serial problems

Paper’s Core Claim and Theoretical Context

  • The paper claims that constant-depth transformers with chain-of-thought (CoT) can, in principle, solve any formally decidable problem if allowed arbitrarily many intermediate tokens.
  • Several comments place this in circuit-complexity terms: earlier work bounded vanilla transformers (with finite precision) to small circuit classes (e.g., TC⁰/AC⁰), which cannot express even simple functions such as parity (a toy parity sketch follows this list); CoT and/or different assumptions expand their computational power toward general polynomial-time computation.
  • Others liken this to universal approximation / Turing-completeness results for neural networks and argue it’s theoretically interesting but practically weak (like “any function can be approximated with enough neurons”).
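
A toy sketch of the parity example mentioned above, assuming nothing beyond the discussion itself: parity is the textbook function outside AC⁰ for constant-depth circuits, yet a serial scan that emits one intermediate result per bit (the spirit of CoT) computes it in n trivial steps. This illustrates the intuition, not the paper's actual transformer construction.

```python
# Toy illustration (not the paper's construction): parity of n bits.
# A constant-depth AC^0 circuit provably cannot compute parity for all n,
# but emitting one intermediate "thought" per bit solves it serially.

def parity_with_cot(bits: list[int]) -> int:
    """Scan left to right, emitting a running parity as each 'CoT token'."""
    thought = 0
    for i, b in enumerate(bits):
        thought ^= b  # one serial reasoning step
        print(f"step {i}: running parity = {thought}")  # the intermediate token
    return thought

assert parity_with_cot([1, 0, 1, 1]) == 1
```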

Enthusiasm and Potential Implications

  • Some see this as strong evidence that scaling inference and CoT could, in principle, let transformers match or exceed human problem-solving, assuming enough compute and good algorithm discovery.
  • CoT is compared to writing and multi-generational human reasoning, and to adding memory (as in going from finite-state machines to pushdown/Turing machines; a toy sketch of that upgrade follows this list).
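
A toy sketch of the memory analogy, using the classic example: no finite-state machine recognizes balanced parentheses of unbounded depth, but adding a single unbounded counter (the simplest stack) makes it trivial. This illustrates the analogy from the thread, not a claim about transformers themselves.

```python
# Classic illustration of the "adding memory" analogy: balanced parentheses
# need unbounded memory (a stack or counter), which a pure FSM lacks.

def balanced(s: str) -> bool:
    depth = 0  # the unbounded memory a finite-state machine does not have
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:  # a ')' with nothing open
                return False
    return depth == 0

assert balanced("(()())") and not balanced("())(")
```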

Skepticism and Practical Limitations

  • Many push back on the marketing phrase “solve any problem” as ignoring efficiency, thermodynamics, and physical resource limits (“not enough planet / GPUs”).
  • Commenters note that being able to represent any algorithm ≠ being able to find good algorithms; the hard part is discovery and learning, not raw expressivity.
  • The result is compared to infinite-monkey / Busy Beaver–style theorems: true but not operationally useful.
  • Some stress that transformers’ power under realistic finite-precision and fixed-depth constraints remains limited; CoT helps, but may require polynomially many tokens, which is likely prohibitive (a rough cost estimate follows this list).
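
A back-of-envelope sketch of the token-budget objection. Every number here (input size, polynomial exponent, price per token) is an assumption chosen for illustration; none come from the paper or the thread.

```python
# Hypothetical numbers only: what a quadratic-length CoT might cost.
n = 10_000                    # assumed input length
tokens_needed = n ** 2        # a CoT needing ~n^2 intermediate tokens
cost_per_token = 1e-5         # assumed price in dollars per generated token

print(f"tokens needed: {tokens_needed:,}")                       # 100,000,000
print(f"rough cost:    ${tokens_needed * cost_per_token:,.0f}")  # $1,000 per query
```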

Chain-of-Thought, Depth, and Seriality

  • CoT is framed as adding loops or increasing effective depth over time, converting a constant-depth circuit into a sequential process.
  • Discussion centers on “inherently serial” problems: some computations parallelize (e.g., summation via tree reduction); others may fundamentally require serial steps. CoT lets a mostly parallel architecture emulate serial reasoning (see the contrast sketched below).
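
A sketch of that contrast under simple assumptions: summation collapses to logarithmic depth via pairwise reduction, while iterated hashing is a standard example of a computation with no known shortcut past performing the steps in order, much as CoT tokens must be generated one after another.

```python
import hashlib

def tree_sum(xs: list[int]) -> int:
    """Pairwise reduction: O(log n) rounds, the shape of a shallow circuit."""
    while len(xs) > 1:
        pairs = [xs[i] + xs[i + 1] for i in range(0, len(xs) - 1, 2)]
        xs = pairs + (xs[-1:] if len(xs) % 2 else [])
    return xs[0]

def iterated_hash(seed: bytes, steps: int) -> bytes:
    """Each step consumes the previous output: depth grows with `steps`."""
    h = seed
    for _ in range(steps):
        h = hashlib.sha256(h).digest()  # believed to be inherently serial
    return h

assert tree_sum([1, 2, 3, 4, 5]) == 15
assert len(iterated_hash(b"seed", 1000)) == 32
```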

Formal vs Informal Problems

  • The theorem applies only to formal languages and decidable problems.
  • Several comments note that most real-world tasks start as fuzzy, informal human requests; there’s a hard, poorly understood problem of formalizing these into something a model (or any formal system) can solve.
  • Whether this translation itself can be made formal or is inherently informal is debated and left unresolved.

“Stochastic Parrot” and Intelligence Debate

  • Tangential but extensive discussion compares LLMs to “stochastic parrots,” Chinese Room, and humans as pattern-matching, partially-understanding agents.
  • One side uses these labels to demystify LLMs as sophisticated autocomplete; another argues humans are not fundamentally different and that such dismissive labels obscure genuine advances.

Methods, Tools, and Variants

  • Iterative “for-loop of thought” prompting (self-critique cycles) is discussed; anecdotal reports show mixed results, and systematic benchmarks are said to be lacking (a minimal loop sketch follows this list).
  • Comments mention related work: CoT boosting transformer power vs RNNs, transformers vs classical symbolic methods (Prolog, expert systems), and hybrid ideas combining probabilistic LLMs with explicit logical reasoning.
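
A minimal sketch of the self-critique loop, assuming a hypothetical `llm` callable (text in, text out) standing in for any model API; the loop structure is the point, and the stopping rule is a naive placeholder, not a recommended criterion.

```python
def for_loop_of_thought(llm, task: str, rounds: int = 3) -> str:
    """Draft, critique, revise: the 'for-loop of thought' pattern."""
    answer = llm(f"Solve: {task}")
    for _ in range(rounds):
        critique = llm(f"Task: {task}\nAnswer: {answer}\nList any flaws.")
        if "no flaws" in critique.lower():  # naive stopping rule (assumption)
            break
        answer = llm(f"Task: {task}\nAnswer: {answer}\n"
                     f"Critique: {critique}\nWrite an improved answer.")
    return answer
```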