Coconut by Meta AI – Better LLM Reasoning with Chain of Continuous Thought?

Openness, Licensing, and Meta’s Motives

  • Several commenters praise Meta for releasing strong models/weights and enabling startups; others argue this is strategic, not altruistic, and emphasize that “open weights” ≠ “open source.”
  • Debate over whether this commoditizes LLMs and empowers small companies, or ultimately reinforces big‑tech control and lock‑in (e.g., via licensing, moderation norms, embeddings ecosystems).

Core Technical Idea: Coconut / Continuous Thought

  • Coconut replaces many explicit chain‑of‑thought (CoT) text steps with “latent thoughts” in the model’s continuous representation space.
  • Training: start with standard CoT data (question → reasoning steps → answer), then progressively replace reasoning steps with latent thought iterations bracketed by special tokens.
  • In “thinking” mode, the model repeatedly feeds its own last hidden state back as input, extracting more structure from context before emitting text.
  • Using a fixed number of latent steps performed comparably to training a classifier that decides when to stop thinking, so the paper largely uses a constant-length thought phase.
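The latent-thought loop described above can be sketched as follows. This is an illustration, not Meta's code: the single-layer tanh update, the matrix names, and the dimensions are all assumptions standing in for a full transformer. The one faithful detail is the key move itself: during the "thinking" phase the model's own last hidden state is fed back as the next input in place of a token embedding.

```python
import math
import random

random.seed(0)
D = 8  # assumed shared width of token embeddings and hidden states

def rand_matrix(n, m, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(m)] for _ in range(n)]

W_in = rand_matrix(D, D)  # stand-in for the model's input projection
W_h = rand_matrix(D, D)   # stand-in for its attention/MLP path over context

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def step(h_prev, x):
    """One decoding step: consume input x, produce a new last hidden state."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_in, x), matvec(W_h, h_prev))]

def think(h, num_latent_steps):
    """Latent phase: between the special bracketing tokens, the last hidden
    state replaces the next token embedding, so no text is emitted."""
    for _ in range(num_latent_steps):
        h = step(h, h)  # feed the hidden state back as the input
    return h

h0 = [random.gauss(0.0, 1.0) for _ in range(D)]  # state after reading the question
h_final = think(h0, num_latent_steps=4)          # constant-length thought phase
```

In normal decoding, `x` would be the embedding of the previously emitted token; in "thinking" mode it is `h` itself, which is what lets the reasoning stay in the continuous representation space.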

Reasoning Quality, Search, and Limits

  • Some see this as a potential “it” moment: closer to non‑token human‑like thinking, more expressive than language, cheaper than CoT, and akin to breadth‑first search over solution space.
  • Others say it’s just compute-heavy search compensating for lack of true understanding, and point to planning benchmarks (e.g., randomized blocksworld) where LLMs still fail badly.
  • Discussion of error compounding in long reasoning chains and how BFS‑like parallel exploration might reduce failure rates at high compute cost.
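The "BFS-like parallel exploration" framing from the thread can be made concrete with a beam-style search sketch. This is purely illustrative and not the paper's algorithm (Coconut's continuous thought encodes alternatives implicitly rather than as explicit branches); `expand` and `score` are hypothetical stand-ins for a model proposing continuations and rating them.

```python
def bfs_reason(start, expand, score, beam_width, depth):
    """Expand a frontier of partial solutions in parallel, keeping the
    beam_width best at each depth instead of committing to one greedy chain.
    More branches cost more compute but reduce the chance that a single
    early error dooms the whole reasoning chain."""
    frontier = [start]
    for _ in range(depth):
        candidates = [nxt for state in frontier for nxt in expand(state)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)

# Tiny worked example: search for 3-digit strings whose digit sum is near 15.
expand = lambda s: [s + d for d in "0123456789"]
score = lambda s: -abs(15 - sum(int(c) for c in s))
best = bfs_reason("", expand, score, beam_width=4, depth=3)
```

Widening `beam_width` trades compute for robustness, which is the cost/failure-rate trade-off the thread debates.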

Latent Space, Language, and Alignment

  • Enthusiasm for LLMs (or multiple agents) communicating directly in embeddings as a “richer language,” possibly diverging from human language.
  • Counter‑concern: such non‑linguistic internal communication undermines interpretability and makes detecting deception or “scheming” harder; CoT text, however imperfect, is a key evaluation tool.
  • Some suggest saving and analyzing hidden states as a partial answer, but overall opacity and alignment difficulties are highlighted.

Miscellaneous Thread Themes

  • Questions about how backpropagation works when only the final answer is supervised, not the intermediate latent thoughts.
  • Side debates on group intelligence, democracy, “wisdom of crowds,” and whether ensembles of agents are actually smarter.
  • Skepticism about the linked explainer site itself (LLM-like writing, ads), with people preferring to read the arXiv paper directly.
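On the backpropagation question above: because each latent thought is a deterministic, differentiable hidden state rather than a sampled token, a loss on the final answer alone flows back through every latent step by the ordinary chain rule; no reward on intermediate thoughts is needed. A scalar toy (an illustration, not the paper's training code, with latent updates assumed to be h_{t+1} = tanh(w * h_t)) makes this concrete, checked against a numerical derivative:

```python
import math

def forward(w, h0, steps):
    """Chain of differentiable latent updates h_{t+1} = tanh(w * h_t)."""
    hs = [h0]
    for _ in range(steps):
        hs.append(math.tanh(w * hs[-1]))
    return hs

def grad_loss_wrt_w(w, h0, target, steps):
    """d/dw of the final-answer loss 0.5*(h_T - target)^2, accumulated
    backwards through every latent step via the chain rule."""
    hs = forward(w, h0, steps)
    g = hs[-1] - target                # dL/dh_T: loss touches only the end
    dw = 0.0
    for t in reversed(range(steps)):
        dtanh = 1.0 - hs[t + 1] ** 2   # tanh'(w * h_t)
        dw += g * dtanh * hs[t]        # step t's direct contribution to dL/dw
        g = g * dtanh * w              # propagate dL/dh back one latent step
    return dw

# Verify the analytic gradient against a central finite difference.
w, h0, target, steps = 0.7, 0.5, 0.9, 4
eps = 1e-6
L = lambda w_: 0.5 * (forward(w_, h0, steps)[-1] - target) ** 2
numeric = (L(w + eps) - L(w - eps)) / (2 * eps)
analytic = grad_loss_wrt_w(w, h0, target, steps)
```

The gradient `dw` accumulates a term from every latent step even though the loss is computed only at the end, which is exactly why supervising final answers suffices.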