Coconut by Meta AI – Better LLM Reasoning with Chain of Continuous Thought?

Openness, Licensing, and Meta’s Motives

  • Several commenters praise Meta for releasing strong models/weights and enabling startups; others argue this is strategic, not altruistic, and emphasize that “open weights” ≠ “open source.”
  • Debate over whether this commoditizes LLMs and empowers small companies, or ultimately reinforces big‑tech control and lock‑in (e.g., via licensing, moderation norms, embeddings ecosystems).

Core Technical Idea: Coconut / Continuous Thought

  • Coconut replaces many explicit chain‑of‑thought (CoT) text steps with “latent thoughts” in the model’s continuous representation space.
  • Training: start with standard CoT data (question → reasoning steps → answer), then progressively replace reasoning steps with latent thought iterations bracketed by special tokens.
  • In “thinking” mode, the model repeatedly feeds its own last hidden state back as input, extracting more structure from context before emitting text.
  • Using a fixed number of latent steps performed comparably to training a classifier that decides when to stop thinking, so the paper largely uses a constant-length thought phase.
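The latent-thought loop described above can be sketched as follows. This is an illustration, not Meta's code: the single-layer tanh update, the matrix names, and the dimensions are all assumptions standing in for a full transformer. The one faithful detail is the key move itself: during the "thinking" phase the model's own last hidden state is fed back as the next input in place of a token embedding.

```python
import math
import random

random.seed(0)
D = 8  # assumed shared width of token embeddings and hidden states

def rand_matrix(n, m, scale=0.1):
    return [[random.gauss(0.0, scale) for _ in range(m)] for _ in range(n)]

W_in = rand_matrix(D, D)  # stand-in for the model's input projection
W_h = rand_matrix(D, D)   # stand-in for its attention/MLP path over context

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def step(h_prev, x):
    """One decoding step: consume input x, produce a new last hidden state."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_in, x), matvec(W_h, h_prev))]

def think(h, num_latent_steps):
    """Latent phase: between the special bracketing tokens, the last hidden
    state replaces the next token embedding, so no text is emitted."""
    for _ in range(num_latent_steps):
        h = step(h, h)  # feed the hidden state back as the input
    return h

h0 = [random.gauss(0.0, 1.0) for _ in range(D)]  # state after reading the question
h_final = think(h0, num_latent_steps=4)          # constant-length thought phase
```

In normal decoding, `x` would be the embedding of the previously emitted token; in "thinking" mode it is `h` itself, which is what lets the reasoning stay in the continuous representation space.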

Reasoning Quality, Search, and Limits

  • Some see this as a potential “it” moment: closer to non‑token human‑like thinking, more expressive than language, cheaper than CoT, and akin to breadth‑first search over solution space.
  • Others say it’s just compute-heavy search compensating for lack of true understanding, and point to planning benchmarks (e.g., randomized blocksworld) where LLMs still fail badly.
  • Discussion of error compounding in long reasoning chains and how BFS‑like parallel exploration might reduce failure rates at high compute cost.
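The "BFS-like parallel exploration" framing from the thread can be made concrete with a beam-style search sketch. This is purely illustrative and not the paper's algorithm (Coconut's continuous thought encodes alternatives implicitly rather than as explicit branches); `expand` and `score` are hypothetical stand-ins for a model proposing continuations and rating them.

```python
def bfs_reason(start, expand, score, beam_width, depth):
    """Expand a frontier of partial solutions in parallel, keeping the
    beam_width best at each depth instead of committing to one greedy chain.
    More branches cost more compute but reduce the chance that a single
    early error dooms the whole reasoning chain."""
    frontier = [start]
    for _ in range(depth):
        candidates = [nxt for state in frontier for nxt in expand(state)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return max(frontier, key=score)

# Tiny worked example: search for 3-digit strings whose digit sum is near 15.
expand = lambda s: [s + d for d in "0123456789"]
score = lambda s: -abs(15 - sum(int(c) for c in s))
best = bfs_reason("", expand, score, beam_width=4, depth=3)
```

Widening `beam_width` trades compute for robustness, which is the cost/failure-rate trade-off the thread debates.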

Latent Space, Language, and Alignment

  • Enthusiasm for LLMs (or multiple agents) communicating directly in embeddings as a “richer language,” possibly diverging from human language.
  • Counter‑concern: such non‑linguistic internal communication undermines interpretability and makes detecting deception or “scheming” harder; CoT text, however imperfect, is a key evaluation tool.
  • Some suggest saving and analyzing hidden states as a partial answer, but overall opacity and alignment difficulties are highlighted.

Miscellaneous Thread Themes

  • Questions about how backpropagation works when only the final answer is supervised, not the intermediate latent thoughts.
  • Side debates on group intelligence, democracy, “wisdom of crowds,” and whether ensembles of agents are actually smarter.
  • Skepticism about the linked explainer site itself (LLM-like writing, ads), with people preferring to read the arXiv paper directly.
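On the backpropagation question above: because each latent thought is a deterministic, differentiable hidden state rather than a sampled token, a loss on the final answer alone flows back through every latent step by the ordinary chain rule; no reward on intermediate thoughts is needed. A scalar toy (an illustration, not the paper's training code, with latent updates assumed to be h_{t+1} = tanh(w * h_t)) makes this concrete, checked against a numerical derivative:

```python
import math

def forward(w, h0, steps):
    """Chain of differentiable latent updates h_{t+1} = tanh(w * h_t)."""
    hs = [h0]
    for _ in range(steps):
        hs.append(math.tanh(w * hs[-1]))
    return hs

def grad_loss_wrt_w(w, h0, target, steps):
    """d/dw of the final-answer loss 0.5*(h_T - target)^2, accumulated
    backwards through every latent step via the chain rule."""
    hs = forward(w, h0, steps)
    g = hs[-1] - target                # dL/dh_T: loss touches only the end
    dw = 0.0
    for t in reversed(range(steps)):
        dtanh = 1.0 - hs[t + 1] ** 2   # tanh'(w * h_t)
        dw += g * dtanh * hs[t]        # step t's direct contribution to dL/dw
        g = g * dtanh * w              # propagate dL/dh back one latent step
    return dw

# Verify the analytic gradient against a central finite difference.
w, h0, target, steps = 0.7, 0.5, 0.9, 4
eps = 1e-6
L = lambda w_: 0.5 * (forward(w_, h0, steps)[-1] - target) ** 2
numeric = (L(w + eps) - L(w - eps)) / (2 * eps)
analytic = grad_loss_wrt_w(w, h0, target, steps)
```

The gradient `dw` accumulates a term from every latent step even though the loss is computed only at the end, which is exactly why supervising final answers suffices.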