Training LLMs to Reason in a Continuous Latent Space

Continuous latent reasoning vs token-space CoT

  • Main idea: instead of generating explicit chain-of-thought (CoT) tokens, reuse the model’s internal “continuous thought” (last hidden state) across steps, so reasoning happens in latent space.
  • This avoids “snapping” the hidden state to discrete tokens via the LM head, which is likened to lossy quantization.
  • Some see this as “higher-resolution” internal reasoning and potentially more token-efficient than CoT; others note current gains appear mainly in efficiency, not dramatic quality jumps.
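The contrast between the two decode loops can be sketched with a toy model. This is a minimal illustration, not the paper's implementation: `forward` stands in for a full transformer pass, and all matrices are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 8, 16
W_step = rng.normal(size=(d, d)) / np.sqrt(d)      # stand-in for the transformer body
W_head = rng.normal(size=(vocab, d)) / np.sqrt(d)  # stand-in for the LM head
E = rng.normal(size=(vocab, d))                    # token embedding table

def forward(x):
    """Toy stand-in for one transformer step: input embedding -> last hidden state."""
    return np.tanh(W_step @ x)

def token_space_step(h):
    """Ordinary CoT: project to logits, snap to a discrete token, re-embed it."""
    logits = W_head @ h            # d floats collapsed into one token choice
    tok = int(np.argmax(logits))   # the lossy "quantization" step
    return forward(E[tok])

def latent_space_step(h):
    """Latent reasoning: feed the last hidden state back as if it were an embedding."""
    return forward(h)              # no projection, no discretization

h = forward(E[3])          # start from some token's embedding
h_tok = token_space_step(h)
h_lat = latent_space_step(h)
```

The only difference is whether the hidden state passes through the head and embedding table (losing everything but one token id) or is reused directly.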

Last hidden state / embeddings

  • Clarified as the final hidden representation after all residual layers, just before the language-modeling (LM) head and softmax.
  • Normally, this is projected to logits over the vocabulary; that projection compresses rich internal information into a single token distribution.
  • Here, the last hidden state is fed back as if it were a token, evolving a continuous state instead of feeding back discrete tokens.
  • There is debate over how “rich” this final vector really is. Some say it is highly informative and already used in practice for classification/regression; others argue it is optimized only for next-token prediction and may be more compressed than assumed.
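The “compression” framing can be made concrete with a back-of-the-envelope count, using hypothetical but typical model sizes (the specific numbers are assumptions, not from the discussion):

```python
import math

d, vocab = 4096, 32000        # illustrative LLaMA-7B-scale hidden size and vocabulary
hidden_bits = d * 16          # an fp16 hidden state carries d * 16 raw bits
token_bits = math.log2(vocab) # one sampled token carries at most log2(|V|) bits

# hidden_bits = 65536, token_bits ~ 15: the projection-and-sample step discards
# orders of magnitude of raw capacity, though how much of the hidden state is
# *usable* information is exactly what the debate above is about.
```

The caveat in the comments still applies: raw bit capacity is an upper bound, and a vector trained only to predict the next token may not exploit much of it.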

Training and architectural concerns

  • This recurrent use of hidden states introduces sequential dependencies during training, reducing the usual transformer advantage of parallelizable teacher-forced training under a causal mask.
  • That makes training slower and more complex; whether the tradeoff is worth it is seen as task-dependent.
  • Some worry about feeding a last-layer representation back into the first-layer space, since layers might encode different representations; others point out prior work (e.g., feedback mechanisms, tied embeddings, layer reordering) suggests this can work with sufficient finetuning.
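The sequential-training concern boils down to a data dependency: each continuous thought is an input that only exists after the previous forward pass. A minimal sketch (`step` is a placeholder for a full model forward):

```python
def latent_rollout(step, h0, n_thoughts):
    """Roll out n continuous thoughts. Each step consumes the previous step's
    output, so the n forward passes must run one after another; they cannot be
    batched the way teacher-forced token positions can."""
    h, states = h0, []
    for _ in range(n_thoughts):  # inherently sequential loop
        h = step(h)
        states.append(h)
    return states

# With teacher forcing, every position's input token is known up front, so one
# parallel forward scores all positions; here step t must wait for step t-1.
print(latent_rollout(lambda h: h + 1, 0, 3))  # -> [1, 2, 3]
```

This is the same serial structure that makes RNN training slow, which is why the comments frame the tradeoff as task-dependent.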

Relation to human thought and language

  • Several comments link this to humans “thinking in mentalese” or non-verbal modalities, with language as a lossy encoding of deeper concepts.
  • Others emphasize that language still heavily shapes thought and that internal representations may be multi-modal and variable across individuals.

Safety, intelligence, and terminology

  • Some see this as another step toward “true intelligence” and raise concerns about uncontrolled progress and potential danger.
  • Others push back on anthropomorphic language (“thinking”, “reasoning”), arguing LLMs are sophisticated next-token predictors/compressors, not intelligent agents.
  • There is a broader debate over whether brains are “just” computers in the Turing sense and whether that makes AGI largely a scaling/training problem.