Training LLMs to Reason in a Continuous Latent Space
Continuous latent reasoning vs token-space CoT
- Main idea: instead of generating explicit chain-of-thought (CoT) tokens, the model reuses its internal “continuous thought” (the last hidden state) across steps, so reasoning happens in latent space (see the sketch after this list).
- This avoids “snapping” the hidden state to discrete tokens via the LM head, which is likened to lossy quantization.
- Some see this as “higher-resolution” internal reasoning and potentially more token-efficient than CoT; others note current gains appear mainly in efficiency, not dramatic quality jumps.
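A minimal sketch of the mechanism, assuming a decoder-style model split into three callables (`embed`, `body`, `lm_head`); all names and shapes here are illustrative, not the paper's actual code:

```python
import torch

def latent_reasoning(body, embed, lm_head, prompt_ids, num_thoughts=4):
    """Feed the last hidden state back as the next input embedding (toy sketch).

    `embed` is the token embedding table, `body` maps (seq, d_model) input
    embeddings to same-shaped final-layer hidden states, and `lm_head` projects
    to vocabulary logits. KV caching and masking details are omitted.
    """
    inputs = embed(prompt_ids)                        # (seq, d_model) token embeddings
    for _ in range(num_thoughts):
        hidden = body(inputs)                         # (seq, d_model) final hidden states
        thought = hidden[-1:]                         # the "continuous thought" vector
        inputs = torch.cat([inputs, thought], dim=0)  # reuse it as the next "token"
    # Only after the latent steps does the model return to token space for the answer.
    return lm_head(body(inputs)[-1])                  # logits for the first answer token
```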
Last hidden state / embeddings
- Clarified as the final hidden representation after all transformer/residual layers, just before the LM head (classification head) and softmax.
- Normally, this is projected to logits over the vocabulary; that projection compresses rich internal information into a single token distribution.
- Here, the last hidden state is fed back to the model as if it were the next token’s embedding, so a continuous state evolves across steps instead of discrete tokens being fed back (see the toy contrast after this list).
- There is debate over how “rich” this final vector really is: some say it is highly informative and already used in practice for classification/regression heads; others argue it is optimized only for next-token prediction and may be more compressed than assumed.
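A toy contrast of the two information paths, with illustrative dimensions and a randomly initialized projection standing in for a real LM head:

```python
import torch

d_model, vocab_size = 4096, 128_000          # illustrative sizes, not a real model
h = torch.randn(d_model)                     # last hidden state at the final position
lm_head = torch.nn.Linear(d_model, vocab_size, bias=False)

# Token-space CoT: project to logits and "snap" to one discrete token id.
token_id = lm_head(h).argmax()               # 4096 floats -> a single id in [0, 128k)

# Latent reasoning: skip the projection and carry the full vector forward.
next_input = h                               # all 4096 dimensions survive the step
```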
Training and architectural concerns
- This recurrent use of hidden states introduces sequential dependencies during training, reducing the usual transformer advantage of parallelizable, masked-token (teacher-forced) training; see the training sketch after this list.
- That makes training slower and more complex; whether the tradeoff is worth it is seen as task-dependent.
- Some worry about feeding a last-layer representation back into the first-layer input space, since different layers may encode different representations; others point out that prior work (e.g., feedback mechanisms, tied embeddings, layer reordering) suggests this can work with sufficient finetuning.
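A toy training sketch of the sequential dependency, using small stand-in modules (masking, caching, and curriculum details omitted); every name and size here is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab = 64, 1000
embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
body = nn.TransformerEncoder(layer, num_layers=2)   # stand-in for the decoder stack
lm_head = nn.Linear(d_model, vocab, bias=False)

prompt_ids = torch.tensor([[1, 2, 3]])              # (batch=1, seq)
answer_id = torch.tensor([42])
k_thoughts = 3

# Standard CoT training: target tokens are known up front, so one masked,
# teacher-forced pass scores every position in parallel.
# Latent training: thought t's input is the output of pass t-1, so k thoughts
# cost k extra sequential forward passes, and the gradient flows back through
# the whole chain (closer to backprop-through-time than to masked LM training).
inputs = embed(prompt_ids)                          # (1, seq, d_model)
for _ in range(k_thoughts):
    hidden = body(inputs)                           # pass t depends on pass t-1
    inputs = torch.cat([inputs, hidden[:, -1:, :]], dim=1)
logits = lm_head(body(inputs)[:, -1, :])            # (1, vocab)
loss = F.cross_entropy(logits, answer_id)
loss.backward()                                     # gradients traverse every latent step
```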
Relation to human thought and language
- Several comments link this to humans “thinking in mentalese” or non-verbal modalities, with language as a lossy encoding of deeper concepts.
- Others emphasize that language still heavily shapes thought and that internal representations may be multi-modal and variable across individuals.
Safety, intelligence, and terminology
- Some see this as another step toward “true intelligence” and raise concerns about uncontrolled progress and potential danger.
- Others push back on anthropomorphic language (“thinking”, “reasoning”), arguing LLMs are sophisticated next-token predictors/compressors, not intelligent agents.
- There is a broader debate over whether brains are “just” computers in the Turing sense and whether that makes AGI largely a scaling/training problem.