DeepThought-8B: A small, capable reasoning model
Model Capabilities & Behavior
- Users test letter-counting tasks (“strawberry” variants); the model often explains its steps yet still miscounts, suggesting methodical-looking but unreliable “reasoning.”
- It handles some conceptual logic questions well (e.g., mass/weight comparisons like “2 kg feathers vs 1 kg lead”), a class of question small models often fail, though some commenters consider it a weak test of reasoning.
- On more technical prompts (thermodynamics, entropy, reversible vs irreversible processes), it gives long, seemingly textbook-style explanations that are judged “expected” pattern-matching rather than deep insight.
- For some math/number-theory tasks (e.g., “find two primes summing to 123”), the model can loop for minutes without converging, while other models quickly produce an answer or reject the premise.
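Two of the probes above have cheap ground-truth checks. As an illustrative sketch (the helper names are mine, not from the discussion), both can be verified in a few lines:

```python
# Ground-truth checks for two probes mentioned above.

def is_prime(n: int) -> bool:
    """Trial division; fine for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def prime_pairs(target: int):
    """All unordered prime pairs (p, q), p <= q, with p + q == target."""
    return [(p, target - p)
            for p in range(2, target // 2 + 1)
            if is_prime(p) and is_prime(target - p)]

# Letter counting: "strawberry" really does contain three "r"s.
print("strawberry".count("r"))  # -> 3

# 123 is odd, so one addend would have to be the only even prime, 2,
# and 123 - 2 = 121 = 11 * 11 is composite: there is no valid pair.
print(prime_pairs(123))         # -> []
```

The prime question is a trick: a model that checks parity can reject it in one step, which may be why open-ended search loops instead of terminating.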
“Reasoning Model” vs. Plain LLM
- Several commenters argue “reasoning model” is mostly a marketing label for techniques like beam search or test-time compute, not fundamentally new capabilities.
- Others stress that whether reasoning is “baked in” or implemented via wrappers, it is still just tuned next-token prediction.
- There is debate over what counts as reasoning versus probabilistic search, with references to classical AI search, logic, and theoretical limits of transformers; consensus leans toward “no true reasoning, but useful approximations.”
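As context for that debate, the “test-time compute” techniques commenters mention, such as beam search, are simple to sketch. A minimal, hedged illustration follows; the `expand`/`score` functions and all names here are assumptions for the toy example, not anything from DeepThought-8B:

```python
import heapq

def beam_search(expand, score, start, beam_width=3, steps=4):
    """Keep only the `beam_width` best-scoring partial sequences per step --
    a search heuristic over next-step candidates, not symbolic reasoning."""
    beam = [start]
    for _ in range(steps):
        candidates = [seq + [tok] for seq in beam for tok in expand(seq)]
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return beam

# Toy usage: candidate "tokens" 0-2, score = sum of the sequence;
# the search converges on the all-2s sequence.
best = beam_search(lambda seq: [0, 1, 2], sum, [], beam_width=3, steps=4)
print(best[0])  # -> [2, 2, 2, 2]
```

The point of the sketch is that the machinery is ordinary search over model outputs, which is why some commenters see “reasoning model” as a branding choice rather than a new capability.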
Benchmarks, Claims & Evaluation
- The announcement graph is widely criticized: low-contrast bars, hard-to-read labels, and ambiguous comparisons (“Model A–D” without naming baselines).
- Some suspect cherry-picking or “grifty” benchmarking, especially when an 8B model is shown outperforming a 13B baseline with no details on training tokens or model identity.
Interface, Performance & Availability
- Many report the web demo is slow, freezes, or never returns outputs on non-trivial prompts.
- Visual design of the site (dense text, animations, inaccessible color choices) is criticized.
- The model is available only as a hosted chat, with API access gated behind a sales contact; there are no downloadable weights and no Hugging Face or Ollama entry, which clashes with the “self-sovereign” branding.
Licensing, Openness & Legal Questions
- Commenters note it is Llama-based and may violate Meta’s license requirement to include “Llama” at the beginning of derivative model names.
- Broader debate on what “open source” means for models: weights vs. code vs. training data, and whether model weights are copyrightable.
- Lawsuits are anticipated as a way to clarify legality of training on scraped copyrighted data.