DeepThought-8B: A small, capable reasoning model

Model Capabilities & Behavior

  • Users probe letter-counting tasks (“strawberry” variants); the model often narrates plausible-looking steps but still miscounts, suggesting its “reasoning” is methodical in form but unreliable in substance.
  • It handles some conceptual logic questions well (e.g., mass/weight comparison like “2 kg feathers vs 1 kg lead”), which small models often fail, though some consider this a weak reasoning test.
  • On more technical prompts (thermodynamics, entropy, reversible vs irreversible processes), it gives long, seemingly textbook-style explanations that are judged “expected” pattern-matching rather than deep insight.
  • For some math/number-theory prompts (e.g., “find two primes summing to 123”), the model can loop for minutes without converging, while other models quickly settle on an answer or correctly conclude that no such pair exists.
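Both probe tasks above have trivially checkable ground truth, which is what makes them popular spot checks. A quick sketch in plain Python (no model involved):

```python
# Ground truth for two common LLM probe tasks, computed directly.

def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter -- the task models often miscount."""
    return word.count(letter)

def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def prime_pairs(total: int):
    """All unordered prime pairs (p, q) with p + q == total."""
    return [(p, total - p) for p in range(2, total // 2 + 1)
            if is_prime(p) and is_prime(total - p)]

print(count_letter("strawberry", "r"))  # → 3
print(prime_pairs(123))  # → [] : 123 is odd, so one prime would have to
                         #   be 2, and 121 = 11 * 11 is composite
```

This is why the prime prompt is a good convergence test: a model that reasons about parity can reject it in one step, while blind search loops indefinitely.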

“Reasoning Model” vs. Plain LLM

  • Several commenters argue “reasoning model” is mostly a marketing label for techniques like beam search or test-time compute, not fundamentally new capabilities.
  • Others stress that whether reasoning is “baked in” or implemented via wrappers, it is still just tuned next-token prediction.
  • There is debate over what counts as reasoning vs. probabilistic search, with references to classical AI search, logic, and theoretical limits of transformers; the rough consensus is “not true reasoning, but a useful approximation.”
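One concrete version of the “test-time compute” techniques commenters mention is best-of-N sampling: draw several candidate answers and keep the highest-scoring one. A toy sketch, where `generate` and `score` are hypothetical stand-ins for an LLM sampler and a verifier, not DeepThought’s actual mechanism:

```python
import random

def generate(prompt: str, rng: random.Random) -> int:
    # Stand-in for a stochastic LLM sampler: proposes a candidate answer.
    return rng.choice([120, 121, 122, 123, 124])

def score(prompt: str, answer: int) -> float:
    # Stand-in for a verifier or reward model: higher is better.
    return -abs(answer - 123)

def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> int:
    # Spend extra compute at inference time: sample n candidates,
    # then return the one the verifier rates highest.
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))
```

The point of the skeptics’ argument is visible here: nothing in `best_of_n` changes the underlying generator; quality improves only because more samples are drawn and filtered.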

Benchmarks, Claims & Evaluation

  • The announcement graph is widely criticized: low-contrast bars, hard-to-read labels, and ambiguous comparisons (“Model A–D” without naming baselines).
  • Some suspect cherry-picking or “grifty” benchmarking, especially when an 8B model is shown outperforming a 13B baseline with no details on training tokens or model identity.

Interface, Performance & Availability

  • Many report that the web demo is slow, freezes, or never returns output on non-trivial prompts.
  • Visual design of the site (dense text, animations, inaccessible color choices) is criticized.
  • The model appears only as a hosted chat with optional API via sales; no downloadable weights, no Hugging Face/Ollama entry, which clashes with the “self-sovereign” branding.

Licensing, Openness & Legal Questions

  • Commenters note it is Llama-based and may violate Meta’s requirement to include “Llama” in derived model names.
  • Broader debate on what “open source” means for models: weights vs. code vs. training data, and whether model weights are copyrightable.
  • Lawsuits are anticipated as a way to clarify legality of training on scraped copyrighted data.