DeepThought-8B: A small, capable reasoning model

Model Capabilities & Behavior

  • Users probe letter-counting tasks (“strawberry” variants); the model often narrates plausible-looking steps but still miscounts, suggesting its “reasoning” is methodical in form but unreliable in substance.
  • It handles some conceptual logic questions well (e.g., mass/weight comparison like “2 kg feathers vs 1 kg lead”), which small models often fail, though some consider this a weak reasoning test.
  • On more technical prompts (thermodynamics, entropy, reversible vs irreversible processes), it gives long, seemingly textbook-style explanations that are judged “expected” pattern-matching rather than deep insight.
  • For some math/number-theory prompts (e.g., “find two primes summing to 123”), the model can loop for minutes without converging, while other models quickly settle on an answer or correctly conclude that no such pair exists.
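Both probe tasks above have trivially checkable ground truth, which is what makes them popular spot checks. A quick sketch in plain Python (no model involved):

```python
# Ground truth for two common LLM probe tasks, computed directly.

def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a letter -- the task models often miscount."""
    return word.count(letter)

def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def prime_pairs(total: int):
    """All unordered prime pairs (p, q) with p + q == total."""
    return [(p, total - p) for p in range(2, total // 2 + 1)
            if is_prime(p) and is_prime(total - p)]

print(count_letter("strawberry", "r"))  # → 3
print(prime_pairs(123))  # → [] : 123 is odd, so one prime would have to
                         #   be 2, and 121 = 11 * 11 is composite
```

This is why the prime prompt is a good convergence test: a model that reasons about parity can reject it in one step, while blind search loops indefinitely.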

“Reasoning Model” vs. Plain LLM

  • Several commenters argue “reasoning model” is mostly a marketing label for techniques like beam search or test-time compute, not fundamentally new capabilities.
  • Others stress that whether reasoning is “baked in” or implemented via wrappers, it is still just tuned next-token prediction.
  • There is debate over what counts as reasoning vs. probabilistic search, with references to classical AI search, logic, and theoretical limits of transformers; the rough consensus is “not true reasoning, but a useful approximation.”
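One concrete version of the “test-time compute” techniques commenters mention is best-of-N sampling: draw several candidate answers and keep the highest-scoring one. A toy sketch, where `generate` and `score` are hypothetical stand-ins for an LLM sampler and a verifier, not DeepThought’s actual mechanism:

```python
import random

def generate(prompt: str, rng: random.Random) -> int:
    # Stand-in for a stochastic LLM sampler: proposes a candidate answer.
    return rng.choice([120, 121, 122, 123, 124])

def score(prompt: str, answer: int) -> float:
    # Stand-in for a verifier or reward model: higher is better.
    return -abs(answer - 123)

def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> int:
    # Spend extra compute at inference time: sample n candidates,
    # then return the one the verifier rates highest.
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))
```

The point of the skeptics’ argument is visible here: nothing in `best_of_n` changes the underlying generator; quality improves only because more samples are drawn and filtered.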

Benchmarks, Claims & Evaluation

  • The announcement graph is widely criticized: low-contrast bars, hard-to-read labels, and ambiguous comparisons (“Model A–D” without naming baselines).
  • Some suspect cherry-picking or “grifty” benchmarking, especially when an 8B model is shown outperforming a 13B baseline with no details on training tokens or model identity.

Interface, Performance & Availability

  • Many report that the web demo is slow, freezes, or never returns output on non-trivial prompts.
  • Visual design of the site (dense text, animations, inaccessible color choices) is criticized.
  • The model appears only as a hosted chat with optional API via sales; no downloadable weights, no Hugging Face/Ollama entry, which clashes with the “self-sovereign” branding.

Licensing, Openness & Legal Questions

  • Commenters note it is Llama-based and may violate Meta’s requirement to include “Llama” in derived model names.
  • Broader debate on what “open source” means for models: weights vs. code vs. training data, and whether model weights are copyrightable.
  • Lawsuits are anticipated as a way to clarify legality of training on scraped copyrighted data.