QwQ: Alibaba's o1-like reasoning LLM
Model capabilities and reasoning behavior
- Many commenters find QwQ’s math and coding performance impressive, often near GPT‑4 / o1 for targeted tasks (e.g., AIME-style problems, topology, subadditive sequences, reverse engineering).
- The model does long chain‑of‑thought style reasoning; it frequently backtracks, critiques its own steps, and eventually corrects mistakes, but can be extremely verbose and slow.
- On classic puzzles (strawberry “r” count, Sally’s siblings, river-crossing variants), it can reach correct answers but often after 100+ lines of meandering reasoning, including obvious miscounts and contradictions.
- Some see this as “modeled OCD” or overthinking; others view it as promising persistence and self‑correction, like a not‑very‑bright but very diligent intern.
- It still fails basic questions (e.g., “How many words are in your response?”) and simple physical reasoning (a rock dropped into a glass of water) in ways that older, non-reasoning models sometimes handle correctly.
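For context, the letter-counting puzzle mentioned above has a trivially checkable ground truth; a one-line helper (the name `count_letter` is ours, purely for illustration) makes the expected answer explicit:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a letter in a word."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # prints 3
```

The puzzle is notable precisely because the check is this cheap for a program, yet tokenized LLMs often need many lines of reasoning to get it right.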
Censorship, safety filters, and bias
- QwQ refuses or heavily sanitizes many topics: Chinese politics (Xi, Tiananmen), some historical events, crime by ethnicity, and sometimes Western flashpoints (George Floyd) depending on phrasing.
- The filters are inconsistent and can be circumvented via rephrasing, output suffix hacks, or indirect prompts; sometimes the model drifts into Chinese mid‑answer and back.
- Some participants compare this to Western LLM guardrails, arguing Chinese political censorship is broader and more state‑driven; others note US models also embed strong ideological constraints, just on different topics.
- Concern is raised that open Chinese models may carry “ideological backdoors” (historical denial, regime narratives), making them unsuitable for some products despite strong benchmarks.
Hardware, training, and sanctions
- Speculation that QwQ was trained on Nvidia China‑specific SKUs (H20, H800, etc.), older A100/H100 stock, or overseas data centers; others note Chinese firms can rent Western cloud GPUs.
- Discussion that consumer GPUs and Apple Silicon can train small models but interconnect limits make large‑scale training far less efficient than datacenter GPUs.
- Some argue US export controls are porous (e.g., Singapore intermediaries, cloud access) and won’t prevent Chinese AI progress.
Open weights, competition, and geopolitics
- QwQ’s open weights, detailed training notes, and visible reasoning are praised, especially compared to closed models like o1.
- Several see a strategic pattern: Chinese (and some Western) labs commoditizing foundation models via open releases to erode moats of proprietary US startups.
- Debate over whether OpenAI still has a moat beyond brand; some think branding is powerful, others doubt the business model if open models keep catching up.
- Some predict Western governments may eventually restrict Chinese LLMs on security grounds; others think enforcement will be limited, especially for local use.
Local usage and performance
- QwQ‑32B runs locally via Ollama, LM Studio, MLX, etc.; Q4 quant fits in ~20–25 GB, making it usable on 24 GB Nvidia cards and 32–64 GB Apple Silicon Macs.
- Reported speeds are ~8–25 tokens/s on modern Macs and consumer GPUs—“fast enough to read,” but long CoT makes interactive use feel slow.
- Users note good results on integrals, physics, and coding explanations, but also tool‑use quirks (e.g., XML tasks) and occasional refusal to answer code questions.
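The ~20–25 GB figure quoted above is roughly what a back-of-envelope estimate predicts, assuming about 4.5 effective bits per weight for a Q4_K-style quant (4-bit weights plus scales/zero-points); the runtime overhead figure below is a rough guess, and the real total varies with context length and runtime:

```python
# Back-of-envelope VRAM estimate for a Q4-quantized 32B-parameter model.
params = 32e9
bits_per_weight = 4.5            # assumed effective rate for Q4_K-style quants
weight_gb = params * bits_per_weight / 8 / 1e9   # ≈ 18 GB of weights
overhead_gb = 2.0                # rough allowance for KV cache and buffers
total_gb = weight_gb + overhead_gb
print(f"{total_gb:.0f} GB")      # prints 20 GB
```

This lines up with the reported fit on 24 GB Nvidia cards and 32 GB+ Apple Silicon Macs, with headroom shrinking as the context window grows.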