QwQ: Alibaba's o1-like reasoning LLM
Model capabilities and reasoning behavior
- Many commenters find QwQ’s math and coding performance impressive, often near GPT‑4 / o1 for targeted tasks (e.g., AIME-style problems, topology, subadditive sequences, reverse engineering).
- The model does long chain‑of‑thought style reasoning; it frequently backtracks, critiques its own steps, and eventually corrects mistakes, but can be extremely verbose and slow.
- On classic puzzles (strawberry “r” count, Sally’s siblings, river-crossing variants), it can reach correct answers but often after 100+ lines of meandering reasoning, including obvious miscounts and contradictions.
- Some see this as “modeled OCD” or overthinking; others view it as promising persistence and self‑correction, like a not‑very‑bright but very diligent intern.
- It still fails basic questions (e.g., “How many words are in your response?”) and simple physical reasoning (a rock dropped into a glass of water) in ways that older, non-reasoning models sometimes handle correctly.
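For context, the letter-counting puzzle mentioned above has a trivially checkable ground truth; a one-line helper (the name `count_letter` is ours, purely for illustration) makes the expected answer explicit:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a letter in a word."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # prints 3
```

The puzzle is notable precisely because the check is this cheap for a program, yet tokenized LLMs often need many lines of reasoning to get it right.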
Censorship, safety filters, and bias
- QwQ refuses or heavily sanitizes many topics: Chinese politics (Xi, Tiananmen), some historical events, crime by ethnicity, and sometimes Western flashpoints (George Floyd) depending on phrasing.
- The filters are inconsistent and can be circumvented via rephrasing, output suffix hacks, or indirect prompts; sometimes the model drifts into Chinese mid‑answer and back.
- Some participants compare this to Western LLM guardrails, arguing Chinese political censorship is broader and more state‑driven; others note US models also embed strong ideological constraints, just on different topics.
- Concern is raised that open Chinese models may carry “ideological backdoors” (historical denial, regime narratives), making them unsuitable for some products despite strong benchmarks.
Hardware, training, and sanctions
- Speculation that QwQ was trained on Nvidia China‑specific SKUs (H20, H800, etc.), older A100/H100 stock, or overseas data centers; others note Chinese firms can rent Western cloud GPUs.
- Discussion that consumer GPUs and Apple Silicon can train small models but interconnect limits make large‑scale training far less efficient than datacenter GPUs.
- Some argue US export controls are porous (e.g., Singapore intermediaries, cloud access) and won’t prevent Chinese AI progress.
Open weights, competition, and geopolitics
- QwQ’s open weights, detailed training notes, and visible reasoning are praised, especially compared to closed models like o1.
- Several see a strategic pattern: Chinese (and some Western) labs commoditizing foundation models via open releases to erode moats of proprietary US startups.
- Debate over whether OpenAI still has a moat beyond brand; some think branding is powerful, others doubt the business model if open models keep catching up.
- Some predict Western governments may eventually restrict Chinese LLMs on security grounds; others think enforcement will be limited, especially for local use.
Local usage and performance
- QwQ‑32B runs locally via Ollama, LM Studio, MLX, etc.; Q4 quant fits in ~20–25 GB, making it usable on 24 GB Nvidia cards and 32–64 GB Apple Silicon Macs.
- Reported speeds are ~8–25 tokens/s on modern Macs and consumer GPUs—“fast enough to read,” but long CoT makes interactive use feel slow.
- Users note good results on integrals, physics, and coding explanations, but also tool‑use quirks (e.g., XML tasks) and occasional refusal to answer code questions.
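The ~20–25 GB figure quoted above is roughly what a back-of-envelope estimate predicts, assuming about 4.5 effective bits per weight for a Q4_K-style quant (4-bit weights plus scales/zero-points); the runtime overhead figure below is a rough guess, and the real total varies with context length and runtime:

```python
# Back-of-envelope VRAM estimate for a Q4-quantized 32B-parameter model.
params = 32e9
bits_per_weight = 4.5            # assumed effective rate for Q4_K-style quants
weight_gb = params * bits_per_weight / 8 / 1e9   # ≈ 18 GB of weights
overhead_gb = 2.0                # rough allowance for KV cache and buffers
total_gb = weight_gb + overhead_gb
print(f"{total_gb:.0f} GB")      # prints 20 GB
```

This lines up with the reported fit on 24 GB Nvidia cards and 32 GB+ Apple Silicon Macs, with headroom shrinking as the context window grows.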