Something is afoot in the land of Qwen
Qwen team shake-up and project direction
- Multiple commenters report that key Qwen researchers have left after internal tensions with the parent company, possibly over KPIs (e.g., DAU for the Qwen app) and product vs. research priorities.
- Some speculate about demotion, power struggles, or a shift toward closed, proprietary models, but specific causes remain unclear.
- Many see this as a major loss for the open/local LLM ecosystem, given Qwen’s recent progress.
Model capabilities and comparisons
- Qwen3.5 models, especially 35B-A3B and 27B, are widely praised as state-of-the-art among local/open-weight models, with strong coding, planning, and tool use for their size.
- Experiences vary: some find Qwen3.5-35B-A3B better than Qwen3-Coder-Next, others the reverse, often attributing differences to model size, quantization quality, chat template, and serving stack.
- Compared to frontier cloud models (Claude, Gemini, etc.), Qwen is still seen as roughly “a year behind,” but impressively close for self-hosted use.
Agentic coding, harnesses, and behavior
- Harness / orchestrator quality (e.g., Zed’s agentic features, Pi-style minimal setups, Qwen’s own harness, Antigravity, OpenCode) strongly affects outcomes.
- Tools often go unused unless the system prompt clearly defines them; explicit tool descriptions and formats significantly improve behavior.
- Users report both tenacious problem-solving and frustrating failure modes: looping, shortcutting, or ignoring instructions mid-task. Lowering the temperature helps, but temperature=0 can be counterproductive.
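The advice about explicit tool definitions can be sketched as a request payload for an OpenAI-compatible endpoint (which both llama.cpp's server and vLLM expose). This is a minimal illustration, not from the thread: the served-model name, the `read_file` tool, and the temperature value are all assumptions.

```python
# Hedged sketch: an explicit tool definition for an OpenAI-compatible
# chat-completions endpoint. The tool name, schema, and model name are
# illustrative assumptions; the point is that a clear description and a
# strict parameter schema are what commenters say improve tool use.
import json

read_file_tool = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a UTF-8 text file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "File path relative to the repo root.",
                }
            },
            "required": ["path"],
        },
    },
}

payload = {
    "model": "qwen-local",  # placeholder served-model name
    "temperature": 0.2,     # low but non-zero, per the thread's advice
    "messages": [{"role": "user", "content": "Summarize README.md"}],
    "tools": [read_file_tool],
}

body = json.dumps(payload)  # what would be POSTed to /v1/chat/completions
```

The same schema works whether the model is served by llama.cpp or vLLM; what varies, per the thread, is how reliably a given quant and chat template actually emit well-formed calls against it.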
Performance, hardware, and quantization
- People share practical setups: consumer GPUs (3070 Ti, 5080, AMD AI Max), Macs, large-RAM CPUs, and various 4–6 bit quants via llama.cpp / vLLM.
- Token speeds in the ~20–70 tok/s range are common; context length and quant choice heavily impact tool-calling reliability and looping.
- Small Qwen3.5 models (0.8–9B) are noted as surprisingly capable for OCR and vision, but weaker for complex coding and coherent prose.
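A back-of-the-envelope memory estimate shows why 4–6 bit quants are the practical range on the hardware above. The 35B parameter count is from the thread; the bits-per-weight figures are typical averages for common quant formats and are assumptions, as is ignoring KV cache and activations.

```python
# Rough weight-only memory for a quantized model. Bits-per-weight values
# (e.g. ~4.5 for a typical 4-bit quant, ~6.5 for 6-bit) are assumed
# averages; real formats vary, and KV cache adds more on top.
def quant_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB (weights only)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

four_bit = quant_weight_gb(35, 4.5)  # roughly 18 GiB
six_bit = quant_weight_gb(35, 6.5)   # roughly 26 GiB
```

This is why a 35B model at ~4 bits squeezes onto a 24 GB GPU or a unified-memory Mac, while higher-precision quants push users toward CPU offload, with the token-speed consequences the thread describes.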
Geopolitics, talent, and economics
- Long subthreads debate why top Chinese researchers might stay in or return to China vs. joining US or EU labs, citing nationalism, quality of life, immigration enforcement, and government expectations.
- Some view Chinese open releases (Qwen, GLM, Kimi) as strategically subsidized to pressure US proprietary vendors.
- Business sustainability is questioned: training is costly, while models are released for free; suggested motives include VC funding, hosted inference revenue, and national strategic goals.
Vendor conflicts and “distillation”
- Discussion of Anthropic’s complaints centers on use of Claude as “LLM-as-a-judge” and for generating training data.
- Commenters argue this is more like RL with model-based feedback than true weight-level distillation, and note that similar cross-model training behavior is widespread.
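The distinction commenters draw can be made concrete: weight-level distillation needs the teacher's token distributions, while judge-style feedback only needs a scalar score per response. A minimal sketch, with both functions purely illustrative (in the discussion, the judge role is played by Claude via its API, not a heuristic):

```python
import math

def distillation_loss(teacher_logprobs, student_logprobs):
    """KL(teacher || student) over a shared vocab slice.
    Requires the teacher's per-token distributions -- i.e. model internals
    that a weight-level distillation setup would need access to."""
    return sum(math.exp(t) * (t - s)
               for t, s in zip(teacher_logprobs, student_logprobs))

def judge_reward(response: str) -> float:
    """Black-box judge: returns only a scalar score for the finished text.
    Stand-in heuristic for illustration; no teacher weights or logits
    are involved, which is the commenters' point."""
    return 1.0 if "def " in response else 0.0

# Tiny demo over a 2-token "vocabulary":
teacher = [math.log(0.5), math.log(0.5)]
student = [math.log(0.25), math.log(0.75)]
kl = distillation_loss(teacher, student)  # positive unless the two match
score = judge_reward("def add(a, b): return a + b")
```

The asymmetry is the argument in miniature: scoring outputs with another vendor's model looks more like RL with model-based feedback than like copying weights, since nothing about the teacher's internal distributions ever crosses the API boundary.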