Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving
Model comparisons & benchmarks
- Many question why Qwen3.6-Max is benchmarked against Claude Opus 4.5 rather than the newer 4.6/4.7, seeing vendor benchmarks as inherently cherry‑picked and calling for independent evals.
- Several report GLM 5.1 as “Sonnet–Opus level” for coding and tools, but slower and with reliability issues on some providers; others find it noticeably worse or loop‑prone.
- Kimi K2.6 is raised as a strong alternative: slightly better SWE-Bench/Terminal-Bench scores than Qwen3.6-Max and notably cheaper per token.
- Consensus: SOTA differences are now small and highly task‑dependent; people disagree which top model is “best,” and many feel we’ve hit “good enough” for a lot of dev work.
Cost, value, and rate limits
- Strong divide: some see $100–$200/month for top models as trivial compared with developer time; others are highly cost‑sensitive and prefer GLM, Qwen, MiniMax, or local models.
- Claude subscription limits (especially Opus) are widely criticized: users report hitting weekly limits in days or even hours, forcing workflow changes or cancellations.
Local and open-weight models
- Qwen 3.5/3.6, Gemma 4, and GLM 5.1 (open weights) are repeatedly cited as the best local options, with MoE variants (Qwen3.6 35B-A3B, Gemma 4 26B-A4B) balancing quality and VRAM.
- Max series (including Qwen3.6-Max-Preview) is cloud-only and proprietary; many see the real long‑term story in the open-weight Qwen series running on consumer hardware.
- Several describe concrete setups using llama.cpp + Qwen/Gemma on single GPUs (e.g., 4090) achieving usable speeds for coding.
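A setup like the ones described can be sketched with llama.cpp's `llama-server`; the model filename and quant level below are placeholders (assumptions, not any commenter's exact config) — pick a GGUF quant that fits your card's VRAM:

```shell
# Serve a quantized local model on a single GPU (e.g., a 4090) via llama.cpp.
# -ngl 99  : offload all layers to the GPU
# -c 16384 : context window size (larger costs more VRAM)
# --port   : exposes an OpenAI-compatible HTTP API for editor plugins/harnesses
llama-server -m ./models/qwen-coder-Q4_K_M.gguf -ngl 99 -c 16384 --port 8080
```

Any harness that speaks the OpenAI chat API can then point at `http://localhost:8080`, which is how these single-GPU setups plug into coding tools.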
Chinese vs US labs & openness
- One camp sees Chinese labs’ open weights as a deliberate strategic move (economic/propaganda), with concerns about censorship (e.g., Tiananmen queries) and potential attack surfaces in agentic use.
- Another camp argues this is mostly intense domestic competition and a marketing tactic, paralleling Western startup strategies; also notes US models are often more closed.
- Some worry Chinese providers are now raising prices and closing new models, converging on the same SaaS playbook as US firms.
Coding workflows, harnesses, and long context
- Many emphasize that harness/tooling (Claude Code, Pi, OpenCode, VS Code plugins, etc.) matters as much as the base model.
- Reports that Qwen and GLM can outperform Claude/Gemini on niche technical tasks (e.g., graphics, low‑level math, Rust SIMD), but may be weaker as autonomous “whole‑project” agents.
- Long‑context behavior depends heavily on context caching and compaction; several note that long sessions degrade quality across vendors and that restarting sessions often works better.
Meta: hype, drift, and future
- Multiple users feel that models like Claude have subtly worsened over time, or that their own growing reliance has simply made existing weaknesses more visible.
- Some predict AI will become a commodity with many near‑equivalent models; others expect eventual convergence limited by data and a shift toward efficiency and harness design.