Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving
Model comparisons & benchmarks
- Many question why Qwen3.6-Max is benchmarked against Claude Opus 4.5 rather than the newer 4.6/4.7, seeing vendor benchmarks as inherently cherry‑picked and calling for independent evals.
- Several report GLM 5.1 as “Sonnet–Opus level” for coding and tools, but slower and with reliability issues on some providers; others find it noticeably worse or loop‑prone.
- Kimi K2.6 is raised as a strong alternative: slightly better SWE-Bench/Terminal-Bench scores than Qwen3.6-Max and notably cheaper per token.
- Consensus: SOTA differences are now small and highly task‑dependent; people disagree which top model is “best,” and many feel we’ve hit “good enough” for a lot of dev work.
Cost, value, and rate limits
- Strong divide: some see $100–$200/month for top models as trivial compared with developer time; others are highly cost‑sensitive and prefer GLM, Qwen, MiniMax, or local models.
- Claude subscription limits (especially Opus) are widely criticized: users report hitting weekly limits in days or even hours, forcing workflow changes or cancellations.
Local and open-weight models
- Qwen 3.5/3.6, Gemma 4, and GLM 5.1 (open weights) are repeatedly cited as the best local options, with MoE variants (Qwen3.6 35B-A3B, Gemma 4 26B-A4B) balancing quality and VRAM.
- Max series (including Qwen3.6-Max-Preview) is cloud-only and proprietary; many see the real long‑term story in the open-weight Qwen series running on consumer hardware.
- Several describe concrete setups using llama.cpp + Qwen/Gemma on single GPUs (e.g., 4090) achieving usable speeds for coding.
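A setup like the ones described can be sketched with llama.cpp's `llama-server`; the model filename and quant level below are placeholders (assumptions, not any commenter's exact config) — pick a GGUF quant that fits your card's VRAM:

```shell
# Serve a quantized local model on a single GPU (e.g., a 4090) via llama.cpp.
# -ngl 99  : offload all layers to the GPU
# -c 16384 : context window size (larger costs more VRAM)
# --port   : exposes an OpenAI-compatible HTTP API for editor plugins/harnesses
llama-server -m ./models/qwen-coder-Q4_K_M.gguf -ngl 99 -c 16384 --port 8080
```

Any harness that speaks the OpenAI chat API can then point at `http://localhost:8080`, which is how these single-GPU setups plug into coding tools.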
Chinese vs US labs & openness
- One camp sees Chinese labs’ open weights as a deliberate strategic move (economic/propaganda), with concerns about censorship (e.g., Tiananmen queries) and potential attack surfaces in agentic use.
- Another camp argues this is mostly intense domestic competition and a marketing tactic, paralleling Western startup strategies; also notes US models are often more closed.
- Some worry Chinese providers are now raising prices and closing new models, converging on the same SaaS playbook as US firms.
Coding workflows, harnesses, and long context
- Many emphasize that harness/tooling (Claude Code, Pi, OpenCode, VS Code plugins, etc.) matters as much as the base model.
- Reports that Qwen and GLM can outperform Claude/Gemini on niche technical tasks (e.g., graphics, low‑level math, Rust SIMD), but may be weaker as autonomous “whole‑project” agents.
- Long‑context behavior depends heavily on context caching and compaction; several note that long sessions degrade quality across vendors and that restarting sessions often works better.
Meta: hype, drift, and future
- Multiple users feel that models like Claude have subtly worsened over time, or that their own growing reliance has simply made existing weaknesses more visible.
- Some predict AI will become a commodity with many near‑equivalent models; others expect eventual convergence limited by data and a shift toward efficiency and harness design.