Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving

Model comparisons & benchmarks

  • Many question why Qwen3.6-Max is benchmarked against Claude Opus 4.5 rather than the newer 4.6/4.7, viewing vendor-run benchmarks as inherently cherry‑picked and calling for independent evals.
  • Several report GLM 5.1 as “Sonnet–Opus level” for coding and tools, but slower and with reliability issues on some providers; others find it noticeably worse or loop‑prone.
  • Kimi K2.6 is raised as a strong alternative: slightly better SWE-Bench/Terminal-Bench scores than Qwen3.6-Max and notably cheaper per token.
  • Consensus: SOTA differences are now small and highly task‑dependent; people disagree which top model is “best,” and many feel we’ve hit “good enough” for a lot of dev work.

Cost, value, and rate limits

  • Strong divide: some see $100–$200/month for top models as trivial relative to the developer time saved; others are highly cost‑sensitive and prefer GLM, Qwen, MiniMax, or local models.
  • Claude subscription limits (especially Opus) are widely criticized: users report hitting weekly limits in days or even hours, forcing workflow changes or cancellations.

Local and open-weight models

  • Qwen 3.5/3.6, Gemma 4, and GLM 5.1 (open weights) are repeatedly cited as the best local options, with MoE variants (Qwen3.6 35B-A3B, Gemma 4 26B-A4B) balancing quality and VRAM.
  • Max series (including Qwen3.6-Max-Preview) is cloud-only and proprietary; many see the real long‑term story in the open-weight Qwen series running on consumer hardware.
  • Several describe concrete setups using llama.cpp + Qwen/Gemma on single GPUs (e.g., a single RTX 4090) achieving usable speeds for coding.
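A single-GPU setup like the ones described can be sketched with llama.cpp's OpenAI-compatible server; the model path and sizes below are illustrative placeholders, not details from the thread:

```shell
# Launch llama.cpp's OpenAI-compatible server on a single GPU.
# -ngl 99 offloads all layers to the GPU; -c sets the context window
# (larger windows cost more VRAM). The GGUF path is a placeholder:
# substitute any quant (e.g., Q4_K_M) that fits in your card's memory.
llama-server -m ./models/qwen-coder-Q4_K_M.gguf -ngl 99 -c 16384 --port 8080
```

Once running, any OpenAI-compatible client or editor plugin can point at `http://localhost:8080/v1` for local completions.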

Chinese vs US labs & openness

  • One camp sees Chinese labs’ open weights as a deliberate strategic move (economic/propaganda), with concerns about censorship (e.g., Tiananmen queries) and potential attack surfaces in agentic use.
  • Another camp argues this is mostly intense domestic competition and a marketing tactic, paralleling Western startup strategies; also notes US models are often more closed.
  • Some worry Chinese providers are now raising prices and closing new models, converging on the same SaaS playbook as US firms.

Coding workflows, harnesses, and long context

  • Many emphasize that harness/tooling (Claude Code, Pi, OpenCode, VS Code plugins, etc.) matters as much as the base model.
  • Reports that Qwen and GLM can outperform Claude/Gemini on niche technical tasks (e.g., graphics, low‑level math, Rust SIMD), but may be weaker as autonomous “whole‑project” agents.
  • Long‑context behavior depends heavily on context caching and compaction; several note that long sessions degrade quality across vendors and that restarting sessions often works better.
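The compaction idea above can be sketched in a few lines. This is a generic pattern, not any vendor's actual mechanism: the character-based token estimate and the `summarize()` stub are stand-ins for a real tokenizer and a real model call.

```python
# Sketch of context "compaction": when a session's history grows past a
# token budget, collapse older turns into one summary message and keep
# only the most recent turns. All names here are illustrative.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def summarize(messages) -> str:
    # Placeholder for an LLM summarization call over the old turns.
    return "Summary of %d earlier turns." % len(messages)

def compact(history, budget_tokens=1000, keep_recent=4):
    """Return history unchanged if under budget; otherwise replace all
    but the last `keep_recent` messages with a single summary turn."""
    total = sum(estimate_tokens(m["content"]) for m in history)
    if total <= budget_tokens or len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [{"role": "system", "content": summarize(old)}] + recent
```

Restarting a session entirely is the degenerate case of this pattern: drop the old turns and carry over only a hand-written summary.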

Meta: hype, drift, and future

  • Multiple users feel models like Claude have subtly worsened over time, or that their own growing reliance has made the models' weaknesses more visible.
  • Some predict AI will become a commodity with many near‑equivalent models; others expect eventual convergence limited by data and a shift toward efficiency and harness design.