Qwen3.6-35B-A3B: Agentic coding power, now open to all
Overall reception & openness
- Many are pleased Qwen is still releasing open weights despite internal turmoil and fears they would go closed-only.
- Enthusiasm centers on getting a strong, agentic coding model that can run locally, without subscriptions or heavy “too dangerous to release” marketing.
- Some disappointment that only the 35B MoE is released so far; people hope for smaller (e.g., 9B) and mid/large (122B) open variants, but the flagship ~397B may stay closed.
Model architecture, MoE vs dense
- Qwen3.6-35B-A3B is a Mixture-of-Experts model: 35B total parameters, ~3B active per token.
- Several commenters argue prior MoE (3.5-35B-A3B) underperformed dense 3.5-27B; they’re skeptical 3.6 MoE can fully replace dense 27B, especially on long-horizon tasks.
- Qwen’s own benchmarks claim 3.6-35B-A3B clearly beats 3.5-35B-A3B and “rivals” 3.5-27B; independent users report mixed impressions and want to run their own tests.
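The total-vs-active split debated above comes from top-k expert routing: for each token, a router activates only a few experts, so most parameters sit idle on any given forward pass. A minimal sketch of that routing step, with toy sizes that are illustrative only (not the real architecture):

```python
# Toy Mixture-of-Experts routing: a router scores all experts per token,
# keeps only the top-k, and softmax-normalizes their weights. Sizes are
# illustrative placeholders, not this model's actual configuration.
import math
import random

NUM_EXPERTS = 8   # total experts (toy value)
TOP_K = 2         # experts activated per token

def route(token_scores, k=TOP_K):
    """Pick the k highest-scoring experts; return (index, weight) pairs."""
    top = sorted(range(len(token_scores)), key=lambda i: -token_scores[i])[:k]
    exps = [math.exp(token_scores[i]) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
chosen = route(scores)
# Only TOP_K of NUM_EXPERTS experts actually run for this token, which
# is why "active" parameters are a small fraction of the total.
assert len(chosen) == TOP_K
assert abs(sum(w for _, w in chosen) - 1.0) < 1e-9
```

This is also the intuition behind the dense-vs-MoE skepticism: at 3B active parameters, each token sees far less compute than a dense 27B model applies, even though total capacity is higher.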
Quality vs proprietary models
- Consensus: this model is not frontier level (Sonnet, GPT, Opus), though it may approach Claude Haiku 4.5 on some coding benchmarks (SWE-bench, LiveCodeBench, etc.).
- Some note that “tiny” open models are now roughly at 2023 GPT‑4 quality for many tasks, but still clearly below today’s top commercial systems and can loop or stall on complex agent tasks.
Local deployment, hardware & quantization
- Much discussion on running it locally: with only ~3B parameters active per token, the MoE tolerates partial CPU offload and runs on 16–24GB GPUs or high‑RAM machines (Macs, Strix Halo, mid‑range gaming PCs).
- Context/KV cache often becomes the limiting factor, especially for coding agents with large contexts.
- Unsloth provides GGUF quants from ~10–27GB; users warn early quants and runtimes often have bugs, so waiting a week and updating is advised.
- llama.cpp, LM Studio, vLLM, Ollama, and MLX are common inference stacks; some advocate bypassing Ollama for more control.
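To make the KV-cache point concrete, here is a back-of-the-envelope estimator. The formula (2 tensors, K and V, per layer) is standard; the layer count, KV-head count, and head dimension below are hypothetical placeholders, not this model's published specs:

```python
# Back-of-the-envelope KV-cache size:
#   2 (K and V) * layers * kv_heads * head_dim * context * bytes_per_element
# The architecture numbers used below are illustrative placeholders,
# NOT the model's actual configuration.
def kv_cache_bytes(layers, kv_heads, head_dim, context, bytes_per_elem=2):
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem

# Hypothetical config: 48 layers, 4 KV heads (GQA), head_dim 128,
# fp16 cache (2 bytes/element), at a 128k-token context.
gib = kv_cache_bytes(48, 4, 128, 131_072) / 2**30
print(f"{gib:.1f} GiB")  # prints "12.0 GiB"
```

Even with a modest ~10–27GB quantized model file, a coding agent holding a six-figure context can add double-digit gigabytes of cache on top, which is why commenters call the KV cache, not the weights, the limiting factor.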
Use cases & workflows
- Main target is “agentic coding” with harnesses like Pi, OpenCode, Claude Code, or custom multi-agent setups; the model also supports FIM (fill-in-the-middle) for editor autocomplete when correctly configured.
- People also use Qwen models for local vision tasks (OCR, surveillance, table extraction), translation, and batch document processing where rate limits or sending data to a third party are unacceptable.
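For the FIM autocomplete use mentioned above, the editor harness wraps the text around the cursor in special fill-in-the-middle tokens before sending it to the model. The token names below follow the convention used by earlier Qwen coder models (`<|fim_prefix|>` etc.) and are an assumption for this release; verify them against the model's tokenizer config:

```python
# Build a fill-in-the-middle (FIM) prompt for editor autocomplete.
# Token names follow the earlier Qwen coder convention and are assumed
# here for this model -- check the tokenizer config before relying on them.
def fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the cursor in FIM control tokens."""
    return f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"

# The model is asked to generate what belongs between prefix and suffix
# (here, the body of `add`).
prompt = fim_prompt("def add(a, b):\n    ", "\n\nprint(add(2, 3))")
assert prompt.startswith("<|fim_prefix|>")
assert "<|fim_suffix|>" in prompt
assert prompt.endswith("<|fim_middle|>")
```

Getting these control tokens wrong is the usual cause of the "when correctly configured" caveat: with mismatched tokens the model falls back to plain continuation and autocomplete quality collapses.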
Censorship, trust & regulation
- Cloud-hosted Qwen is heavily aligned on Chinese political topics; users mention uncensored community variants for local use.
- Strong advice: assume any remote provider may log or train on data; run open weights locally for privacy.
- Some report US government‑related contracts already banning Chinese models (even local), reflecting “supply chain” and influence concerns.