GLM-4.7: Advancing the Coding Capability
Perceived Capability & Benchmarks
- Many find GLM‑4.7 very strong for coding, often “in the Sonnet zone,” below Opus/GPT‑5.2 but close enough for daily work, especially given its cost.
- Benchmarks are mixed: some point to weak Terminal-Bench scores; others cite strong SWE-bench numbers (e.g., beating Sonnet 3.5 by a wide margin, edging out Sonnet 4, and trailing Sonnet 4.5 slightly).
- Several note that benchmark leaders often perform poorly on real tasks, whereas GLM‑4.6/4.7 feel better than their scores suggest; the rough consensus is that hands-on testing matters more than charts.
Pricing, Value, and Product Positioning
- Z.ai’s subscriptions (including cheap annual “lite” and coding plans) are repeatedly called “insanely cheap,” ideal as a Claude/GPT backup or secondary daily driver.
- Users contrast this with Anthropic's high per-token rates and agentic-use pricing, seeing GLM as "Claude Code but cheaper," especially for long-running coding tools.
- Some worry the low pricing is subsidized "dumping" that could prove anti-competitive in the long term.
Usage Patterns & Tooling
- Popular workflows:
  - Use Claude/GPT for planning and "tasteful" refactoring, GLM‑4.6/4.7 for implementation.
  - Use GLM through Claude Code's Anthropic-compatible API/MCP endpoints or tools like Crush/OpenCode; some tweak env vars so the "Haiku/Sonnet/Opus" model slots all map to GLM (a minimal sketch follows this list).
- Several praise GLM‑4.7's tool use and agentic coding; others found earlier GLM models underwhelming in OpenCode and reverted to Claude Code.
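A minimal sketch of the env-var remapping commenters describe, assuming Z.ai exposes an Anthropic-compatible endpoint. `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_MODEL`, and `ANTHROPIC_SMALL_FAST_MODEL` are documented Claude Code overrides; the base URL, the `ZAI_API_KEY` variable, and the `glm-4.7` model ID are assumptions to verify against Z.ai's current docs:

```python
import os
import subprocess

# Sketch: launch Claude Code with every model slot pointed at GLM via an
# Anthropic-compatible endpoint. The base URL and model IDs below are
# assumptions, not verified values.
env = os.environ.copy()
env.update({
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",   # assumed Z.ai endpoint
    "ANTHROPIC_AUTH_TOKEN": os.environ.get("ZAI_API_KEY", ""),  # your Z.ai key
    "ANTHROPIC_MODEL": "glm-4.7",             # main ("Sonnet"/"Opus") slot
    "ANTHROPIC_SMALL_FAST_MODEL": "glm-4.7",  # background ("Haiku") slot
})
subprocess.run(["claude"], env=env)
```

In practice most commenters simply export these variables in a shell profile; the Python wrapper is only to keep the example self-contained.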
Local Inference, Hardware, and MoE
- The thread is dense with local-serving debate: Mac Studio/M4, Strix Halo, RTX 4090/5090, multi‑GPU rigs, Cerebras/Groq ASICs.
- Consensus: GLM‑4.7's 358B-parameter MoE (32B active per token) is still too big for smooth interactive use on typical consumer hardware; quantized local runs are "hobby/async" territory, not yet a practical Claude Code replacement.
- Commenters clarified that MoE reduces compute and memory bandwidth per token, not RAM requirements: the full parameter set must still be loaded (see the back-of-envelope sketch below).
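A back-of-envelope sketch of that point, using the thread's 358B-total/32B-active figures and ignoring KV cache, activations, and runtime overhead:

```python
# Why a 358B-total / 32B-active MoE still needs full-model memory:
# every expert must be resident, but only the routed ~32B parameters are
# read per token, so bandwidth (and compute) scale with the active set.
TOTAL_PARAMS = 358e9   # all experts; must fit in RAM/VRAM
ACTIVE_PARAMS = 32e9   # parameters actually touched per token

for quant, bytes_per_param in [("FP16", 2.0), ("Q8", 1.0), ("Q4", 0.5)]:
    resident_gb = TOTAL_PARAMS * bytes_per_param / 1e9    # weights held in memory
    per_token_gb = ACTIVE_PARAMS * bytes_per_param / 1e9  # bytes streamed per token
    print(f"{quant}: ~{resident_gb:.0f} GB resident, ~{per_token_gb:.0f} GB read/token")
```

Even at Q4, roughly 179 GB of weights must stay resident, beyond a 32 GB RTX 5090 or a 128 GB Strix Halo board; and the ~16 GB read per token, divided by memory bandwidth (roughly 800 GB/s on an Ultra-class Mac Studio), caps decode near 50 tokens/s in the best case.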
Distillation, Similarity to Gemini, and Training
- Multiple commenters think GLM‑4.7’s frontend examples and chain-of-thought style look strikingly like Gemini 3, suspecting distillation from frontier models.
- Some say this is fine, even desirable, if it yields cheap open weights; others argue that language tics (e.g., "you're absolutely right") aren't reliable evidence of training sources.
Privacy, Terms, and Politics
- Z.ai’s terms allow extensive training on user data and broad rights over user content; several warn against using it for serious/proprietary work.
- Some see Chinese-origin models as heavily censored on topics like Tiananmen; others dismiss such political tests as irrelevant for a coding-optimized model.
Competition and Ecosystem
- Many welcome GLM‑4.7 as proof open‑weight models are closing the gap with billion‑dollar proprietary systems, adding price pressure on Anthropic/OpenAI/Google/xAI.
- The omission of certain comparisons (e.g., Gemini 3 Pro from the charts, Grok 4 Heavy, Opus 4.5) is criticized as selective benchmarking.