GLM-4.7: Advancing the Coding Capability
Perceived Capability & Benchmarks
- Many find GLM‑4.7 very strong for coding, often “in the Sonnet zone,” below Opus/GPT‑5.2 but close enough for daily work, especially given its cost.
- Benchmarks are mixed: some point to weak Terminal-Bench scores; others cite strong SWE-bench numbers (e.g., beating Sonnet 3.5 by a wide margin, edging out Sonnet 4, and trailing Sonnet 4.5 slightly).
- Several note that benchmark leaders often perform poorly on real tasks, whereas GLM‑4.6/4.7 feel better than their scores suggest; the rough consensus is that hands-on testing matters more than charts.
Pricing, Value, and Product Positioning
- Z.ai’s subscriptions (including cheap annual “lite” and coding plans) are repeatedly called “insanely cheap,” ideal as a Claude/GPT backup or secondary daily driver.
- Users contrast this with Anthropic's high per-token rates and agentic-use pricing, seeing GLM as "Claude Code but cheaper," especially for long-running coding tools.
- Some worry the low pricing is subsidized "dumping" that could prove anti-competitive in the long term.
Usage Patterns & Tooling
- Popular workflows:
  - Use Claude/GPT for planning and "tasteful" refactoring, GLM‑4.6/4.7 for implementation.
  - Use GLM through Claude Code's Anthropic-compatible API/MCP endpoints or tools like Crush/OpenCode; some tweak env vars so the "Haiku/Sonnet/Opus" model slots all map to GLM (a minimal sketch follows this list).
- Several praise GLM‑4.7's tool use and agentic coding; others found earlier GLM models underwhelming in OpenCode and reverted to Claude Code.
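A minimal sketch of the env-var remapping commenters describe, assuming Z.ai exposes an Anthropic-compatible endpoint. `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_MODEL`, and `ANTHROPIC_SMALL_FAST_MODEL` are documented Claude Code overrides; the base URL, the `ZAI_API_KEY` variable, and the `glm-4.7` model ID are assumptions to verify against Z.ai's current docs:

```python
import os
import subprocess

# Sketch: launch Claude Code with every model slot pointed at GLM via an
# Anthropic-compatible endpoint. The base URL and model IDs below are
# assumptions, not verified values.
env = os.environ.copy()
env.update({
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",   # assumed Z.ai endpoint
    "ANTHROPIC_AUTH_TOKEN": os.environ.get("ZAI_API_KEY", ""),  # your Z.ai key
    "ANTHROPIC_MODEL": "glm-4.7",             # main ("Sonnet"/"Opus") slot
    "ANTHROPIC_SMALL_FAST_MODEL": "glm-4.7",  # background ("Haiku") slot
})
subprocess.run(["claude"], env=env)
```

In practice most commenters simply export these variables in a shell profile; the Python wrapper is only to keep the example self-contained.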
Local Inference, Hardware, and MoE
- The thread is dense with local-serving debate: Mac Studio/M4, Strix Halo, RTX 4090/5090, multi‑GPU rigs, Cerebras/Groq ASICs.
- Consensus: GLM‑4.7's 358B-parameter MoE (32B active per token) is still too big for smooth interactive use on typical consumer hardware; quantized local runs are "hobby/async" territory, not yet a practical Claude Code replacement.
- Commenters clarified that MoE reduces compute and memory bandwidth per token, not RAM requirements: the full parameter set must still be loaded (see the back-of-envelope sketch below).
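A back-of-envelope sketch of that point, using the thread's 358B-total/32B-active figures and ignoring KV cache, activations, and runtime overhead:

```python
# Why a 358B-total / 32B-active MoE still needs full-model memory:
# every expert must be resident, but only the routed ~32B parameters are
# read per token, so bandwidth (and compute) scale with the active set.
TOTAL_PARAMS = 358e9   # all experts; must fit in RAM/VRAM
ACTIVE_PARAMS = 32e9   # parameters actually touched per token

for quant, bytes_per_param in [("FP16", 2.0), ("Q8", 1.0), ("Q4", 0.5)]:
    resident_gb = TOTAL_PARAMS * bytes_per_param / 1e9    # weights held in memory
    per_token_gb = ACTIVE_PARAMS * bytes_per_param / 1e9  # bytes streamed per token
    print(f"{quant}: ~{resident_gb:.0f} GB resident, ~{per_token_gb:.0f} GB read/token")
```

Even at Q4, roughly 179 GB of weights must stay resident, beyond a 32 GB RTX 5090 or a 128 GB Strix Halo board; and the ~16 GB read per token, divided by memory bandwidth (roughly 800 GB/s on an Ultra-class Mac Studio), caps decode near 50 tokens/s in the best case.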
Distillation, Similarity to Gemini, and Training
- Multiple commenters think GLM‑4.7’s frontend examples and chain-of-thought style look strikingly like Gemini 3, suspecting distillation from frontier models.
- Some say this is fine, even desirable, if it yields cheap open weights; others argue that language tics (e.g., "you're absolutely right") aren't reliable evidence of training sources.
Privacy, Terms, and Politics
- Z.ai’s terms allow extensive training on user data and broad rights over user content; several warn against using it for serious/proprietary work.
- Some see Chinese-origin models as heavily censored on topics like Tiananmen; others dismiss such political tests as irrelevant for a coding-optimized model.
Competition and Ecosystem
- Many welcome GLM‑4.7 as proof open‑weight models are closing the gap with billion‑dollar proprietary systems, adding price pressure on Anthropic/OpenAI/Google/xAI.
- The omission of certain comparisons (e.g., Gemini 3 Pro from the charts, Grok 4 Heavy, Opus 4.5) is criticized as selective benchmarking.