GLM-4.7: Advancing the Coding Capability

Perceived Capability & Benchmarks

  • Many find GLM‑4.7 very strong for coding, often “in the Sonnet zone,” below Opus/GPT‑5.2 but close enough for daily work, especially given its cost.
  • Benchmarks are mixed: some point to weak Terminal-Bench scores; others cite strong SWE-bench numbers (e.g., beating Sonnet 3.5 by a wide margin, slightly ahead of Sonnet 4, slightly behind Sonnet 4.5).
  • Several note that benchmark leaders often perform poorly on real tasks, whereas GLM‑4.6/4.7 feel better than their scores suggest; the consensus is that hands-on testing matters more than charts.

Pricing, Value, and Product Positioning

  • Z.ai’s subscriptions (including cheap annual “lite” and coding plans) are repeatedly called “insanely cheap,” ideal as a Claude/GPT backup or secondary daily driver.
  • Users contrast this with Anthropic’s high per-token rates and agentic-usage pricing, seeing GLM as “Claude Code but cheaper,” especially for long-running coding tools.
  • Some worry low pricing is subsidized “dumping,” potentially anti-competitive long-term.

Usage Patterns & Tooling

  • Popular workflows:
    • Use Claude/GPT for planning and “tasteful” refactoring, GLM‑4.6/4.7 for implementation.
    • Use GLM via Claude Code’s API/MCP endpoints or tools like Crush/OpenCode; some tweak env vars so all “Haiku/Sonnet/Opus” model slots map to GLM (see the sketch after this list).
  • Several praise GLM‑4.7’s tool use and agentic coding; others found earlier GLM models underwhelming in OpenCode and reverted to Claude Code.
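
As a concrete illustration of the env-var remapping mentioned above, here is a minimal Python sketch that launches the claude CLI with GLM substituted into every model slot. The endpoint URL, model names, and the ZAI_API_KEY variable are assumptions drawn from commenters’ descriptions, not verified configuration; check your provider’s docs before relying on them.

    import os
    import subprocess

    env = os.environ.copy()
    env.update({
        # Hypothetical Anthropic-compatible endpoint and key (assumptions).
        "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
        "ANTHROPIC_AUTH_TOKEN": os.environ["ZAI_API_KEY"],
        # Map both the "big" and "small/fast" model slots to GLM so every
        # request, including Haiku-style helper calls, hits the same model.
        "ANTHROPIC_MODEL": "glm-4.7",
        "ANTHROPIC_SMALL_FAST_MODEL": "glm-4.7",
    })

    # Hand off to the claude CLI with the patched environment.
    subprocess.run(["claude"], env=env, check=False)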

Local Inference, Hardware, and MoE

  • The thread is dense with local-serving debate: Mac Studio/M4, Strix Halo, RTX 4090/5090, multi‑GPU rigs, and Cerebras/Groq ASICs.
  • Consensus: GLM‑4.7’s 358B MoE (32B active) is still too big for smooth interactive use on typical consumer hardware; quantized local runs are “hobby/async,” not yet a practical Claude Code replacement.
  • Commenters clarified that MoE reduces compute and memory bandwidth per token, not RAM requirements; the full parameter set must still be loaded (rough numbers below).
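
To make that distinction concrete, here is a back-of-envelope calculation; the 4-bit quantization width and the bandwidth figures are illustrative assumptions, not measurements.

    # Why a ~358B-parameter MoE with ~32B active parameters per token is
    # RAM-bound locally even though per-token compute and bandwidth are modest.
    TOTAL_PARAMS = 358e9   # all experts must be resident in memory
    ACTIVE_PARAMS = 32e9   # parameters actually touched per generated token
    BYTES_PER_PARAM = 0.5  # ~4-bit quantization (assumption)

    ram_needed_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9   # ~179 GB resident
    bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM      # ~16 GB read/token

    # Decode speed is roughly memory bandwidth divided by bytes read per token.
    for name, bw_gb_s in [("RTX 5090 (~1.8 TB/s)", 1800), ("M4 Max (~546 GB/s)", 546)]:
        tok_s = bw_gb_s * 1e9 / bytes_per_token
        print(f"{name}: ~{tok_s:.0f} tok/s if the weights fit; "
              f"needs ~{ram_needed_gb:.0f} GB resident")

The per-token bandwidth math yields usable decode speeds, but the ~180 GB resident footprint is what rules out single consumer GPUs, which matches the thread’s conclusion.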

Distillation, Similarity to Gemini, and Training

  • Multiple commenters think GLM‑4.7’s frontend examples and chain-of-thought style look strikingly like Gemini 3, suspecting distillation from frontier models.
  • Some say this is fine, even desirable, if it yields cheap open weights. Others argue language tics (e.g., “you’re absolutely right”) aren’t reliable evidence of training sources.

Privacy, Terms, and Politics

  • Z.ai’s terms allow extensive training on user data and broad rights over user content; several warn against using it for serious/proprietary work.
  • Some see Chinese-origin models as heavily censored on topics like Tiananmen; others dismiss such political tests as irrelevant for a coding-optimized model.

Competition and Ecosystem

  • Many welcome GLM‑4.7 as proof open‑weight models are closing the gap with billion‑dollar proprietary systems, adding price pressure on Anthropic/OpenAI/Google/xAI.
  • Omitted comparisons (e.g., Gemini 3 Pro missing from the charts, Grok 4 Heavy, Opus 4.5) are criticized as selective benchmarking.