GLM-5: Targeting complex systems engineering and long-horizon agentic tasks

Launch, Availability, and Pricing Confusion

  • The release initially felt like a “soft launch”: only a short X post, the chat UI updated first, the API listing was present but returned access errors, and docs and pricing lagged behind.
  • Subscription-plan behavior caused confusion: Lite/Pro plans initially excluded GLM‑5 despite the “same-tier updates” wording; only the Max plan and the general API worked, billed pay-as-you-go (PAYG). Messaging was later updated to clarify this.
  • Some users find GLM subscriptions dramatically cheaper than Anthropic/OpenAI (especially with past promos and annual discounts), while others note recent price hikes and that GLM‑5 input tokens are pricier than 4.7’s.
  • Several commenters describe switching from Anthropic plans to combinations like GLM + Codex/Gemini for a better cost–usage balance (a rough cost sketch follows this list).
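
As a rough illustration of the trade-off commenters are weighing, the sketch below compares a flat monthly subscription against pay-as-you-go token billing. Every price, fee, and token count in it is a placeholder assumption, not a published GLM, Anthropic, or OpenAI rate.

```python
# Minimal sketch: flat subscription vs. PAYG token billing.
# All numbers below are illustrative placeholders, not real published prices.

def payg_monthly_cost(input_mtok: float, output_mtok: float,
                      price_in: float, price_out: float) -> float:
    """Monthly cost in USD, given millions of tokens and $/Mtok prices."""
    return input_mtok * price_in + output_mtok * price_out

# Hypothetical workload: 60M input tokens, 12M output tokens per month.
usage = {"input_mtok": 60.0, "output_mtok": 12.0}

# Placeholder price points ($ per million tokens) and a placeholder flat fee.
plans = {
    "subscription (flat)": 30.00,  # $/month, placeholder
    "PAYG provider A": payg_monthly_cost(**usage, price_in=0.60, price_out=2.20),
    "PAYG provider B": payg_monthly_cost(**usage, price_in=3.00, price_out=15.00),
}

for name, cost in sorted(plans.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} ~${cost:,.2f}/month")
```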

Real-World Performance vs Benchmarks

  • Marketing compares GLM‑5 to Opus 4.5 and GPT‑5.2 rather than to the very latest releases, which some see as a red flag or as “benchmaxxing”; sampling tweaks mentioned in the benchmark notes also raise suspicion.
  • Mixed hands-on reports:
    • Positive: strong coding and tool use, including in obscure/custom languages; good long-horizon agentic work; better than 4.7 and “good enough” to replace frontier models for many coding tasks.
    • Negative: weak general problem solving for some users, elaborate hallucinations, trouble with custom tool-calling formats, and occasional poor web/search grounding.
  • Consensus: open-weight models often look excellent on benchmarks but still lag frontier proprietary models in instruction following, stability, and RLHF polish, though the gap is narrowing. One composite metric puts GLM‑5 slightly above GPT‑5.2 but below Opus 4.6.

Open Weights, China, and Censorship

  • Many see Chinese open-weight models (GLM, DeepSeek, Kimi, Qwen) as crucial for avoiding lock-in to a few US megacorps and for enabling self-/alt-hosting and provider competition.
  • Debate over trust and censorship:
    • Tiananmen-style prompts are used as a “censorship test”; GLM versions sometimes respond with party-line text or freeze. Some argue this is an unfair fixation given Western models’ own safety filters on other topics.
    • Others emphasize the difference between company-level content policies and state-mandated censorship, given that models are becoming primary information sources.
  • Conflicting claims about whether GLM‑5 was trained on Huawei Ascend hardware or only deployed on domestic chips; the Reuters-style wording is ambiguous, and several note that if the full training run had been on Ascend it would likely be loudly advertised.

Local Hosting, Hardware, and Economics

  • Long discussion around running large Chinese models locally:
    • Macs (M-series with unified memory) and upcoming Strix Halo desktops are seen as the most “consumer-feasible” options, but true frontier-scale models still need 512 GB–1 TB+ of VRAM/RAM.
    • DIY multi‑GPU Linux rigs can run smaller/distilled variants at decent speeds, but are costly and power-hungry.
  • Repeated back-of-envelope calculations suggest that, for most people, cloud/API usage is far cheaper than buying hardware unless you already own GPUs or care deeply about privacy, offline availability, or quota independence (see the sketch after this list).
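
A minimal version of that back-of-envelope arithmetic: estimate the memory footprint of a large model’s weights at a given quantization, then compare a one-time hardware purchase against ongoing API spend. The parameter count, rig price, and monthly API bill are placeholder assumptions, not GLM‑5’s actual figures.

```python
# Back-of-envelope sketch: weight memory footprint and hardware-vs-API break-even.
# Every number here is a placeholder assumption, not a published GLM-5 figure.

def weights_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone (no KV cache or activations)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

def breakeven_months(hardware_usd: float, monthly_api_usd: float) -> float:
    """Months of API spend needed to match a one-time hardware purchase."""
    return hardware_usd / monthly_api_usd

# Hypothetical ~700B-parameter model at different quantizations.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weights_gb(700, bits):,.0f} GB")

# Placeholder economics: a $10k local rig vs. ~$80/month of API usage.
print(f"break-even: ~{breakeven_months(10_000, 80):.0f} months")
```

Under those placeholder numbers the rig needs roughly a decade of API spend to pay for itself, which is why the threads keep circling back to privacy, offline use, and quota independence as the real reasons to run locally.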

Tooling and Ecosystem

  • Alongside GLM‑5, users note:
    • GLM‑5‑Coder, a coding-specialized variant.
    • A new agentic mode in the chat UI, and an IDE-like “zcode” product.
    • Supplemental services: document reading (zread), OCR, image generation, and voice cloning.
  • GLM‑4.7‑Flash is widely praised as the first local coder that feels “intelligent enough” on modest hardware; GLM‑5 is expected to follow via distillation/quantization.
  • Open-source harnesses (e.g., OpenCode) let users swap between GLM, GPT‑5.3‑Codex, Kimi, etc., reinforcing a pattern: keep frontier models for the hardest reasoning and use cheaper open-weight models for day-to-day coding grunt work (a minimal model-swapping sketch follows).
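
Most such harnesses talk to providers through OpenAI-compatible chat-completions endpoints, so swapping models mostly comes down to changing a base URL and model name. A minimal sketch of that pattern, assuming placeholder endpoint URLs, model IDs, and environment-variable names rather than OpenCode’s actual configuration:

```python
# Minimal sketch: route a prompt to different OpenAI-compatible providers.
# Base URLs, model IDs, and env-var names below are placeholders, not official values.
import os
from openai import OpenAI

PROVIDERS = {
    "glm": {"base_url": "https://example-glm-endpoint/v1",  # placeholder URL
            "model": "glm-5", "key_env": "GLM_API_KEY"},
    "frontier": {"base_url": "https://api.openai.com/v1",
                 "model": "gpt-5.3-codex",  # placeholder model ID
                 "key_env": "OPENAI_API_KEY"},
}

def ask(provider: str, prompt: str) -> str:
    """Send one chat-completion request to the chosen provider."""
    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key=os.environ[cfg["key_env"]])
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Cheap open-weight model for grunt work; switch provider for the hard reasoning.
print(ask("glm", "Write a unit test for this parser."))
```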