GLM-5.2 is the new leading open weights model on Artificial Analysis
Capabilities and Benchmark Position
- Many commenters see GLM‑5.2 as a major step forward, roughly Opus 4.6–4.7 level for coding and “near frontier,” with some saying it rivals or slightly surpasses earlier Opus in practice.
- Artificial Analysis and other benchmarks place it around the top tier of coding models, though still below GPT‑5.5 and Fable on the highest-end metrics.
- Some report it feels smarter and more stable than GLM‑5.1, especially in not getting stuck in reasoning loops.
Reasoning Style and Token Efficiency
- GLM‑5.2 Max is described as extremely “thinky” and verbose, similar to Opus 4.8 Max, often burning tens of thousands of tokens and being slow.
- Users recommend the “High” setting as a better tradeoff: similar quality for many tasks with ≈2–2.5× fewer tokens.
- Several people are frustrated that the model over‑plans when they want it to “just write the code,” a complaint shared with other frontier models.
Cost, Plans, and Third‑Party Hosting
- Despite very low official per‑token prices relative to Opus/GPT, real-world costs can add up quickly due to high token usage.
- Some users find Z.ai’s subscription quotas and multipliers (higher deductions for GLM‑5.2 usage) poor value, preferring fixed-price Claude/GPT plans.
- Third‑party providers (OpenRouter and others) offer cheaper or quantized GLM‑5.2, but there are concerns about misconfiguration, silent quantization, and reliability.
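The gap between per‑token price and real‑world cost comes down to simple arithmetic, sketched below. Every number is a hypothetical placeholder chosen for illustration, not an actual GLM‑5.2, Opus, or GPT price or a measured token count; `cost_per_task` is a made‑up helper name.

```python
def cost_per_task(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one task, given token counts and prices per million tokens."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


# Hypothetical "cheap but verbose" model: low prices, long reasoning output.
cheap_verbose = cost_per_task(20_000, 60_000, in_price_per_m=1.0, out_price_per_m=4.0)

# Same hypothetical model on a lower reasoning setting that emits 2.5x fewer
# output tokens (the upper end of the range commenters mention).
cheap_high = cost_per_task(20_000, 24_000, in_price_per_m=1.0, out_price_per_m=4.0)

# Hypothetical "pricey but terse" model: about 5-6x the per-token price,
# far fewer output tokens.
pricey_terse = cost_per_task(20_000, 8_000, in_price_per_m=5.0, out_price_per_m=25.0)

for name, cost in [("cheap, verbose", cheap_verbose),
                   ("cheap, lower setting", cheap_high),
                   ("pricey, terse", pricey_terse)]:
    print(f"{name:22s} ${cost:.3f} per task")
```

With these placeholder inputs, a roughly 5–6× per‑token price advantage shrinks to a much smaller per‑task advantage once the verbose output is counted, which is the effect commenters describe.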
Open Weights, Self‑Hosting, and Hardware
- Open weights are seen as strategically important: they enable privacy, enterprise self‑hosting, and competition with closed labs.
- Running the full model locally currently requires very large amounts of GPU or unified memory; some discuss 8‑GPU or high‑end workstation setups and their energy costs.
- Others note that for most businesses today, negotiating private cloud deployments is still easier than building in‑house infra.
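A back‑of‑the‑envelope way to see why self‑hosting needs so much memory is to multiply parameter count by bytes per weight. The sketch below does only that. The parameter count is a placeholder, not GLM‑5.2's actual size (the discussion does not state it), and the 20% overhead factor for KV cache and activations is likewise an assumption.

```python
import math


def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def gpus_needed(weights_gib: float, per_gpu_gib: float, overhead: float = 1.2) -> int:
    """GPUs required to fit the weights plus an assumed overhead factor."""
    return math.ceil(weights_gib * overhead / per_gpu_gib)


PLACEHOLDER_PARAMS = 500e9  # hypothetical size, NOT a published GLM-5.2 figure

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = weight_memory_gib(PLACEHOLDER_PARAMS, bits)
    print(f"{label}: ~{gib:,.0f} GiB of weights, "
          f"{gpus_needed(gib, per_gpu_gib=80)} x 80 GiB GPUs")
```

The same arithmetic shows why quantized builds are attractive: halving the bits per weight halves the weight footprint, at some cost in quality.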
Multimodality and Missing Vision
- GLM‑5.2 is text‑only, which surprises many given current norms.
- Lack of image input is a real limitation for UI/screenshot workflows and some agent use cases, though people suggest pairing it with a separate vision model as a workaround.
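The vision workaround can be sketched as a two‑stage pipeline: a vision model turns the image into text, and the text‑only model reasons over that text. This is a minimal illustration, not a real integration. `describe_image` and `ask_text_model` are hypothetical callables standing in for whatever vision and text endpoints are actually used, and the stubs below return canned strings.

```python
from typing import Callable


def answer_about_screenshot(
    image: bytes,
    question: str,
    describe_image: Callable[[bytes], str],   # hypothetical vision-model wrapper
    ask_text_model: Callable[[str], str],     # hypothetical text-model wrapper
) -> str:
    """Route an image question through a vision model, then a text-only model."""
    description = describe_image(image)
    prompt = f"Screenshot description:\n{description}\n\nQuestion: {question}"
    return ask_text_model(prompt)


# Stub implementations so the sketch runs without any network access.
def fake_vision(image: bytes) -> str:
    return f"A login form with two fields ({len(image)} bytes of image data)."


def fake_text_model(prompt: str) -> str:
    return "ANSWER BASED ON: " + prompt.splitlines()[1]


print(answer_about_screenshot(b"\x89PNG", "What is on screen?",
                              fake_vision, fake_text_model))
```

The tradeoff is that anything the vision model leaves out of its description is invisible to the text model, which is one reason commenters still count missing native vision as a limitation.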
Benchmarks vs. Real‑World Use
- Multiple commenters warn that benchmark scores (Artificial Analysis, DeepSWE, etc.) don’t always match long‑horizon, multi‑turn coding performance.
- Some find GLM‑5.2 excellent in coding harnesses; others report subtle hallucinations and sloppiness that make them hesitant for “serious” work.
- Consensus: it’s a strong, cheap open‑weights option, but GPT‑5.5, Fable, and the latest Opus still lead on raw capability and efficiency.