GLM-5.2 is the new leading open weights model on Artificial Analysis

Capabilities and Benchmark Position

  • Many commenters see GLM‑5.2 as a major step forward, roughly Opus 4.6–4.7 level for coding and “near frontier,” with some saying it rivals or slightly surpasses earlier Opus in practice.
  • Artificial Analysis and other benchmarks place it around the top tier of coding models, though still below GPT‑5.5 and Fable on the highest-end metrics.
  • Some report it feels smarter and more stable than GLM‑5.1, especially in not getting stuck in reasoning loops.

Reasoning Style and Token Efficiency

  • GLM‑5.2 Max is described as extremely “thinky” and verbose, similar to Opus 4.8 Max, often burning tens of thousands of tokens and being slow.
  • Users recommend the “High” setting as a better tradeoff: similar quality on many tasks while using roughly 2–2.5× fewer tokens.
  • Several people are frustrated that the model over‑plans instead of “just writing the code,” a complaint also raised about other frontier models.

Cost, Plans, and Third‑Party Hosting

  • Despite very low official per‑token prices relative to Opus/GPT, real-world costs can add up quickly due to high token usage.
  • Some users find Z.ai’s subscription quotas and multipliers (higher deductions for GLM‑5.2 usage) poor value, preferring fixed-price Claude/GPT plans.
  • Third‑party providers (OpenRouter and others) offer cheaper or quantized GLM‑5.2, but there are concerns about misconfiguration, silent quantization, and reliability.
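
The point about low per‑token prices versus high token usage can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the token counts, and the prices are made‑up placeholders, not real GLM‑5.2, Opus, or GPT figures. Only the 2.5× reduction is taken from the range commenters reported for the “High” setting.

```python
def request_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one request, given prices per million tokens."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


# ASSUMPTION: every number below is a hypothetical placeholder.
# A cheap but verbose model: low prices, many reasoning/output tokens.
cheap_verbose = request_cost(20_000, 60_000, in_price_per_m=1.0, out_price_per_m=3.0)

# A pricier but terse model: higher prices, far fewer output tokens.
pricey_terse = request_cost(20_000, 8_000, in_price_per_m=5.0, out_price_per_m=20.0)

# The same cheap model with 2.5x fewer output tokens (upper end of the
# "High" vs. "Max" savings that commenters reported).
cheap_high = request_cost(20_000, 60_000 / 2.5, in_price_per_m=1.0, out_price_per_m=3.0)

print(f"cheap, verbose: ${cheap_verbose:.3f}")
print(f"pricey, terse:  ${pricey_terse:.3f}")
print(f"cheap, 'High':  ${cheap_high:.3f}")
```

With these placeholder numbers, a 5–7× price advantage shrinks to roughly a 1.3× cost advantage once verbosity is factored in, and trimming output tokens restores most of the gap. Real ratios depend entirely on actual prices and workloads.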

Open Weights, Self‑Hosting, and Hardware

  • Open weights are seen as strategically important: they enable privacy, enterprise self‑hosting, and competition with closed labs.
  • Running the full model locally currently requires a very large amount of GPU or unified memory; some discuss 8‑GPU or high‑end workstation setups and their energy costs.
  • Others note that for most businesses today, negotiating private cloud deployments is still easier than building in‑house infra.

Multimodality and Missing Vision

  • GLM‑5.2 is text‑only, which surprises many given current norms.
  • Lack of image input is a real limitation for UI/screenshot workflows and some agent use cases, though people suggest combining separate vision models as a workaround.

Benchmarks vs. Real‑World Use

  • Multiple commenters warn that benchmark scores (Artificial Analysis, DeepSWE, etc.) don’t always match long‑horizon, multi‑turn coding performance.
  • Some find GLM‑5.2 excellent in coding harnesses; others report subtle hallucinations and sloppiness that make them hesitant to use it for “serious” work.
  • Consensus: it’s a strong, cheap open‑weights option, but GPT‑5.5, Fable, and the latest Opus still lead on raw capability and efficiency.