2026-06-17

GLM-5.2 is the new leading open weights model on Artificial Analysis

Sources: Original Article, Hacker News Discussion

Capabilities and Benchmark Position

Many commenters see GLM‑5.2 as a major step forward: roughly Opus 4.6–4.7 level for coding and “near frontier.” Some say that in practice it rivals or slightly surpasses earlier Opus versions.
Artificial Analysis and other benchmarks place it in the top tier of coding models, though still below GPT‑5.5 and Fable on the highest-end metrics.
Some report that it feels smarter and more stable than GLM‑5.1, particularly because it gets stuck in reasoning loops less often.

Reasoning Style and Token Efficiency

GLM‑5.2 Max is described as extremely “thinky” and verbose, similar to Opus 4.8 Max. It often burns tens of thousands of tokens and runs slowly.
Users recommend the “High” setting as a better tradeoff: similar quality on many tasks while using roughly 2–2.5× fewer tokens.
Several people are frustrated that it over‑plans instead of “just writing the code,” a complaint also leveled at other frontier models.

Cost, Plans, and Third‑Party Hosting

Despite very low official per‑token prices relative to Opus/GPT, real-world costs can add up quickly due to high token usage.
Some users find Z.ai’s subscription quotas and multipliers poor value, because GLM‑5.2 usage is deducted from the quota at a higher rate. They prefer fixed-price Claude/GPT plans.
Third‑party providers (OpenRouter and others) offer cheaper or quantized versions of GLM‑5.2, but commenters raise concerns about misconfiguration, silent quantization, and reliability.

Open Weights, Self‑Hosting, and Hardware

Open weights are seen as strategically important because they enable privacy, enterprise self‑hosting, and competition with closed labs.
Running the full model locally currently requires very large amounts of GPU or unified memory. Some commenters discuss 8‑GPU or high‑end workstation setups and their energy costs.
Others note that for most businesses today, negotiating a private cloud deployment is still easier than building in‑house infrastructure.

Multimodality and Missing Vision

GLM‑5.2 is text‑only, which surprises many given that image input is now the norm.
The lack of image input is a real limitation for UI/screenshot workflows and some agent use cases. As a workaround, people suggest pairing it with a separate vision model.

Benchmarks vs. Real‑World Use

Multiple commenters warn that benchmark scores (Artificial Analysis, DeepSWE, etc.) don’t always match long‑horizon, multi‑turn coding performance.
Some find GLM‑5.2 excellent in coding harnesses. Others report subtle hallucinations and sloppiness that make them hesitant to use it for “serious” work.
Consensus: it is a strong, inexpensive open‑weights option, but GPT‑5.5, Fable, and the latest Opus still lead on raw capability and efficiency.