GLM-4.5: Reasoning, Coding, and Agentic Abilities

Model origin and positioning

  • Commenters identify GLM-4.5 as coming out of the Tsinghua ecosystem and backed heavily by Chinese state-linked funding, seeing it as evidence of deep Chinese AI talent.
  • Some argue Chinese LLMs are now roughly on par with US ones and that China may lead in robotics, citing strong coordination between government, education, and industry.
  • Others push back on threads that feel like coordinated “pro-China” talking points.

Censorship and political constraints

  • Many users probe the model with questions on Tiananmen Square, Xi Jinping, Tibet, CCP representation in the NPC, “is China a democracy?”, etc.
  • Typical outcomes: refusals with “content security” errors, evasive historical answers, or overtly pro-government framing. Sometimes the chain-of-thought visibly engages with the question before the final answer is blocked.
  • Some are frustrated by “low-effort” Tiananmen tests; others argue this is a valid and important evaluation dimension, and note Western models also have political constraints, though of different kinds.
  • One thread stresses that both US and Chinese state-aligned models merit criticism on their own terms, without whataboutism deflecting either.

Claude identity and training data

  • Several users report the model introducing itself as Claude or exposing a system prompt that begins “You are Claude, an AI assistant created by Anthropic.”
  • Explanations debated:
    • hidden routing or fallback to real Claude endpoints;
    • training data polluted with Claude outputs or system prompts;
    • “subliminal” behavior transfer from distillation on other models’ outputs.
  • Others note that LLMs often misidentify themselves and that post-training on other models’ text is common, without implying live routing.

Coding and reasoning performance

  • Multiple users test it for programming:
    • Some say it rivals or beats Claude 3.5 Sonnet and is stronger than DeepSeek R1 or Qwen at tool use, multi-step reasoning, and instruction following.
    • Particularly praised for backend/server code and frontend logic; weaker for visual/design and creative tasks (art critiques, lyrics).
    • One user claims it solved a complex networking issue that o3-pro had failed on, and did so much faster, prompting them to cancel a paid subscription.
  • The blog post’s claims of beating o3/Grok/Gemini on coding benchmarks are noted; some plan a systematic comparison.

Local use, tooling, and context

  • Quantized “Air” variants (3–4 bit) are reported running on high-RAM Macs (48–60 GB) with good coding performance, including simple games built from scratch; see the local-inference sketch after this list.
  • Some report integration issues via OpenRouter (the model getting stuck on earlier messages in a conversation); an API sketch also follows the list.
  • The 128k context length strikes some as underwhelming; one user asks for benchmarks on actual effective context versus the advertised limit, which the second sketch below crudely probes.
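
A minimal local-inference sketch using llama-cpp-python, assuming a hypothetical 4-bit GGUF quant of GLM-4.5-Air (the actual filename and quant level will vary by who published the conversion):

    from llama_cpp import Llama

    # Hypothetical quant filename; substitute whatever GGUF you downloaded.
    llm = Llama(
        model_path="GLM-4.5-Air-Q4_K_M.gguf",
        n_ctx=8192,        # working context; raise if RAM allows
        n_gpu_layers=-1,   # offload all layers (Metal on Apple silicon)
    )

    # One-shot coding prompt of the kind commenters tried locally.
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a minimal Pong clone in Python."}]
    )
    print(out["choices"][0]["message"]["content"])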
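
For the OpenRouter path, a sketch using the OpenAI-compatible endpoint, assuming the model slug z-ai/glm-4.5 (check OpenRouter’s model list) and an OPENROUTER_API_KEY in the environment; it doubles as a crude effective-context probe by burying one fact in a long filler document:

    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    # Bury a single "needle" fact halfway through tens of thousands of
    # tokens of filler, then ask the model to retrieve it.
    filler = "The sky was grey and nothing happened. " * 4000
    needle = "The secret launch code is 7413. "
    haystack = filler[: len(filler) // 2] + needle + filler[len(filler) // 2 :]

    resp = client.chat.completions.create(
        model="z-ai/glm-4.5",  # assumed slug
        messages=[{"role": "user",
                   "content": haystack + "\n\nWhat is the secret launch code?"}],
    )
    print(resp.choices[0].message.content)  # expect 7413 if retrieval works at this depth

Sweeping the needle position and the filler length toward the advertised 128k would yield the effective-context curve the commenter asked for.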

Branding and miscellany

  • Some confusion/annoyance over the name “GLM” (General Language Model) colliding with the statistics abbreviation for “generalized linear model.”
  • Light discussion of the z.ai domain and single-letter .ai domains.
  • Mixed reactions to the model’s “vibes”: technically competent but sometimes “weird” or uninspired compared to US models in creative domains.