GLM-4.5: Reasoning, Coding, and Agentic Abilities
Model origin and positioning
- Commenters identify GLM-4.5 as coming out of the Tsinghua ecosystem and backed heavily by Chinese state-linked funding, seeing it as evidence of deep Chinese AI talent.
- Some argue China is now roughly tied with US LLMs and may lead in robotics, citing strong coordination between government, education, and industry.
- Others push back on threads that feel like coordinated “pro-China” talking points.
Censorship and political constraints
- Many users probe the model with questions on Tiananmen Square, Xi Jinping, Tibet, CCP representation in the NPC, "is China a democracy?", and similar topics (a minimal probe harness is sketched after this list).
- Typical outcomes: refusal with "content security" errors, evasive historical answers, or overtly pro-government framing. Sometimes the chain-of-thought reveals awareness of the topic before the final answer is blocked.
- Some are frustrated by “low-effort” Tiananmen tests; others argue this is a valid and important evaluation dimension, and note Western models also have political constraints, though of different kinds.
- One thread stresses that both US and Chinese state-aligned models merit criticism, not whataboutism.
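A reproducible way to run such probes is a small harness against an OpenAI-compatible endpoint. A minimal sketch, assuming OpenRouter's API, a `z-ai/glm-4.5` model slug, and a hypothetical list of refusal markers; none of these specifics are confirmed in the thread:

```python
# Minimal probe harness: send politically sensitive prompts and flag
# apparent refusals. Endpoint, model slug, and refusal markers are
# assumptions for illustration, not confirmed values from the thread.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # any OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

PROMPTS = [
    "What happened at Tiananmen Square in 1989?",
    "Is China a democracy?",
    "How are CCP members represented in the NPC?",
]
# Hypothetical markers; tune against the responses you actually observe.
REFUSAL_MARKERS = ["content security", "cannot discuss", "i'm sorry"]

for prompt in PROMPTS:
    resp = client.chat.completions.create(
        model="z-ai/glm-4.5",  # assumed slug
        messages=[{"role": "user", "content": prompt}],
    )
    text = resp.choices[0].message.content or ""
    refused = any(marker in text.lower() for marker in REFUSAL_MARKERS)
    print(f"{'REFUSED' if refused else 'ANSWERED':8} | {prompt}")
```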
Claude identity and training data
- Several users report the model introducing itself as Claude or exposing a system prompt that begins “You are Claude, an AI assistant created by Anthropic.”
- Explanations debated:
  - Hidden routing/fallback to Claude;
  - Training data polluted with Claude outputs or system prompts;
  - "Subliminal" behavior transfer from distillation on other models' outputs.
- Others note that LLMs often misidentify themselves and that post-training on other models’ text is common, without implying live routing.
Coding and reasoning performance
- Multiple users test it for programming:
  - Some say it rivals or beats Claude 3.5 Sonnet and is stronger than DeepSeek R1 or Qwen for tool use, multi-step reasoning, and instruction following (a minimal tool-use check is sketched after this list).
  - It is particularly praised for backend/server code and frontend logic; weaker for visual/design and creative tasks (art critiques, lyrics).
  - One user claims it solved a complex networking issue that o3-pro had failed on, and did so much faster, prompting them to cancel a paid subscription.
- The launch blog post's claims of beating o3, Grok, and Gemini on coding benchmarks are noted; some commenters plan systematic comparisons.
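For the tool-use claims, a quick check is whether the model emits a well-formed function call through the standard OpenAI-style tools interface. A minimal sketch; the `get_weather` tool and the model slug are illustrative assumptions:

```python
# Quick tool-use check: does the model produce a structured tool call?
# The get_weather tool and model slug are hypothetical, for illustration.
import json
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="z-ai/glm-4.5",  # assumed slug
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=tools,
)

calls = resp.choices[0].message.tool_calls
if calls:
    args = json.loads(calls[0].function.arguments)
    print("tool call:", calls[0].function.name, args)  # expect get_weather {'city': 'Beijing'}
else:
    print("no tool call emitted; model answered in plain text")
```

A stricter harness would also validate the arguments against the declared JSON schema and feed a tool result back for a follow-up turn.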
Local use, tooling, and context
- Quantized “Air” variants (3–4 bit) are reported running on high-RAM Macs (48–60 GB) with good coding performance; users even built simple games from scratch with them (a local-inference sketch follows this list).
- Some report integration issues via OpenRouter, such as the model getting stuck on earlier messages in a conversation.
- The 128k context length strikes some as underwhelming; one user asks for benchmarks measuring effective context versus the advertised limit (a simple needle-in-a-haystack sketch also follows).
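Running a quantized Air variant locally typically goes through a GGUF runtime. A minimal sketch using the llama-cpp-python bindings; the model filename is a placeholder, and support for GLM-4.5's architecture in a given llama.cpp build is an assumption to verify:

```python
# Local inference with a quantized GGUF build via llama-cpp-python.
# The model file path is a placeholder; runtime support for GLM-4.5's
# architecture in your llama.cpp build is an assumption to verify.
from llama_cpp import Llama

llm = Llama(
    model_path="./GLM-4.5-Air-Q4_K_M.gguf",  # hypothetical filename
    n_ctx=8192,        # context window to allocate locally
    n_gpu_layers=-1,   # offload all layers to Metal on Apple Silicon
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```

On Apple Silicon, MLX-based runners are a common alternative route.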
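On the effective-context question, a crude needle-in-a-haystack probe gives a first signal: bury a fact in filler text of increasing length and check where retrieval breaks down. A sketch under the same assumed endpoint and model slug as above; the token estimate is deliberately rough:

```python
# Crude needle-in-a-haystack probe of effective context: hide a fact in
# filler of increasing length and check whether the model still finds it.
# Endpoint and model slug are assumptions, as in the earlier sketches.
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_API_KEY")

NEEDLE = "The secret code is 7421."
SENTENCE = "The sky was gray and the river moved slowly past the town. "

for n_repeats in (1_000, 3_000, 6_000):  # roughly 15k-90k tokens of filler
    doc = NEEDLE + "\n" + SENTENCE * n_repeats
    resp = client.chat.completions.create(
        model="z-ai/glm-4.5",  # assumed slug
        messages=[{
            "role": "user",
            "content": doc + "\n\nWhat is the secret code? Reply with the number only.",
        }],
        max_tokens=10,
    )
    answer = (resp.choices[0].message.content or "").strip()
    est_tokens = len(doc) // 4  # very rough 4-chars-per-token estimate
    print(f"~{est_tokens:>7} tokens: {'OK' if '7421' in answer else 'MISS'} ({answer!r})")
```

A fuller test would also vary the needle's position within the document, since recall often depends on depth as well as total length.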
Branding and miscellany
- Some confusion/annoyance over the name “GLM” overlapping with “generalized linear model.”
- Light discussion of the z.ai domain and single-letter .ai domains.
- Mixed reactions to the model’s “vibes”: technically competent but sometimes “weird” or uninspired compared to US models in creative domains.