GLM-4.5: Reasoning, Coding, and Agentic Abilities

Model origin and positioning

  • Commenters identify GLM-4.5 as coming out of the Tsinghua ecosystem and backed heavily by Chinese state-linked funding, seeing it as evidence of deep Chinese AI talent.
  • Some argue Chinese LLMs are now roughly on par with US ones and that China may lead in robotics, citing strong coordination between government, education, and industry.
  • Others push back on threads that feel like coordinated “pro-China” talking points.

Censorship and political constraints

  • Many users probe the model with questions on Tiananmen Square, Xi Jinping, Tibet, CCP representation in the NPC, “is China a democracy?”, etc.
  • Typical outcomes: refusals with “content security” errors, evasive historical answers, or overtly pro-government framing. Sometimes the chain-of-thought visibly engages with the question before the final answer is blocked.
  • Some are frustrated by “low-effort” Tiananmen tests; others argue this is a valid and important evaluation dimension, and note Western models also have political constraints, though of different kinds.
  • One thread stresses that both US and Chinese state-aligned models merit criticism on their own terms, without whataboutism deflecting either.

Claude identity and training data

  • Several users report the model introducing itself as Claude or exposing a system prompt that begins “You are Claude, an AI assistant created by Anthropic.”
  • Explanations debated:
    • hidden routing or fallback to real Claude endpoints;
    • training data polluted with Claude outputs or system prompts;
    • “subliminal” behavior transfer from distillation on other models’ outputs.
  • Others note that LLMs often misidentify themselves and that post-training on other models’ text is common, without implying live routing.

Coding and reasoning performance

  • Multiple users test it for programming:
    • Some say it rivals or beats Claude 3.5 Sonnet and is stronger than DeepSeek R1 or Qwen at tool use, multi-step reasoning, and instruction following.
    • Particularly praised for backend/server code and frontend logic; weaker for visual/design and creative tasks (art critiques, lyrics).
    • One user claims it solved a complex networking issue that o3-pro had failed on, and did so much faster, prompting them to cancel a paid subscription.
  • The blog post’s claims of beating o3/Grok/Gemini on coding benchmarks are noted; some plan a systematic comparison.

Local use, tooling, and context

  • Quantized “Air” variants (3–4 bit) are reported running on high-RAM Macs (48–60 GB) with good coding performance, including simple games built from scratch; see the local-inference sketch after this list.
  • Some report integration issues via OpenRouter (the model getting stuck on earlier messages in a conversation); an API sketch also follows the list.
  • The 128k context length strikes some as underwhelming; one user asks for benchmarks on actual effective context versus the advertised limit, which the second sketch below crudely probes.
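
A minimal local-inference sketch using llama-cpp-python, assuming a hypothetical 4-bit GGUF quant of GLM-4.5-Air (the actual filename and quant level will vary by who published the conversion):

    from llama_cpp import Llama

    # Hypothetical quant filename; substitute whatever GGUF you downloaded.
    llm = Llama(
        model_path="GLM-4.5-Air-Q4_K_M.gguf",
        n_ctx=8192,        # working context; raise if RAM allows
        n_gpu_layers=-1,   # offload all layers (Metal on Apple silicon)
    )

    # One-shot coding prompt of the kind commenters tried locally.
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a minimal Pong clone in Python."}]
    )
    print(out["choices"][0]["message"]["content"])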
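
For the OpenRouter path, a sketch using the OpenAI-compatible endpoint, assuming the model slug z-ai/glm-4.5 (check OpenRouter’s model list) and an OPENROUTER_API_KEY in the environment; it doubles as a crude effective-context probe by burying one fact in a long filler document:

    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    # Bury a single "needle" fact halfway through tens of thousands of
    # tokens of filler, then ask the model to retrieve it.
    filler = "The sky was grey and nothing happened. " * 4000
    needle = "The secret launch code is 7413. "
    haystack = filler[: len(filler) // 2] + needle + filler[len(filler) // 2 :]

    resp = client.chat.completions.create(
        model="z-ai/glm-4.5",  # assumed slug
        messages=[{"role": "user",
                   "content": haystack + "\n\nWhat is the secret launch code?"}],
    )
    print(resp.choices[0].message.content)  # expect 7413 if retrieval works at this depth

Sweeping the needle position and the filler length toward the advertised 128k would yield the effective-context curve the commenter asked for.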

Branding and miscellany

  • Some confusion/annoyance over the name “GLM” (General Language Model) colliding with the statistics abbreviation for “generalized linear model.”
  • Light discussion of the z.ai domain and single-letter .ai domains.
  • Mixed reactions to the model’s “vibes”: technically competent but sometimes “weird” or uninspired compared to US models in creative domains.