Tencent's 'Hunyuan-T1': The First Mamba-Powered Ultra-Large Model

Website UX & Naming

  • Several people note the official page renders poorly on phones, with text cut off and no right padding, calling it sloppy for a flagship AI product.
  • Discussion on the model name “Hunyuan”: explanation of the Chinese meaning (“Primordial Chaos/Original Unity”) and comparison to Western mythological naming like “Apollo” / “Prometheus”.
  • Debate over romanization: complaints that “Hunyuan” without tones is lossy; suggestions for tone-marked pinyin or spaced syllables (“Hun Yuan”) as more readable/lookup‑friendly, but others note tones don’t help most non‑Chinese speakers and Chinese readers just want characters.

Reinforcement Learning, Benchmarks & Goodhart’s Law

  • A key worry: RL might just “game” benchmarks rather than improve general usefulness, with parallels to Goodhart’s law and school testing.
  • Some argue that all optimization is “gaming a benchmark,” so the real issue is designing meaningful evals and train/test splits; others point out that for LLMs it’s hard to ensure test sets are truly unseen.
  • Mention of benchmark proliferation (ARC, etc.) and models rapidly “beating” them, raising contamination concerns (a minimal decontamination-style check is sketched after this list).
  • Multiple comments stress that benchmarks are necessary but insufficient; real validation comes from deployment on real tasks and private evals.
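
As a concrete illustration of the contamination worry, here is a minimal sketch of the kind of check labs describe in decontamination reports: flag any benchmark item that shares a long word-level n-gram with the training corpus. The function names and the 13-gram threshold are illustrative assumptions, not a description of any lab's actual pipeline.

```python
from typing import Iterable, List, Set

def ngrams(text: str, n: int = 13) -> Set[tuple]:
    """Word-level n-grams of `text`, lowercased."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def contaminated_items(train_docs: Iterable[str],
                       test_items: List[str],
                       n: int = 13) -> List[int]:
    """Indices of test items sharing any n-gram with the training corpus.

    Real pipelines normalize more aggressively and hash the n-grams,
    but the principle is the same: a long exact overlap suggests the
    test item was probably seen during training.
    """
    train_grams: Set[tuple] = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    return [i for i, item in enumerate(test_items)
            if ngrams(item, n) & train_grams]
```

Of course, this only catches verbatim leakage; paraphrased test items, the harder part of the contamination debate, slip straight through.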

Capabilities, Limitations & Hallucinations

  • Users report persistent hallucinations (e.g., fabricating GitHub code) even when told “don’t hallucinate,” contrasting with claims that it’s hard to find tasks models can’t do.
  • Some propose tool‑use (e.g., routing arithmetic to a calculator via a tool framework) as the practical fix for math and similar brittle areas; a minimal sketch of the pattern follows.
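
The dispatch loop is simple in principle. Below is a minimal, hedged sketch: the model is prompted to emit a `CALC(...)` directive, which the harness evaluates with a restricted AST walker and feeds back. The `CALC` convention and the `ask_model` callable are illustrative assumptions, not Hunyuan‑T1’s actual tool interface.

```python
import ast
import operator
import re

# Safe arithmetic evaluator: only numeric literals and basic operators.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def safe_eval(expr: str) -> float:
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"disallowed expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))

def answer_with_calculator(question: str, ask_model) -> str:
    """One tool round-trip: model may emit CALC(<expr>); we inject the result."""
    draft = ask_model(f"{question}\nIf arithmetic is needed, write CALC(<expression>).")
    match = re.search(r"CALC\((.+)\)", draft)  # naive parse: grabs through the last ')'
    if not match:
        return draft
    result = safe_eval(match.group(1))
    return ask_model(f"{question}\nThe calculator returned {result}. Give the final answer.")
```

The point of the restricted evaluator is that the brittle step (digit-by-digit arithmetic) moves out of the sampled token stream entirely; the model only has to decide *when* to call the tool.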

Political Alignment & Information Control

  • Tests around topics like Tibet, Tiananmen, and US/China politics show strongly state-aligned narratives in Chinese models and safety refusals on sensitive topics (e.g., overthrowing governments).
  • Comparisons drawn to Western models’ own alignment/censorship, but commenters emphasize the more centralized, legally mandated nature of control in China.

Multilingual Behavior & System Prompts

  • Users observe that the model often responds in Chinese even to English prompts; inspection suggests this is explicitly dictated by its system prompt, which says it “mainly uses Chinese” (a hedged override sketch follows this list).
  • Some connect bilingual behavior to questions about linguistic relativity (Whorfian hypothesis), though conclusions remain speculative/unclear.
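
If the language preference really does live in the system prompt, the obvious workaround when self-hosting or calling an API is to supply your own system message. The sketch below assumes an OpenAI-compatible chat endpoint; the base URL and model id are placeholders, and a hosted demo may not let you override the built-in prompt at all.

```python
from openai import OpenAI

# Placeholder endpoint and credentials; not confirmed values for Hunyuan-T1.
client = OpenAI(base_url="https://example-hunyuan-endpoint/v1",
                api_key="...")

resp = client.chat.completions.create(
    model="hunyuan-t1",  # placeholder model id
    messages=[
        # Replace the reported default ("mainly uses Chinese") with an
        # explicit language instruction of our own.
        {"role": "system", "content": "Always respond in English."},
        {"role": "user", "content": "Summarize the Mamba architecture in two sentences."},
    ],
)
print(resp.choices[0].message.content)
```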

Architecture, Mamba Hybrid & Significance

  • Interest that the base is a hybrid Transformer–Mamba MoE model, not pure Mamba; taken as informal evidence that Mamba alone still has practical issues (see the layer-interleaving sketch after this list).
  • Excitement from some about strong performance of a Mamba‑based system; others note the sheer number of new models makes it hard to tell what is genuinely impactful.
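
Tencent has not published the layer layout, so purely as an illustration of what “hybrid” typically means in this context (as in earlier hybrids like Jamba): a mostly-SSM stack with self-attention interleaved every few layers. Everything below (the ratio, the dimensions, and the `MambaBlock` stand-in) is assumed for the sketch; a real implementation would drop in an actual selective-SSM kernel such as `mamba_ssm.Mamba`, and the MoE routing is omitted entirely.

```python
import torch
import torch.nn as nn

class MambaBlock(nn.Module):
    """Stand-in for a real selective-SSM layer (e.g. mamba_ssm.Mamba)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)  # placeholder token mixing
    def forward(self, x):
        return x + self.proj(x)  # residual, as in the real block

class AttentionBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class HybridStack(nn.Module):
    """Mostly-SSM stack with attention every `attn_every` layers.

    The ratio is an assumption for illustration; Hunyuan-T1's actual
    layout (and its MoE routing) is not public.
    """
    def __init__(self, d_model: int = 512, n_layers: int = 12, attn_every: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0 else MambaBlock(d_model)
            for i in range(n_layers)
        )
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 16, 512)    # (batch, seq, d_model)
print(HybridStack()(x).shape)  # torch.Size([2, 16, 512])
```

The usual motivation for this layout is that SSM layers give linear-time sequence mixing while the occasional attention layer recovers precise long-range retrieval, which pure-Mamba stacks have struggled with in practice.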

Trust, Openness & Metrics

  • Question whether linking a Hugging Face demo implies future weight release; status remains unclear.
  • Skepticism about score‑centric marketing: fear that labs quietly train on test sets or otherwise “optimize to the leaderboard,” especially since training data is undisclosed.
  • Comparisons to standardized testing in education: benchmarks drive progress but also distort incentives.

Generation Behavior: Stopping & “Thinking Tokens”

  • One user notes “non‑stopping” responses as a practical issue; others ask how to train end‑of‑sequence behavior better, suggesting targeted fine‑tuning but noting that it generalizes weakly (a fine‑tuning sketch follows this list).
  • Discussion of “OK, so…” / “好的…” (“OK…”) as recurring first “thinking” tokens in chain‑of‑thought models: some see them as wasted, others cite research indicating extra “pause/thinking” tokens can improve reasoning by effectively increasing compute per answer (a crude A/B harness is sketched below).
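
On the stopping problem: a common practical cause is that the EOS token never appears in the fine-tuning targets, or is masked out of the loss along with the prompt. A minimal sketch using Hugging Face `transformers`; the `gpt2` tokenizer is a placeholder for whatever base model is being tuned.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model

def build_example(prompt: str, answer: str):
    """Tokenize one SFT pair, making sure EOS is present *and* trained on.

    Two common bugs make models ramble: the answer is saved without an
    EOS token, or EOS is masked to -100 along with the prompt. Here the
    prompt is masked, but the answer plus EOS contribute to the loss.
    """
    prompt_ids = tok(prompt).input_ids
    answer_ids = tok(answer).input_ids + [tok.eos_token_id]  # append EOS
    input_ids = prompt_ids + answer_ids
    labels = [-100] * len(prompt_ids) + answer_ids  # loss on answer + EOS only
    return {"input_ids": input_ids, "labels": labels}

ex = build_example("Q: What is 2+2?\nA:", " 4")
assert ex["labels"][-1] == tok.eos_token_id
```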
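
And on whether “OK, so…” prefixes are wasted: since each generated token buys one more forward pass, forcing a thinking prefix is a cheap way to test the extra-compute claim yourself. The sketch below assumes a hypothetical `ask_model(prompt, forced_prefix=...)` hook that makes the model begin its reply with a given string; many local inference stacks support this kind of assistant-turn prefill.

```python
def ab_test_thinking_prefix(questions, answers, ask_model):
    """Crude A/B: does forcing a short 'thinking' prefix change accuracy?

    `ask_model(prompt, forced_prefix)` is a hypothetical hook (an
    assumption of this sketch) that prefills the model's reply with
    `forced_prefix`. The prefix tokens buy extra forward passes before
    any answer tokens are committed.
    """
    scores = {"no_prefix": 0, "with_prefix": 0}
    for q, gold in zip(questions, answers):
        if gold in ask_model(q, forced_prefix=""):
            scores["no_prefix"] += 1
        if gold in ask_model(q, forced_prefix="OK, so let me think step by step. "):
            scores["with_prefix"] += 1
    return scores
```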

Math & “Understanding”

  • A side debate over charts showing non‑perfect accuracy on multi‑digit multiplication: one camp treats any failure on trivial arithmetic as proof of “stochastic parrot” limits; another notes that for large numbers these models already exceed typical human mental‑math ability. A quick harness for measuring this yourself is sketched below.
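
For anyone who wants numbers rather than chart-reading, a small evaluation loop is easy to write. `ask_model` is again a stand-in for whatever chat interface is available; the digit count, trial count, and answer-parsing heuristic are all assumptions of the sketch.

```python
import random
import re

def multiplication_accuracy(ask_model, digits: int = 4, n_trials: int = 100,
                            seed: int = 0) -> float:
    """Estimate exact-match accuracy on random `digits`-digit multiplication.

    The last integer in the reply is taken as the model's answer, which
    tolerates chain-of-thought preambles before the final number.
    """
    rng = random.Random(seed)
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    correct = 0
    for _ in range(n_trials):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        reply = ask_model(f"Compute {a} * {b}. Reply with the number only.")
        nums = re.findall(r"-?\d+", reply.replace(",", ""))
        correct += bool(nums) and int(nums[-1]) == a * b
    return correct / n_trials
```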