DeepSeek: The quiet giant leading China’s AI race
Model performance and behavior
- Many see DeepSeek V3 as roughly comparable to top Western models on reasoning and coding, with some users calling it on par with Claude for programming.
- Others say it’s not actually “on par” with GPT‑4/Claude in real use: it feels overfitted and “stubborn,” is hard to steer, and tends to repeat itself or insist on solving math problems instead of following nuanced instructions.
- Crowdsourced arenas (e.g., LMSYS) rank it highly, but some posters distrust such leaderboards and report “average at best” subjective performance.
Cost, efficiency, and architecture
- Major excitement centers on claims that DeepSeek achieved near‑SOTA performance at roughly one‑tenth the training cost of comparable models, with extremely cheap inference.
- Its Mixture‑of‑Experts approach (671B total parameters, ~37B active per token) and custom routing/load‑balancing are highlighted as a big architectural advance; some frame “deep vs. wide” model trade‑offs and see MoE as the “wide” end of that spectrum.
- Some argue the real breakthrough is efficiency, not raw capability: if SOTA becomes cheap and commoditized, massive GPU stockpiles become less of a moat.
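The total‑vs‑active parameter gap behind these efficiency claims comes from sparse expert routing: each token activates only a few experts, so compute per token scales with active rather than total parameters. Below is a minimal, generic sketch of top‑k MoE gating for one token; the expert count, `TOP_K`, and function names are illustrative assumptions, not DeepSeek’s actual routing scheme (which the thread notes includes custom load balancing).

```python
import math
import random

# Generic top-k Mixture-of-Experts routing sketch (illustration only,
# not DeepSeek's implementation). Sizes here are hypothetical and tiny;
# DeepSeek V3 reportedly has 671B total parameters with ~37B active.
NUM_EXPERTS = 8   # experts in one MoE layer (assumed for this sketch)
TOP_K = 2         # experts actually run per token

def softmax(xs):
    # Numerically stable softmax over router logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(token_logits):
    """Select the top-k experts for one token and renormalize their gates.

    Only the returned experts execute, which is why active parameters
    (and FLOPs) are a small fraction of total parameters.
    Returns a list of (expert_index, gate_weight) pairs.
    """
    probs = softmax(token_logits)
    topk = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    denom = sum(probs[i] for i in topk)
    return [(i, probs[i] / denom) for i in topk]

random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route(router_logits)  # e.g. two (expert, weight) pairs summing to 1
```

In a real system the router is a learned linear layer and an auxiliary loss (or DeepSeek’s bias‑based balancing) keeps tokens spread evenly across experts; without that, a few experts absorb most traffic and the capacity advantage evaporates.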
Open weights, data, and “moats”
- DeepSeek releases open weights and technical details but not full training/serving code; several call this “open weights, not open source.”
- Debate on whether openness erases its advantage: critics say others can just copy; supporters say replication lags, and the true edge is know‑how and fast iteration.
- Multiple commenters note heavy use of synthetic data (e.g., ChatGPT transcripts); DeepSeek models sometimes still insist they are “ChatGPT,” which posters read as evidence of such training.
- API is cheap partly because user data may be reused for training; posters contrast this with OpenAI/Anthropic’s API policies.
Hardware sanctions and innovation pressure
- GPU export controls are seen by some as forcing Chinese teams to “do more with less,” driving algorithmic efficiency and domestic chip efforts.
- Others argue sanctions are porous (smuggling, cloud rentals abroad) and mainly raise costs rather than blocking access.
- There’s speculation that constraints on high‑end GPUs could push China toward alternative compute architectures and further optimizations.
Censorship, alignment, and safety
- Posters argue both Chinese and Western models are constrained, just in different ways: CCP‑style political censorship vs. Western “alignment/safety” norms.
- Some claim Western models feel heavily sanitized on culture‑war topics, while Chinese models must avoid sensitive political/historical issues.
- Disagreement over which regime is more technically limiting: some say censorship will “lobotomize” Chinese models; others note Western systems already refuse many queries.
Broader geopolitical and economic context
- Long tangents debate China’s rise, demographics, soft power, startup culture, and military/economic rivalry with the US.
- Several see DeepSeek as evidence that China can now match or exceed Western AI innovation despite sanctions and censorship, challenging assumptions about permanent Western dominance.
- Others remain skeptical that China can sustain cutting‑edge leadership under authoritarian politics and capital controls.