DeepSeek v4
Release details and model variants
- DeepSeek-V4 is released as a “preview” with open weights on Hugging Face, not just API access.
- Two main MoE models:
  - V4-Pro: ~1.6T parameters, ~49B active, aimed at frontier performance.
  - V4-Flash: ~284B parameters, ~13B active, smaller and cheaper, meant to be the "fast, efficient" option.
- Both support 1M-token context; the paper highlights hybrid attention (CSA + HCA), manifold-constrained hyper-connections, and the Muon optimizer, plus a large (~32T-token) pretraining run.
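The MoE figures above imply that only a few percent of each variant's weights are active per token. A quick back-of-the-envelope check, using only the approximate totals quoted in this summary:

```python
# Active-parameter fractions implied by the quoted MoE figures.
# All numbers are the approximate totals from the release summary.

def active_fraction(total_params: float, active_params: float) -> float:
    """Fraction of parameters activated per token in an MoE model."""
    return active_params / total_params

# V4-Pro: ~1.6T total, ~49B active
pro = active_fraction(1.6e12, 49e9)
# V4-Flash: ~284B total, ~13B active
flash = active_fraction(284e9, 13e9)

print(f"Pro active fraction:   {pro:.1%}")    # ~3.1%
print(f"Flash active fraction: {flash:.1%}")  # ~4.6%
```

The small active fraction is what makes the per-token compute (and hence the API pricing below) so much lower than the total parameter counts would suggest.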
Performance vs GPT/Claude/Kimi/GLM
- On benchmarks it lands close to Opus 4.5/4.6 and GPT-5.4, but below GPT-5.5 and Opus 4.7 on many metrics.
- Their own Chinese announcement says V4-Pro is:
  - Better than Sonnet 4.5.
  - Near Opus 4.6 without "Thinking."
  - Worse than Opus 4.6 with "Thinking."
- Some users report very strong math and research behavior, especially with “max thinking,” and competitive coding; others say it lags Kimi 2.6 and GLM 5.x in independent evals.
- Several comments stress “vibes over benchmarks”: real-world coding and agentic performance diverge from leaderboard scores, and benchmarks like SWE-bench are likely contaminated.
Pricing, hardware, and hosting
- OpenRouter pricing:
  - Pro: ~$1.74/M input, $3.48/M output.
  - Flash: ~$0.14/M input, $0.28/M output.
- Many see this as dramatically cheaper than US frontier APIs, especially for Opus-level quality; some argue it undercuts the narrative that big US labs' prices are heavily subsidized.
- On-prem inference for the full Pro model is extremely heavy, requiring tens of H100s or a very large cluster of high-end consumer GPUs.
- Flash (≈160 GB at mixed FP4/FP8 precision) is seen as plausible on high-end Macs or multi-GPU rigs; quantization and SSD-streaming MoE tricks are discussed but considered slow and experimental.
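Two figures in this section can be sanity-checked directly: the per-request cost at the quoted OpenRouter prices, and whether ≈160 GB for Flash is consistent with ~284B parameters at a mixed FP4/FP8 average. The ~4.5 bits/parameter average below is an assumption, not a published spec; a minimal sketch:

```python
# Per-request cost at the OpenRouter prices quoted above
# (USD per million tokens).
PRICES = {
    "v4-pro":   {"input": 1.74, "output": 3.48},
    "v4-flash": {"input": 0.14, "output": 0.28},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, given the price table above."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6

# Rough weight-memory estimate, ignoring KV cache and runtime overhead.
# ~4.5 bits/parameter is an assumed average for a mixed FP4/FP8 model.
def weight_gb(params: float, bits_per_param: float = 4.5) -> float:
    return params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

# Example: a 200K-token prompt with a 4K-token completion.
print(round(cost_usd("v4-pro", 200_000, 4_000), 4))    # 0.3619
print(round(cost_usd("v4-flash", 200_000, 4_000), 4))  # 0.0291
# Flash weights: ~284B params at the assumed ~4.5 bits/param.
print(round(weight_gb(284e9)))                          # 160
```

The 160 GB result matches the figure discussed above, which is why commenters treat Flash, unlike Pro, as borderline feasible on a single high-memory machine.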
Open weights, licensing, and “open source” debate
- Strong appreciation that both base and instruct weights are released; DeepSeek is praised for a broad ecosystem of open tooling.
- Ongoing argument over calling this “open source” vs “open weights” since training data and full reproducible pipelines are not provided.
- Some see open weights as crucial for control, fine-tuning, and protection against rug-pulls, compared with closed SaaS models.
Tooling, coding harnesses, and UX
- A major sticking point: there is no first-party harness on the level of Claude Code, so user stickiness may lag that of closed models.
- However, DeepSeek provides explicit guides to integrate with Claude Code; users report it works surprisingly well there and in other agents (Pi, OpenCode, Zed, etc.).
- Docs are widely praised as clear and developer-focused.
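The Claude Code integration mentioned above typically works by pointing the CLI at an Anthropic-compatible endpoint via environment variables. A hedged sketch, assuming V4 keeps the Anthropic-compatible endpoint DeepSeek's guides documented for earlier models; the base URL and model alias here are assumptions, not confirmed values:

```shell
# Hypothetical: route Claude Code to DeepSeek's Anthropic-compatible API.
# Base URL and model name are assumptions based on earlier DeepSeek guides.
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="sk-..."    # your DeepSeek API key
export ANTHROPIC_MODEL="deepseek-chat"  # assumed V4 model alias
# Then launch Claude Code as usual:
#   claude
```

Other agents (OpenCode, Zed, etc.) follow a similar pattern with their own provider configuration.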
Geopolitics, trust, and chip ecosystem
- Big meta-thread on whether to trust Chinese vs US providers:
  - Some fear Chinese state access; others feel more threatened by US surveillance and policy.
- Commenters note that inference is already running on Huawei Ascend NPUs; DeepSeek claims prices will fall further once Ascend 950 supernodes scale.
- Many see this as a significant challenge to Nvidia’s dominance and to US AI monopolies.
Meta: pace, burnout, and model churn
- Users express burnout from rapid frontier releases and constant “better than Opus/GPT” claims.
- Several say intelligence is now commoditized above a certain level; workflows, harnesses, reliability, and control matter more than marginal benchmark gains.