DeepSeek v4

Release details and model variants

  • DeepSeek-V4 is released as a “preview” with open weights on Hugging Face, not just API access.
  • Two main MoE models:
    • V4-Pro: ~1.6T parameters, ~49B active, aimed at frontier performance.
    • V4-Flash: ~284B parameters, ~13B active, smaller and cheaper, meant to be the “fast, efficient” option.
  • Both support a 1M-token context; the technical report highlights hybrid attention (CSA + HCA), manifold-constrained hyper-connections, and the Muon optimizer, plus a large (~32T-token) pretraining run.
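A quick sanity check on what those MoE shapes imply: because only a subset of experts fires per token, effective per-token compute scales with *active* parameters, not totals. A sketch using the approximate counts quoted above (the ~2 FLOPs/param/token rule of thumb for a dense-equivalent forward pass):

```python
# Per-token compute comparison for the two MoE variants.
# MoE models only activate a subset of experts per token, so effective
# FLOPs scale with *active* parameters (~2 FLOPs/param/token), not totals.
# Figures are the approximate counts quoted above.
models = {
    "V4-Pro":   {"total_params": 1.6e12, "active_params": 49e9},
    "V4-Flash": {"total_params": 284e9,  "active_params": 13e9},
}

for name, m in models.items():
    active_frac = m["active_params"] / m["total_params"]
    flops_per_token = 2 * m["active_params"]  # dense-equivalent forward pass
    print(f"{name}: {active_frac:.1%} of weights active, "
          f"~{flops_per_token / 1e9:.0f} GFLOPs per token")
```

So Pro runs at roughly the per-token cost of a ~49B dense model despite its ~1.6T total size, which is the whole economic argument for MoE at this scale.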

Performance vs GPT/Claude/Kimi/GLM

  • Benchmarks: close to Opus 4.5/4.6 and GPT-5.4; below GPT-5.5 and Opus 4.7 on many metrics.
  • DeepSeek's own Chinese-language announcement says V4-Pro is:
    • Better than Sonnet 4.5.
    • Near Opus 4.6 without “Thinking.”
    • Worse than Opus 4.6 with “Thinking.”
  • Some users report very strong math and research behavior, especially with “max thinking,” and competitive coding; others say it lags Kimi 2.6 and GLM 5.x in independent evals.
  • Several comments stress “vibes over benchmarks”: real-world coding and agentic performance diverge from leaderboard scores, and benchmarks like SWE-bench are likely contaminated.

Pricing, hardware, and hosting

  • OpenRouter pricing:
    • Pro: ~$1.74/M input tokens, ~$3.48/M output tokens.
    • Flash: ~$0.14/M input tokens, ~$0.28/M output tokens.
  • Many see this as dramatically cheaper than US frontier APIs, especially for Opus‑level quality; some argue the claim that big US labs' pricing is heavily "subsidized" is overstated.
  • On-prem inference for full Pro is extremely heavy (tens of H100s or very high-end consumer GPU clusters).
  • Flash (≈160 GB mixed FP4/FP8) is seen as plausible on high-end Macs or multi‑GPU rigs; quantization and SSD-streaming MoE tricks are discussed but considered slow and experimental.
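Two back-of-envelope checks on the figures above (a sketch; the request size and the FP4/FP8 split are illustrative assumptions, not published numbers):

```python
# 1) Cost of a sample agentic request at the OpenRouter-quoted rates
#    (assumed shape: 100K input tokens, 10K output tokens).
rates = {"Pro": (1.74, 3.48), "Flash": (0.14, 0.28)}  # $/M input, $/M output
for name, (in_rate, out_rate) in rates.items():
    cost = 100_000 / 1e6 * in_rate + 10_000 / 1e6 * out_rate
    print(f"{name}: ${cost:.4f} per request")

# 2) Weight memory for Flash (~284B params) under mixed FP4/FP8
#    quantization. The 7:1 FP4:FP8 split (4.5 bits/param average) is an
#    illustrative assumption, not the published recipe.
bits_per_param = (7 * 4 + 1 * 8) / 8   # = 4.5 bits average
gb = 284e9 * bits_per_param / 8 / 1e9  # bits -> bytes -> GB
print(f"Flash weights: ~{gb:.0f} GB")
```

At an average of ~4.5 bits per parameter, the weights alone land right around the ≈160 GB figure quoted above, before accounting for KV cache, which grows with context length.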

Open weights, licensing, and “open source” debate

  • Strong appreciation that both base and instruct weights are released; DeepSeek is praised for a broad ecosystem of open tooling.
  • Ongoing argument over calling this “open source” vs “open weights” since training data and full reproducible pipelines are not provided.
  • Some see open weights as crucial for control, fine-tuning, and non‑rug‑pull stability versus closed SaaS models.

Tooling, coding harnesses, and UX

  • A major sticking point: there is no first-party "Claude Code–level" harness, so user stickiness may lag that of closed models.
  • However, DeepSeek provides explicit guides to integrate with Claude Code; users report it works surprisingly well there and in other agents (Pi, OpenCode, Zed, etc.).
  • Docs are widely praised as clear and developer-focused.
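Part of why third-party harnesses pick DeepSeek up easily is that its API follows the familiar OpenAI chat-completions request shape. A minimal stdlib sketch of such a request (the model id is a placeholder, not a confirmed V4 identifier; the call is only sent if a real key is present in the environment):

```python
import json
import os
import urllib.request

# Sketch of a chat request against DeepSeek's OpenAI-compatible endpoint.
payload = {
    "model": "deepseek-chat",  # placeholder id; check the docs for V4 names
    "messages": [{"role": "user", "content": "Refactor this function..."}],
}
req = urllib.request.Request(
    "https://api.deepseek.com/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('DEEPSEEK_API_KEY', '')}",
    },
)

if os.environ.get("DEEPSEEK_API_KEY"):  # only send with a real key set
    with urllib.request.urlopen(req) as r:
        body = json.load(r)
        print(body["choices"][0]["message"]["content"])
```

Because agents like the ones mentioned above speak this same request shape, pointing them at a different base URL and model id is usually all the integration required.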

Geopolitics, trust, and chip ecosystem

  • Big meta-thread on whether to trust Chinese vs US providers:
    • Some fear Chinese state access; others feel more threatened by US surveillance and policy.
  • Noted that inference is already running on Huawei Ascend NPUs; DeepSeek claims prices will fall further once Ascend 950 supernodes scale.
  • Many see this as a significant challenge to Nvidia’s dominance and to US AI monopolies.

Meta: pace, burnout, and model churn

  • Users express burnout from rapid frontier releases and constant “better than Opus/GPT” claims.
  • Several say intelligence is now commoditized above a certain level; workflows, harnesses, reliability, and control matter more than marginal benchmark gains.