DeepSeek v4

Release details and model variants

  • DeepSeek-V4 is released as a “preview” with open weights on Hugging Face, not just API access.
  • Two main MoE models:
    • V4-Pro: ~1.6T parameters, ~49B active, aimed at frontier performance.
    • V4-Flash: ~284B parameters, ~13B active, smaller and cheaper, meant to be the “fast, efficient” option.
  • Both support a 1M-token context; the technical report highlights hybrid attention (CSA + HCA), manifold-constrained hyper-connections, and the Muon optimizer, plus a large (~32T-token) pretraining run.
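A quick sanity check on what those MoE shapes imply: because only a subset of experts fires per token, effective per-token compute scales with *active* parameters, not totals. A sketch using the approximate counts quoted above (the ~2 FLOPs/param/token rule of thumb for a dense-equivalent forward pass):

```python
# Per-token compute comparison for the two MoE variants.
# MoE models only activate a subset of experts per token, so effective
# FLOPs scale with *active* parameters (~2 FLOPs/param/token), not totals.
# Figures are the approximate counts quoted above.
models = {
    "V4-Pro":   {"total_params": 1.6e12, "active_params": 49e9},
    "V4-Flash": {"total_params": 284e9,  "active_params": 13e9},
}

for name, m in models.items():
    active_frac = m["active_params"] / m["total_params"]
    flops_per_token = 2 * m["active_params"]  # dense-equivalent forward pass
    print(f"{name}: {active_frac:.1%} of weights active, "
          f"~{flops_per_token / 1e9:.0f} GFLOPs per token")
```

So Pro runs at roughly the per-token cost of a ~49B dense model despite its ~1.6T total size, which is the whole economic argument for MoE at this scale.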

Performance vs GPT/Claude/Kimi/GLM

  • Benchmarks: close to Opus 4.5/4.6 and GPT-5.4; below GPT-5.5 and Opus 4.7 on many metrics.
  • DeepSeek's own Chinese-language announcement says V4-Pro is:
    • Better than Sonnet 4.5.
    • Near Opus 4.6 without “Thinking.”
    • Worse than Opus 4.6 with “Thinking.”
  • Some users report very strong math and research behavior, especially with “max thinking,” and competitive coding; others say it lags Kimi 2.6 and GLM 5.x in independent evals.
  • Several comments stress “vibes over benchmarks”: real-world coding and agentic performance diverge from leaderboard scores, and benchmarks like SWE-bench are likely contaminated.

Pricing, hardware, and hosting

  • OpenRouter pricing:
    • Pro: ~$1.74/M input tokens, ~$3.48/M output tokens.
    • Flash: ~$0.14/M input tokens, ~$0.28/M output tokens.
  • Many see this as dramatically cheaper than US frontier APIs, especially for Opus‑level quality; some argue the claim that big US labs' pricing is heavily "subsidized" is overstated.
  • On-prem inference for full Pro is extremely heavy (tens of H100s or very high-end consumer GPU clusters).
  • Flash (≈160 GB mixed FP4/FP8) is seen as plausible on high-end Macs or multi‑GPU rigs; quantization and SSD-streaming MoE tricks are discussed but considered slow and experimental.
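Two back-of-envelope checks on the figures above (a sketch; the request size and the FP4/FP8 split are illustrative assumptions, not published numbers):

```python
# 1) Cost of a sample agentic request at the OpenRouter-quoted rates
#    (assumed shape: 100K input tokens, 10K output tokens).
rates = {"Pro": (1.74, 3.48), "Flash": (0.14, 0.28)}  # $/M input, $/M output
for name, (in_rate, out_rate) in rates.items():
    cost = 100_000 / 1e6 * in_rate + 10_000 / 1e6 * out_rate
    print(f"{name}: ${cost:.4f} per request")

# 2) Weight memory for Flash (~284B params) under mixed FP4/FP8
#    quantization. The 7:1 FP4:FP8 split (4.5 bits/param average) is an
#    illustrative assumption, not the published recipe.
bits_per_param = (7 * 4 + 1 * 8) / 8   # = 4.5 bits average
gb = 284e9 * bits_per_param / 8 / 1e9  # bits -> bytes -> GB
print(f"Flash weights: ~{gb:.0f} GB")
```

At an average of ~4.5 bits per parameter, the weights alone land right around the ≈160 GB figure quoted above, before accounting for KV cache, which grows with context length.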

Open weights, licensing, and “open source” debate

  • Strong appreciation that both base and instruct weights are released; DeepSeek is praised for a broad ecosystem of open tooling.
  • Ongoing argument over calling this “open source” vs “open weights” since training data and full reproducible pipelines are not provided.
  • Some see open weights as crucial for control, fine-tuning, and non‑rug‑pull stability versus closed SaaS models.

Tooling, coding harnesses, and UX

  • A major sticking point: there is no first-party "Claude Code–level" harness, so user stickiness may lag that of closed models.
  • However, DeepSeek provides explicit guides to integrate with Claude Code; users report it works surprisingly well there and in other agents (Pi, OpenCode, Zed, etc.).
  • Docs are widely praised as clear and developer-focused.
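Part of why third-party harnesses pick DeepSeek up easily is that its API follows the familiar OpenAI chat-completions request shape. A minimal stdlib sketch of such a request (the model id is a placeholder, not a confirmed V4 identifier; the call is only sent if a real key is present in the environment):

```python
import json
import os
import urllib.request

# Sketch of a chat request against DeepSeek's OpenAI-compatible endpoint.
payload = {
    "model": "deepseek-chat",  # placeholder id; check the docs for V4 names
    "messages": [{"role": "user", "content": "Refactor this function..."}],
}
req = urllib.request.Request(
    "https://api.deepseek.com/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('DEEPSEEK_API_KEY', '')}",
    },
)

if os.environ.get("DEEPSEEK_API_KEY"):  # only send with a real key set
    with urllib.request.urlopen(req) as r:
        body = json.load(r)
        print(body["choices"][0]["message"]["content"])
```

Because agents like the ones mentioned above speak this same request shape, pointing them at a different base URL and model id is usually all the integration required.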

Geopolitics, trust, and chip ecosystem

  • Big meta-thread on whether to trust Chinese vs US providers:
    • Some fear Chinese state access; others feel more threatened by US surveillance and policy.
  • Noted that inference is already running on Huawei Ascend NPUs; DeepSeek claims prices will fall further once Ascend 950 supernodes scale.
  • Many see this as a significant challenge to Nvidia’s dominance and to US AI monopolies.

Meta: pace, burnout, and model churn

  • Users express burnout from rapid frontier releases and constant “better than Opus/GPT” claims.
  • Several say intelligence is now commoditized above a certain level; workflows, harnesses, reliability, and control matter more than marginal benchmark gains.