DeepSeek V4 – almost on the frontier
Perceived Quality and Use Cases
- Many report V4 Pro (and Flash) as “good enough” or comparable to Claude Opus 4.6 / GPT‑5.4 / Sonnet‑class for day‑to‑day coding, refactors, and prototyping.
- Some users successfully ran deep analyses of medium–large TypeScript or backend codebases for a few cents, saying they’d previously spent dollars to tens of dollars on frontier models for similar work.
- Others found V4 Pro clearly below GPT‑5.5 / Opus‑max for hard planning, design critique, or complex UX work, calling it more “Sonnet‑level” than frontier.
- Flash is often preferred for cheap, “stupid or speculative” tasks (summaries, stylistic cleanup, brute‑force refactors).
Pricing, Subsidies, and Token Efficiency
- Pricing is a major attraction: reports of ~150–200M tokens per $100 at list price, or hundreds of millions for a few dollars under current discounts.
- Several note the 75% promotional discount on V4 Pro; others emphasize it’s still cheap at full price, especially versus Anthropic/OpenAI subscriptions.
- Skeptics point out that V4 Pro and K2.6 often use many more “thinking” tokens than frontier models, so effective cost per solved task may be closer than raw token prices suggest.
Tooling, Harnesses, and Providers
- Used through DeepSeek’s own API, OpenRouter, Ollama Cloud, Bedrock, Azure AI, Tinfoil (enclave), Claude Code-compatible harnesses, VS Code integrations, pi.dev, opencode, and custom agents.
- Some harnesses mis-handle tool calling or expose raw “thinking traces,” which can look like the model is having a meltdown but are likely internal self‑corrections.
- High cache hit rates on the official API (claimed >99% in long sessions) can drastically reduce cost when working in one codebase.
Privacy, Data Use, and Open-Weights Tradeoffs
- Strong concern that the official DeepSeek API does not guarantee non‑training on user data, even for paying users.
- Some are fine with this as a “trade” for open weights and low prices; others prefer US/EU hosts with stricter legal regimes or confidential‑computing services.
- Open‑weights are praised for: (a) enabling alternative providers that “don’t phone home,” (b) allowing local/self‑hosted use, and (c) reducing lock‑in risk.
Alignment, Censorship, and Safety Filters
- Several contrast DeepSeek’s relative permissiveness (e.g., reverse engineering assistance) with OpenAI/Anthropic, where users report refusals and even account warnings.
- Others note censorship exists on all sides (e.g., Tiananmen‑type queries), but open weights allow “uncensoring” or moving to other hosts.
Technical and Hardware Considerations
- V4 Pro reportedly uses new attention/KV‑cache tricks (HCA, mCH) for long context, drastically cutting FLOPs and cache needs relative to earlier DeepSeek versions.
- Flash can run locally with large RAM / multi‑GPU setups; one user runs 1M‑token context fully in GPU memory and sees >2× speedup vs V3.2.
- Running Pro on CPU alone is seen as possible but very slow; better suited to unattended/overnight runs.
Benchmarks, Competition, and Geopolitics
- The “pelican on a bike” test is widely dismissed as overfitted and uninformative; calls for more meaningful evals.
- Mixed views on how V4 compares to Kimi K2.6 and GLM‑5.1: some say Kimi/GLM outperform for coding; others see V4/Flash winning on specific tasks (e.g., spatial reasoning).
- Discussion touches on training sources (alleged distillation from US models), Nvidia embargoes, and Chinese hardware (Huawei) as strategic backdrop, but details remain unclear.