DeepSeek makes the V4 Pro price discount permanent
Pricing and Cost Dynamics
- Many commenters see DeepSeek V4 Pro and Flash as extraordinarily cheap, citing large workloads (tens of millions of tokens) costing only a few dollars.
- Comparisons against other frontier models show per-million-token prices that are orders of magnitude lower, especially for output and cache reads.
- Some users still find per-token billing more expensive than flat subscriptions (Claude, Codex) when they operate near subscription session limits, especially if caching is misconfigured.
- Third-party gateways and routers often charge significantly more than DeepSeek’s own API, changing the economics.
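The subscription-versus-API trade-off above comes down to simple arithmetic. A minimal sketch follows, assuming placeholder prices, token volumes, and a hypothetical `gateway_markup` multiplier; none of these figures come from the thread or from any provider's price list.

```python
def per_token_cost(input_tokens, output_tokens,
                   input_price_per_m, output_price_per_m,
                   gateway_markup=1.0):
    """Dollar cost of a workload billed per token.

    Prices are per million tokens. gateway_markup is a hypothetical
    multiplier standing in for a third-party gateway's surcharge.
    """
    base = (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000
    return base * gateway_markup


def cheaper_option(input_tokens, output_tokens,
                   input_price_per_m, output_price_per_m,
                   subscription_fee, gateway_markup=1.0):
    """Return which billing mode is cheaper, plus the per-token cost."""
    api_cost = per_token_cost(input_tokens, output_tokens,
                              input_price_per_m, output_price_per_m,
                              gateway_markup)
    choice = "per-token" if api_cost < subscription_fee else "subscription"
    return choice, api_cost


# Placeholder numbers only: 40M input and 5M output tokens per month,
# $0.50 and $2.00 per million tokens, against a $20 flat subscription.
print(cheaper_option(40_000_000, 5_000_000, 0.50, 2.00, 20.0))
# → ('subscription', 30.0)
```

Plugging in a user's own monthly volumes and the current published prices shows where the break-even sits. A markup above 1.0 moves it further toward the subscription.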
Model Performance and Use Cases
- V4 Pro is widely praised as strong for complex coding, large summarization, and reasoning-heavy tasks; often compared to mid‑tier GPT/Claude models.
- V4 Flash is favored for speed, cost, and agentic/tool-heavy workflows; many find it “good enough” to maintain codebases or power agents.
- Some users report DeepSeek lagging behind top US models on “frontier” tasks; others say Chinese models (DeepSeek, Kimi, MiMo, Qwen) now feel close enough for everyday work.
- Reports are mixed: some find V4 mediocre at certain structured tasks (e.g., robust JSON planning) compared with other models.
Caching and Architecture
- Commenters highlight that DeepSeek’s MLA/DSA architecture reduces KV cache memory by 5–13×, enabling long contexts and cheap cache reads.
- Cache read pricing (0.8–2% of the regular input price) and high hit rates (often ~70–80%) make multi-tool agent runs dramatically cheaper than on competing models.
- Some users learn to “front-load” project context to maximize cache reuse, reporting half‑billion‑token sessions costing only a few dollars.
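The effect of cache pricing on input cost can be sketched as a blended rate. The hit rates and cache-read ratios below are taken from the ranges commenters cite; the price of 1.00 per million tokens is a placeholder unit, not a real DeepSeek price, and the function names are illustrative.

```python
def input_cost_multiplier(hit_rate, cache_read_ratio):
    """Fraction of the uncached input bill that is actually paid.

    hit_rate:         share of input tokens served from cache (0..1)
    cache_read_ratio: cache-read price as a fraction of the regular
                      input price (e.g. 0.02 for 2%)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Cache misses pay full price; cache hits pay the discounted price.
    return (1.0 - hit_rate) + hit_rate * cache_read_ratio


def input_cost(total_input_tokens, price_per_m, hit_rate, cache_read_ratio):
    """Dollar cost of input tokens given a cache hit rate."""
    full_price = total_input_tokens * price_per_m / 1_000_000
    return full_price * input_cost_multiplier(hit_rate, cache_read_ratio)


# With a 75% hit rate and cache reads at 2% of the input price,
# roughly a quarter of the uncached input bill remains.
print(input_cost_multiplier(0.75, 0.02))

# Placeholder price of 1.00 per million tokens over 100M input tokens.
print(input_cost(100_000_000, 1.00, 0.75, 0.02))
```

Because misses dominate the bill at these ratios, raising the hit rate matters more than the exact cache-read price. This fits the "front-load the context" habit described above.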
Tooling, Harnesses, and Integrations
- DeepSeek integrates with many coding agents and harnesses (Claude Code, OpenCode, Pi, Zed, Copilot, various proxies/routers).
- Several users prefer harness‑agnostic setups to avoid vendor lock‑in, switching models per task via proxies or routers.
- V4 Flash and Pro are also used through cloud providers (Azure, DeepInfra, EU routers), with some users paying more in exchange for data residency or no‑retention guarantees.
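A harness-agnostic setup of the kind described above can be pictured as a small routing table that picks a model per task. The sketch below is illustrative only: the provider labels, model identifiers, and task names are hypothetical and do not correspond to any particular router's configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str  # hypothetical provider label
    model: str     # hypothetical model identifier


# Hypothetical mapping: a cheaper, faster model for tool-heavy agent loops,
# a stronger model for complex coding, and a self-hosted one for sensitive data.
ROUTES = {
    "agentic": Route("provider-a", "v4-flash"),
    "complex-coding": Route("provider-a", "v4-pro"),
    "sensitive": Route("local", "open-weights"),
}
DEFAULT_ROUTE = Route("provider-a", "v4-flash")


def pick_route(task_kind: str) -> Route:
    """Return the route for a task kind, falling back to the default."""
    return ROUTES.get(task_kind, DEFAULT_ROUTE)


print(pick_route("complex-coding").model)  # → v4-pro
```

Keeping the mapping outside the harness means a model or provider can be swapped by editing one table, without changing the agent itself.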
Data Privacy and Security Concerns
- Multiple commenters worry about sending sensitive data to a Chinese-hosted service; DeepSeek’s policy explicitly allows using user input for training, and data is stored in China.
- Others argue all cloud LLMs (US and Chinese) are privacy risks, pointing to data retention, law‑enforcement access, and breaches.
- Some mitigate by using non‑Chinese hosts, secure enclaves, or running open weights locally; others are unconcerned unless working on strategically sensitive projects.
Censorship, Bias, and Alignment
- Users report noticeable political censorship and pro‑China bias in the hosted model (answers aborted or redirected on mild political topics).
- The open‑weight “base” reportedly has fewer such issues when self‑hosted.
- Some prefer this to what they see as heavy‑handed “woke” alignment in Western models; others find both directions problematic.
Business Viability and Geopolitics
- Debate over whether DeepSeek is selling at a loss: some infer this from much higher prices charged by third‑party hosts; others point to efficiency, cheap power, small team, and possible local hardware as explanations.
- Some speculate that state backing or strategic loss‑leading could be aimed at undercutting US vendors, drawing analogies to EVs or lithium.
- Some expect potential US restrictions on Chinese AI services; others question enforceability (VPNs, foreign hosts).
- Several see open‑weights Chinese models plus cheap inference as a major shift in global AI competition and in users’ dependence on US labs.