2026-05-22

DeepSeek makes the V4 Pro price discount permanent

Pricing and Cost Dynamics

Many commenters see DeepSeek V4 Pro and Flash as extraordinarily cheap, citing large workloads (tens of millions of tokens) costing only a few dollars.
Comparisons against other frontier models show orders-of-magnitude lower $/M tokens, especially on output and cache reads.
Some users still find per-token billing more expensive than flat subscriptions (Claude, Codex) when they operate near subscription session limits, especially if caching is misconfigured.
Third-party gateways and routers often charge significantly more than DeepSeek’s own API, changing the economics.

Model Performance and Use Cases

V4 Pro is widely praised as strong for complex coding, large summarization, and reasoning-heavy tasks; often compared to mid‑tier GPT/Claude models.
V4 Flash is favored for speed, cost, and agentic/tool-heavy workflows; many find it “good enough” to maintain codebases or power agents.
Some users report DeepSeek lagging behind top US models on “frontier” tasks; others say Chinese models (DeepSeek, Kimi, MiMO, Qwen) now feel close enough for everyday work.
There are mixed reports: some find V4 mediocre for certain structured tasks (e.g., robust JSON planning) compared to other models.

Caching and Architecture

Commenters highlight DeepSeek’s MLA/DSA architecture reducing KV cache memory 5–13×, enabling long contexts and cheap cache reads.
Cache read pricing (0.8–2% of input cost) and high hit rates (often ~70–80%) make multi-tool agent runs dramatically cheaper than competitors.
Some users learn to “front-load” project context to maximize cache reuse, reporting half‑billion‑token sessions costing only a few dollars.

Tooling, Harnesses, and Integrations

DeepSeek integrates with many coding agents and harnesses (Claude Code, OpenCode, Pi, Zed, Copilot, various proxies/routers).
Several users prefer harness‑agnostic setups to avoid vendor lock‑in, switching models per task via proxies or routers.
V4 Flash and Pro are used through cloud providers (Azure, DeepInfra, EU routers), sometimes trading price for data residency or no‑retention guarantees.

Data Privacy and Security Concerns

Multiple commenters worry about sending sensitive data to a Chinese-hosted service; DeepSeek’s policy explicitly allows using user input for training and stores data in China.
Others argue all cloud LLMs (US and Chinese) are privacy risks, pointing to data retention, law‑enforcement access, and breaches.
Some mitigate by using non‑Chinese hosts, secure enclaves, or running open weights locally; others are unconcerned unless working on strategically sensitive projects.

Censorship, Bias, and Alignment

Users report noticeable political censorship and pro‑China bias in the hosted model (answers aborted or redirected on mild political topics).
The open‑weight “base” reportedly has fewer such issues when self‑hosted.
Some prefer this to what they see as heavy‑handed “woke” alignment in Western models; others find both directions problematic.

Business Viability and Geopolitics

Debate over whether DeepSeek is selling at a loss: some infer this from much higher prices charged by third‑party hosts; others point to efficiency, cheap power, small team, and possible local hardware as explanations.
Speculation that state backing or strategic loss‑leading could be aimed at undercutting US vendors, analogized to EVs or lithium.
Some expect potential US restrictions on Chinese AI services; others question enforceability (VPNs, foreign hosts).
Several see open‑weights Chinese models plus cheap inference as a major shift in global AI competition and user dependence on US labs.

Related topics