DeepSeek makes the V4 Pro price discount permanent

Pricing and Cost Dynamics

  • Many commenters see DeepSeek V4 Pro and Flash as extraordinarily cheap, citing large workloads (tens of millions of tokens) costing only a few dollars.
  • Comparisons against other frontier models show orders-of-magnitude lower $/M tokens, especially on output and cache reads.
  • Some users still find per-token billing more expensive than flat subscriptions (Claude, Codex) when they operate near subscription session limits, especially if caching is misconfigured.
  • Third-party gateways and routers often charge significantly more than DeepSeek’s own API, changing the economics.
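
The trade-off between per-token billing, gateway markups, and flat subscriptions can be sketched with simple arithmetic. This is a minimal illustration, not DeepSeek's actual billing logic: the prices, the markup factor, and the subscription fee below are hypothetical placeholders, and the function names are invented for this example.

```python
def api_cost_usd(input_tokens, output_tokens,
                 in_price_per_m, out_price_per_m, markup=1.0):
    """Per-token bill in USD.

    Prices are given per million tokens. `markup` is a hypothetical
    multiplier standing in for a third-party gateway's surcharge
    (1.0 means calling the first-party API directly).
    """
    base = (input_tokens / 1e6) * in_price_per_m \
         + (output_tokens / 1e6) * out_price_per_m
    return base * markup


def cheaper_option(api_cost, subscription_cost):
    """Return which billing model costs less for the given usage."""
    return "api" if api_cost < subscription_cost else "subscription"


# Hypothetical workload and prices, chosen only to show the mechanics.
direct = api_cost_usd(10_000_000, 2_000_000, 0.5, 1.0)
via_gateway = api_cost_usd(10_000_000, 2_000_000, 0.5, 1.0, markup=3.0)

print(direct, cheaper_option(direct, 20.0))
print(via_gateway, cheaper_option(via_gateway, 20.0))
```

With these placeholder numbers, the same workload falls on either side of a flat fee depending only on the markup, which is the effect commenters describe.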

Model Performance and Use Cases

  • V4 Pro is widely praised as strong for complex coding, large summarization, and reasoning-heavy tasks; often compared to mid‑tier GPT/Claude models.
  • V4 Flash is favored for speed, cost, and agentic/tool-heavy workflows; many find it “good enough” to maintain codebases or power agents.
  • Some users report DeepSeek lagging behind top US models on “frontier” tasks; others say Chinese models (DeepSeek, Kimi, MiMo, Qwen) now feel close enough for everyday work.
  • There are mixed reports: some find V4 mediocre for certain structured tasks (e.g., robust JSON planning) compared to other models.

Caching and Architecture

  • Commenters highlight that DeepSeek’s MLA/DSA architecture reduces KV cache memory by 5–13×, enabling long contexts and cheap cache reads.
  • Cache read pricing (0.8–2% of the input price) and high hit rates (often ~70–80%) make multi-tool agent runs dramatically cheaper than on competing models.
  • Some users learn to “front-load” project context to maximize cache reuse, reporting half‑billion‑token sessions costing only a few dollars.
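
The effect of cache hits on input cost can be expressed as a price-independent multiplier. The sketch below assumes a simple two-tier model (cache misses billed at the full input price, cache hits billed at a fixed fraction of it) and uses the hit rates and cache-read fractions quoted by commenters; the function name is invented for this example, and real billing may differ.

```python
def blended_input_multiplier(hit_rate, cache_read_fraction):
    """Fraction of the uncached input bill actually paid.

    hit_rate: share of input tokens served from cache (0.0 to 1.0).
    cache_read_fraction: cache-read price as a fraction of the
        normal input price (e.g. 0.02 for 2%).
    Assumes misses are billed at the full input price.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return (1.0 - hit_rate) + hit_rate * cache_read_fraction


# Ranges reported in the discussion: 70-80% hit rate, 0.8-2% read price.
for hit_rate in (0.7, 0.8):
    for fraction in (0.008, 0.02):
        m = blended_input_multiplier(hit_rate, fraction)
        print(f"hit_rate={hit_rate:.0%} read_price={fraction:.1%} "
              f"-> pay {m:.1%} of uncached input cost")
```

Under this model the bill is dominated by cache misses, which is why front-loading stable project context, so that later requests share a long cached prefix, lowers cost more than a cheaper cache-read price would.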

Tooling, Harnesses, and Integrations

  • DeepSeek integrates with many coding agents and harnesses (Claude Code, OpenCode, Pi, Zed, Copilot, various proxies/routers).
  • Several users prefer harness‑agnostic setups to avoid vendor lock‑in, switching models per task via proxies or routers.
  • V4 Flash and Pro are used through cloud providers (Azure, DeepInfra, EU routers), sometimes trading price for data residency or no‑retention guarantees.

Data Privacy and Security Concerns

  • Multiple commenters worry about sending sensitive data to a Chinese-hosted service; DeepSeek’s policy explicitly allows using user input for training and stores data in China.
  • Others argue all cloud LLMs (US and Chinese) are privacy risks, pointing to data retention, law‑enforcement access, and breaches.
  • Some mitigate by using non‑Chinese hosts, secure enclaves, or running open weights locally; others are unconcerned unless working on strategically sensitive projects.

Censorship, Bias, and Alignment

  • Users report noticeable political censorship and pro‑China bias in the hosted model (answers aborted or redirected on mild political topics).
  • The open‑weight “base” reportedly has fewer such issues when self‑hosted.
  • Some prefer this to what they see as heavy‑handed “woke” alignment in Western models; others find both directions problematic.

Business Viability and Geopolitics

  • Debate over whether DeepSeek is selling at a loss: some infer this from much higher prices charged by third‑party hosts; others point to efficiency, cheap power, small team, and possible local hardware as explanations.
  • Some speculate that state backing or strategic loss‑leading is aimed at undercutting US vendors, drawing analogies to EVs and lithium.
  • Some expect potential US restrictions on Chinese AI services; others question enforceability (VPNs, foreign hosts).
  • Several see open‑weights Chinese models plus cheap inference as a major shift in global AI competition and in users’ dependence on US labs.