Nvidia’s $589B DeepSeek rout
Market reaction and stock moves
- Nvidia and other “AI trade” stocks dropped sharply; ASML also fell, a move many see as a narrative-driven overreaction rather than a response to fundamentals.
- Some view this as the AI bubble finally deflating or at least a correction of “priced for perfection” valuations; others compare it to dotcom-era volatility where tech was real but timelines and moats were mispriced.
- Several comments stress that markets are largely a beauty contest of expectations about expectations, not a clean reflection of real-world AI demand or Nvidia’s current business.
DeepSeek’s claims, verification, and skepticism
- DeepSeek reports training a frontier-scale reasoning model for roughly $6M on H800s, with detailed papers and open weights.
- Skeptics question whether the training cost or hardware access is understated, or whether the announcement is politically motivated PR; some suspect unreported H100 clusters or hidden subsidies.
- Others check FLOPs, architecture, tokens, and MFU and argue the numbers basically add up; early replications (including small-scale Berkeley work and live Hugging Face efforts) support genuine efficiency gains, at least for smaller models.
- Key nuance: the $6M figure is for V3 pretraining; total R1 cost isn’t fully disclosed, and much of the gain appears to come from architectural and low-level engineering innovations, not magic.
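One way to see why several commenters conclude the numbers "basically add up" is a back-of-envelope check. A minimal sketch, using the headline figures from DeepSeek's public V3 report (37B activated MoE parameters, 14.8T tokens, 2.788M H800 GPU-hours) together with two assumptions not in this thread: a $2/GPU-hour rental rate and an approximate dense FP8 peak for Hopper-class GPUs.

```python
# Back-of-envelope sanity check of DeepSeek-V3's reported pretraining cost.
# Figures below are from the public V3 report; the rental rate and the
# H800 peak-throughput constant are assumptions for illustration.

ACTIVE_PARAMS = 37e9        # activated parameters per token (MoE)
TOKENS = 14.8e12            # pretraining tokens
GPU_HOURS = 2.788e6         # reported H800 GPU-hours
RATE_USD = 2.0              # assumed rental price per GPU-hour
H800_FP8_PEAK = 1.979e15    # approx. dense FP8 peak FLOP/s (Hopper-class)

# Standard transformer estimate: ~6 FLOPs per active parameter per token.
train_flops = 6 * ACTIVE_PARAMS * TOKENS

# Achieved per-GPU throughput and the implied model FLOPs utilization (MFU)
# relative to the assumed dense FP8 peak.
flops_per_gpu_s = train_flops / (GPU_HOURS * 3600)
mfu = flops_per_gpu_s / H800_FP8_PEAK

cost_usd = GPU_HOURS * RATE_USD

print(f"Training compute:   {train_flops:.2e} FLOPs")
print(f"Per-GPU throughput: {flops_per_gpu_s / 1e12:.0f} TFLOP/s (MFU ~ {mfu:.0%})")
print(f"Implied cost:       ${cost_usd / 1e6:.2f}M")
```

The implied cost lands right at the reported ~$5.6M, and the implied utilization is well below the hardware's theoretical peak, i.e., nothing in the headline figures requires exotic efficiency, which is the gist of the replication arguments above.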
Consequences for Nvidia, GPUs, and data centers
- Bear case: if you can match o1‑like performance with ~10–50× less compute, hyperscalers’ mega-capex and Nvidia’s extreme margins look less justifiable; Nvidia’s valuation assumed continued exponential GPU demand and lack of real alternatives.
- Bull case (Jevons paradox): cheaper intelligence increases total AI usage, expands the customer base beyond a handful of hyperscalers, and still leaves training and reasoning heavily compute-bound; more efficient techniques can be applied to even larger clusters.
- Additional concern: if smaller or non‑US players can do frontier-ish work on older or commodity hardware, Nvidia’s pricing power and “only game in town” narrative weaken, even if unit demand stays high.
Impact on OpenAI/Anthropic and foundation-model economics
- Many think the real losers are closed, capital‑intensive labs whose moat was “only we can afford to train frontier models on vast GPU farms.”
- Distillation and cheap replication of reasoning models compress prices and erode the “rent-seeking” thesis that justified huge private valuations and projects like Stargate.
- The consensus is shifting toward foundation models being fungible and commoditizable; value migrates to interfaces, integration, data ownership, and distribution (e.g., hyperscalers, incumbents like Meta, cloud platforms).
China, export controls, and geopolitics
- DeepSeek is widely read as proof that export controls and H800 downgrades did not prevent China from reaching near‑frontier performance and may even have forced more aggressive efficiency work (PTX-level optimizations, bandwidth-aware architectures).
- Some argue Chinese AI companies may be using smuggled high-end GPUs; others note the political incentives to under‑report capabilities or to time announcements for maximum geopolitical and market impact.
- Several commenters predict growing Chinese capability in GPUs, HBM, and lithography, potentially challenging Nvidia and ASML over a 5‑10 year horizon.
Open models, IP, and legal/ethical side threads
- The discussion revisits whether training on copyrighted data is unlawful or fair use, and whether LLMs “contain” verbatim copies of works when they can reproduce, e.g., film scripts on demand.
- DeepSeek’s openness (papers + weights) is contrasted with closed US labs; some see it as reviving the older norm of publishing major advances, others as a prestige or geopolitical move.
- There is broad agreement that open weights and reproducible recipes make it hard for any one lab to sustain a durable moat purely on model training.