Nvidia’s $589B DeepSeek rout
Market reaction and stock moves
- Nvidia and other “AI trade” stocks dropped sharply; ASML also fell, a move many see as a narrative-driven overreaction rather than a response to fundamentals.
- Some view this as the AI bubble finally deflating or at least a correction of “priced for perfection” valuations; others compare it to dotcom-era volatility where tech was real but timelines and moats were mispriced.
- Several comments stress that markets are largely a beauty contest of expectations about expectations, not a clean reflection of real-world AI demand or Nvidia’s current business.
DeepSeek’s claims, verification, and skepticism
- DeepSeek reports training a frontier-scale reasoning model for roughly $6M on H800s, with detailed papers and open weights.
- Skeptics question whether the training cost or hardware access is understated, or whether the announcement is politically motivated PR; some suspect unreported H100 clusters or hidden subsidies.
- Others check FLOPs, architecture, tokens, and MFU and argue the numbers basically add up; early replications (including small-scale Berkeley work and live Hugging Face efforts) support genuine efficiency gains, at least for smaller models.
- Key nuance: the $6M figure is for V3 pretraining; total R1 cost isn’t fully disclosed, and much of the gain appears to come from architectural and low-level engineering innovations, not magic.
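One way to see why several commenters conclude the numbers "basically add up" is a back-of-envelope check. A minimal sketch, using the headline figures from DeepSeek's public V3 report (37B activated MoE parameters, 14.8T tokens, 2.788M H800 GPU-hours) together with two assumptions not in this thread: a $2/GPU-hour rental rate and an approximate dense FP8 peak for Hopper-class GPUs.

```python
# Back-of-envelope sanity check of DeepSeek-V3's reported pretraining cost.
# Figures below are from the public V3 report; the rental rate and the
# H800 peak-throughput constant are assumptions for illustration.

ACTIVE_PARAMS = 37e9        # activated parameters per token (MoE)
TOKENS = 14.8e12            # pretraining tokens
GPU_HOURS = 2.788e6         # reported H800 GPU-hours
RATE_USD = 2.0              # assumed rental price per GPU-hour
H800_FP8_PEAK = 1.979e15    # approx. dense FP8 peak FLOP/s (Hopper-class)

# Standard transformer estimate: ~6 FLOPs per active parameter per token.
train_flops = 6 * ACTIVE_PARAMS * TOKENS

# Achieved per-GPU throughput and the implied model FLOPs utilization (MFU)
# relative to the assumed dense FP8 peak.
flops_per_gpu_s = train_flops / (GPU_HOURS * 3600)
mfu = flops_per_gpu_s / H800_FP8_PEAK

cost_usd = GPU_HOURS * RATE_USD

print(f"Training compute:   {train_flops:.2e} FLOPs")
print(f"Per-GPU throughput: {flops_per_gpu_s / 1e12:.0f} TFLOP/s (MFU ~ {mfu:.0%})")
print(f"Implied cost:       ${cost_usd / 1e6:.2f}M")
```

The implied cost lands right at the reported ~$5.6M, and the implied utilization is well below the hardware's theoretical peak, i.e., nothing in the headline figures requires exotic efficiency, which is the gist of the replication arguments above.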
Consequences for Nvidia, GPUs, and data centers
- Bear case: if you can match o1‑like performance with ~10–50× less compute, hyperscalers’ mega-capex and Nvidia’s extreme margins look less justifiable; Nvidia’s valuation assumed continued exponential GPU demand and lack of real alternatives.
- Bull case (Jevons paradox): cheaper intelligence increases total AI usage, expands the customer base beyond a handful of hyperscalers, and still leaves training and reasoning heavily compute-bound; more efficient techniques can be applied to even larger clusters.
- Additional concern: if smaller or non‑US players can do frontier-ish work on older or commodity hardware, Nvidia’s pricing power and “only game in town” narrative weaken, even if unit demand stays high.
Impact on OpenAI/Anthropic and foundation-model economics
- Many think the real losers are closed, capital‑intensive labs whose moat was “only we can afford to train frontier models on vast GPU farms.”
- Distillation and cheap replication of reasoning models compress prices and erode the “rent-seeking” thesis that justified huge private valuations and projects like Stargate.
- The consensus is shifting toward foundation models being fungible and commoditizable; value migrates to interfaces, integration, data ownership, and distribution (e.g., hyperscalers, incumbents like Meta, cloud platforms).
China, export controls, and geopolitics
- DeepSeek is widely read as proof that export controls and H800 downgrades did not prevent China from reaching near‑frontier performance and may even have forced more aggressive efficiency work (PTX-level optimizations, bandwidth-aware architectures).
- Some argue Chinese AI companies may be using smuggled high-end GPUs; others note the political incentives to under‑report capabilities or to time announcements for maximum geopolitical and market impact.
- Several commenters predict growing Chinese capability in GPUs, HBM, and lithography, potentially challenging Nvidia and ASML over a 5‑10 year horizon.
Open models, IP, and legal/ethical side threads
- The discussion revisits whether training on copyrighted data is unlawful or fair use, and whether LLMs “contain” verbatim copies of works when they can reproduce, e.g., film scripts on demand.
- DeepSeek’s openness (papers + weights) is contrasted with closed US labs; some see it as reviving the older norm of publishing major advances, others as a prestige or geopolitical move.
- There is broad agreement that open weights and reproducible recipes make it hard for any one lab to sustain a durable moat purely on model training.