DeepSeek could represent Nvidia CEO Jensen Huang's worst nightmare

Market reaction and perceived irrationality

  • Several comments note there was little market reaction to the DeepSeek-V3 or R1 releases themselves, but a sharp reaction once a consumer-facing app appeared; some read this as evidence of shallow, narrative-driven markets.
  • Some argue this “signals money on the table” for those who tracked the underlying tech earlier; others say such timing lags are normal (e.g., the delayed market reaction to COVID).

Jevons paradox, efficiency, and Nvidia’s valuation

  • Jevons paradox is heavily debated:
    • One camp: greater efficiency in model training/inference will increase total compute use (and thus GPU demand).
    • Another camp: Jevons applies to compute in general, not to Nvidia’s profits or growth specifically; high margins and growth expectations are what got repriced.
  • Some stress that even modest downward revisions in long-term growth assumptions can justify large market-cap drops.
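The sensitivity point above can be made concrete with a toy single-stage (Gordon growth) valuation sketch. All numbers here are illustrative assumptions, not Nvidia's actual cash flows, growth, or discount rate:

```python
def gordon_value(cash_flow, growth, discount):
    """Gordon growth model: present value of a cash flow stream
    growing forever at `growth`, discounted at `discount`.
    Requires discount > growth."""
    return cash_flow * (1 + growth) / (discount - growth)

# Hypothetical: $100 of annual cash flow, 10% discount rate.
before = gordon_value(100, 0.08, 0.10)  # assumed 8% long-run growth
after = gordon_value(100, 0.06, 0.10)   # growth revised down to 6%
drop = 1 - after / before
print(before, after, drop)
```

With these toy inputs, trimming assumed perpetual growth by just two points cuts the modeled value roughly in half, which is why modest revisions to long-horizon growth assumptions can translate into very large market-cap moves.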

Is DeepSeek bad or good for Nvidia?

  • Bearish arguments:
    • More efficient training means fewer GPUs needed to hit a given capability; hyperscalers may slow new cluster purchases and rely on existing capacity.
    • Long term, breakthroughs could move more workloads to cheaper or non-GPU hardware; Nvidia’s “training monopoly” may soften.
  • Bullish/neutral arguments:
    • R1-style “thinking” models increase inference compute per query.
    • Lower training cost democratizes model-building, inducing many more models and thus more total compute.
    • Nvidia’s moat (CUDA, NVLink, Mellanox/Infiniband ecosystem) is seen as extremely strong; “Nvidia sells clusters and a full stack, not just chips.”
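The “more inference compute per query” argument is essentially token arithmetic: a reasoning model emits a long hidden chain of thought before its visible answer, multiplying decode work per query even at identical per-token cost. A minimal sketch, where the active parameter count and token counts are purely illustrative assumptions:

```python
# Rough decode-cost comparison (all numbers are illustrative assumptions).
active_params = 37e9                  # hypothetical active params per token (MoE-style)
flops_per_token = 2 * active_params   # ~2 FLOPs per active parameter per decoded token

standard_answer_tokens = 500          # a typical chat-style reply
reasoning_tokens = 5_000              # hidden "thinking" tokens before the answer

standard = standard_answer_tokens * flops_per_token
reasoning = (reasoning_tokens + standard_answer_tokens) * flops_per_token
ratio = reasoning / standard
print(ratio)  # 11.0 — an order of magnitude more decode FLOPs per query
```

Under these assumptions the same hardware serves roughly 10x fewer queries per second, which is the bullish case: even cheap-to-train models can be expensive to serve at scale.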

Costs, hardware, and sanctions

  • DeepSeek-V3 training is widely cited as ~2.8M H800 GPU hours ≈ $5–6M at $2/hr, but multiple comments emphasize this excludes capex, experimentation, RL steps, data generation, and staff.
  • Back-of-envelope estimates put a 2,048‑H800 cluster in the ~$100–200M range including infrastructure; the “$6M model” narrative is viewed as technically narrow but still notable for showing efficiency.
  • Export controls to China are discussed: DeepSeek was likely trained on Nvidia cards acquired before/around sanctions; this undercuts the idea that restrictions would block Chinese AI progress.
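The rental-cost and capex figures above are simple arithmetic to reproduce. A hedged sketch, where the $2/hr rental rate matches the commonly cited assumption and the all-in per-GPU capex is a hypothetical number, not a disclosed one:

```python
# Rental view: the widely cited "~$6M" training-run figure.
gpu_hours = 2.8e6   # ~2.8M H800 GPU-hours for the final V3 training run
rate = 2.0          # assumed $/GPU-hour rental rate
run_cost = gpu_hours * rate
print(f"${run_cost / 1e6:.1f}M")  # $5.6M

# Capex view: owning the cluster is a very different number.
gpus = 2048
cost_per_gpu_installed = 50_000   # hypothetical all-in $/GPU incl. networking, datacenter
capex = gpus * cost_per_gpu_installed
print(f"${capex / 1e6:.0f}M")     # $102M, the low end of the ~$100–200M estimates
```

The gap between the two numbers is the crux of the debate: the $5–6M figure prices only the final run's GPU-hours, while the capex view, plus experimentation, RL, data generation, and staff, is what the skeptical comments point to.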

Democratization and new opportunities

  • Many see DeepSeek as enabling smaller or mid-stage companies to train competitive domain-specific models instead of paying incumbents’ API “rent.”
  • Some foresee more on-prem deployments (for legal/privacy reasons), family or small-org “AI stations,” and induced demand for mid-range hardware.

Skills, education, and systems knowledge

  • A strong thread argues DeepSeek’s success highlights the value of deep systems knowledge (OS, compilers, Mellanox/InfiniBand tuning, scheduling, concurrency) over “glue-code ML.”
  • US CS programs are criticized for weakening OS/architecture requirements, contrasted with stronger systems pipelines in places like Israel and India.
  • Concrete resources for leveling up: classic OS/systems courses (e.g., UIUC CS241, Berkeley CS162, MIT 6.1810), then HPC, then ML.

Media hype, bubble concerns, and future direction

  • Several participants are uneasy with media exaggeration around DeepSeek and the sudden explosion of “instant experts” (e.g., on Jevons paradox).
  • Some think this is a rational correction of an “obvious bubble”; others think markets still underprice the broader productivity impact of current LLMs.
  • There is disagreement on whether we are at “the end of brute-force scaling” or at the beginning of “test-time scaling” and algorithmic refinement that will still demand massive compute over time.