DeepSeek could represent Nvidia CEO Jensen Huang's worst nightmare
Market reaction and perceived irrationality
- Several comments note that markets barely reacted to the DeepSeek-V3 and R1 releases themselves, then reacted sharply once a consumer-facing app appeared; this is read as evidence of shallow, narrative-driven markets.
- Some argue this “signals money on the table” for those who paid attention to the underlying tech earlier; others say such timing lags are normal (e.g., the market’s delayed reaction to COVID).
Jevons paradox, efficiency, and Nvidia’s valuation
- Jevons paradox is heavily debated:
  - One camp: greater efficiency in model training/inference will increase total compute use (and thus GPU demand).
  - Another camp: Jevons applies to compute in general, not to Nvidia’s profits or growth specifically; high margins and growth expectations are what got repriced.
- Some stress that even modest downward revisions in long-term growth assumptions can justify large market-cap drops (a toy sensitivity sketch follows this list).
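A toy present-value calculation makes that last point concrete. Every input below (starting cash flow, growth rates, discount rate) is a made-up illustration rather than an estimate of Nvidia’s actual financials; only the shape of the sensitivity matters.

```python
# Hypothetical sketch: how a modest cut to an assumed long-run growth rate
# moves a discounted-cash-flow value. All numbers are illustrative, not
# estimates of Nvidia's real cash flows or cost of capital.

def dcf_value(fcf0, growth, years, discount, terminal_growth):
    """Present value of `years` of cash flows growing at `growth`,
    plus a Gordon-growth terminal value."""
    pv = 0.0
    fcf = fcf0
    for t in range(1, years + 1):
        fcf *= 1 + growth                   # next year's cash flow
        pv += fcf / (1 + discount) ** t     # discount it back to today
    terminal = fcf * (1 + terminal_growth) / (discount - terminal_growth)
    return pv + terminal / (1 + discount) ** years

base = dcf_value(fcf0=60e9, growth=0.25, years=10, discount=0.10, terminal_growth=0.03)
trimmed = dcf_value(fcf0=60e9, growth=0.20, years=10, discount=0.10, terminal_growth=0.03)
print(f"base:    ${base / 1e12:.2f}T")
print(f"trimmed: ${trimmed / 1e12:.2f}T")
print(f"drop:    {(1 - trimmed / base) * 100:.0f}%")
```

With these assumed inputs, shaving the ten-year growth rate from 25% to 20% cuts the present value by roughly 30%, which is how a “modest” revision can translate into a very large market-cap move.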
Is DeepSeek bad or good for Nvidia?
- Bearish arguments:
  - More efficient training means fewer GPUs needed to hit a given capability; hyperscalers may slow new cluster purchases and rely on existing capacity.
  - Long term, breakthroughs could move more workloads to cheaper or non-GPU hardware; Nvidia’s “training monopoly” may soften.
- Bullish/neutral arguments:
  - R1-style “thinking” models increase inference compute per query (a rough arithmetic sketch follows this list).
  - Lower training cost democratizes model-building, inducing many more models and thus more total compute.
  - Nvidia’s moat (CUDA, NVLink, Mellanox/InfiniBand ecosystem) is seen as extremely strong; “Nvidia sells clusters and a full stack, not just chips.”
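To put a rough number on the inference-side argument, the sketch below compares decode compute for a short direct answer against a long chain-of-thought answer, using the common ~2 x active-parameters FLOPs-per-token rule of thumb. The parameter count, token counts, and query shapes are assumptions for illustration, not measured DeepSeek figures.

```python
# Illustrative arithmetic: "thinking" models spend many more output tokens per
# query, so per-query inference compute rises even if per-token cost falls.
# Model size and token counts below are assumptions, not measured values.

ACTIVE_PARAMS = 37e9                 # assumed active parameters per token (MoE-style)
FLOPS_PER_TOKEN = 2 * ACTIVE_PARAMS  # rough decode FLOPs per generated token

def query_flops(answer_tokens, thinking_tokens=0):
    """Approximate decode-side FLOPs for one response (prefill ignored)."""
    return (answer_tokens + thinking_tokens) * FLOPS_PER_TOKEN

plain = query_flops(answer_tokens=500)                            # direct answer
reasoning = query_flops(answer_tokens=500, thinking_tokens=8000)  # long chain of thought

print(f"plain answer:     {plain:.2e} FLOPs")
print(f"reasoning answer: {reasoning:.2e} FLOPs")
print(f"ratio:            {reasoning / plain:.0f}x")
```

Under these assumed token counts, the reasoning-style response costs roughly 17x the compute of the direct answer, which is the mechanism behind the bullish “more inference demand” case.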
Costs, hardware, and sanctions
- DeepSeek-V3 training is widely cited as ~2.8M H800 GPU hours ≈ $5–6M at $2/hr, but multiple comments emphasize this excludes capex, experimentation, RL steps, data generation, and staff.
- Back-of-envelope estimates put a 2,048‑H800 cluster in the ~$100–200M range including infrastructure; the “$6M model” narrative is viewed as technically narrow but still notable for showing efficiency (see the sketch after this list).
- Export controls on advanced chips to China are discussed: DeepSeek was likely trained on Nvidia cards acquired before/around the sanctions, which undercuts the idea that restrictions would block Chinese AI progress.
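The figures above can be reproduced with a few lines of arithmetic; the sketch below separates the rented-compute view behind the “$6M” headline from a rough ownership view. The GPU-hour count and $2/hr rate come from the discussion, while the per-GPU price and overhead multiplier are assumptions for illustration only.

```python
# Back-of-envelope cost arithmetic. GPU hours and the $2/hr rate are the
# commonly cited figures; per-GPU price and the overhead multiplier are
# rough assumptions, not known DeepSeek numbers.

# 1) Rented-compute view: what the "~$6M training run" headline refers to.
gpu_hours = 2.8e6            # ~2.8M H800 GPU hours cited for DeepSeek-V3
rate_per_hour = 2.0          # assumed $/GPU-hour
run_cost = gpu_hours * rate_per_hour
print(f"training run at rental prices: ${run_cost / 1e6:.1f}M")

# 2) Ownership view: buying and operating the cluster itself.
num_gpus = 2048
price_per_h800 = 30_000      # assumed price per H800, USD
overhead = 2.0               # assumed multiplier for networking, hosts, facility, power
cluster_capex = num_gpus * price_per_h800 * overhead
print(f"rough cluster capex: ${cluster_capex / 1e6:.0f}M")
```

With an assumed $30k card and a 2x overhead multiplier the cluster lands near $123M, consistent with the ~$100–200M range above, while the rented-compute figure stays around $5.6M; the gap between the two views is most of the argument about what the “$6M model” headline does and does not include.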
Democratization and new opportunities
- Many see DeepSeek as enabling smaller or mid-stage companies to train competitive domain-specific models instead of paying incumbents’ API “rent.”
- Some foresee more on-prem deployments (for legal/privacy reasons), family or small-org “AI stations,” and induced demand for mid-range hardware.
Skills, education, and systems knowledge
- A strong thread argues DeepSeek’s success highlights the value of deep systems knowledge (OS, compilers, Mellanox/InfiniBand tuning, scheduling, concurrency) over “glue-code ML.”
- US CS programs are criticized for weakening OS/architecture requirements; this is contrasted with stronger systems pipelines in places like Israel and India.
- Concrete resources for leveling up: classic OS/systems courses (e.g., UIUC CS241, Berkeley CS162, MIT 6.1810), then HPC, then ML.
Media hype, bubble concerns, and future direction
- Several participants are uneasy with media exaggeration around DeepSeek and the sudden explosion of “instant experts” (e.g., on Jevons paradox).
- Some think this is a rational correction of an “obvious bubble”; others think markets still underprice the broader productivity impact of current LLMs.
- There is disagreement on whether we are at “the end of brute-force scaling” or at the beginning of “test-time scaling” and algorithmic refinement that will still demand massive compute over time.