DeepSeek could represent Nvidia CEO Jensen Huang's worst nightmare
Market reaction and perceived irrationality
- Several comments note that markets barely reacted to the DeepSeek-V3 and R1 releases themselves, then reacted sharply once a consumer-facing app appeared; this is read as evidence of shallow, narrative-driven markets.
- Some argue this “signals money on the table” for those who paid attention to the underlying tech earlier; others say such timing lags are normal (e.g., the market’s delayed reaction to COVID).
Jevons paradox, efficiency, and Nvidia’s valuation
- Jevons paradox is heavily debated:
  - One camp: greater efficiency in model training/inference will increase total compute use (and thus GPU demand).
  - Another camp: Jevons applies to compute in general, not to Nvidia’s profits or growth specifically; high margins and growth expectations are what got repriced.
- Some stress that even modest downward revisions in long-term growth assumptions can justify large market-cap drops (a toy sensitivity sketch follows this list).
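A toy present-value calculation makes that last point concrete. Every input below (starting cash flow, growth rates, discount rate) is a made-up illustration rather than an estimate of Nvidia’s actual financials; only the shape of the sensitivity matters.

```python
# Hypothetical sketch: how a modest cut to an assumed long-run growth rate
# moves a discounted-cash-flow value. All numbers are illustrative, not
# estimates of Nvidia's real cash flows or cost of capital.

def dcf_value(fcf0, growth, years, discount, terminal_growth):
    """Present value of `years` of cash flows growing at `growth`,
    plus a Gordon-growth terminal value."""
    pv = 0.0
    fcf = fcf0
    for t in range(1, years + 1):
        fcf *= 1 + growth                   # next year's cash flow
        pv += fcf / (1 + discount) ** t     # discount it back to today
    terminal = fcf * (1 + terminal_growth) / (discount - terminal_growth)
    return pv + terminal / (1 + discount) ** years

base = dcf_value(fcf0=60e9, growth=0.25, years=10, discount=0.10, terminal_growth=0.03)
trimmed = dcf_value(fcf0=60e9, growth=0.20, years=10, discount=0.10, terminal_growth=0.03)
print(f"base:    ${base / 1e12:.2f}T")
print(f"trimmed: ${trimmed / 1e12:.2f}T")
print(f"drop:    {(1 - trimmed / base) * 100:.0f}%")
```

With these assumed inputs, shaving the ten-year growth rate from 25% to 20% cuts the present value by roughly 30%, which is how a “modest” revision can translate into a very large market-cap move.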
Is DeepSeek bad or good for Nvidia?
- Bearish arguments:
  - More efficient training means fewer GPUs needed to hit a given capability; hyperscalers may slow new cluster purchases and rely on existing capacity.
  - Long term, breakthroughs could move more workloads to cheaper or non-GPU hardware; Nvidia’s “training monopoly” may soften.
- Bullish/neutral arguments:
  - R1-style “thinking” models increase inference compute per query (a rough arithmetic sketch follows this list).
  - Lower training cost democratizes model-building, inducing many more models and thus more total compute.
  - Nvidia’s moat (CUDA, NVLink, Mellanox/InfiniBand ecosystem) is seen as extremely strong; “Nvidia sells clusters and a full stack, not just chips.”
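To put a rough number on the inference-side argument, the sketch below compares decode compute for a short direct answer against a long chain-of-thought answer, using the common ~2 x active-parameters FLOPs-per-token rule of thumb. The parameter count, token counts, and query shapes are assumptions for illustration, not measured DeepSeek figures.

```python
# Illustrative arithmetic: "thinking" models spend many more output tokens per
# query, so per-query inference compute rises even if per-token cost falls.
# Model size and token counts below are assumptions, not measured values.

ACTIVE_PARAMS = 37e9                 # assumed active parameters per token (MoE-style)
FLOPS_PER_TOKEN = 2 * ACTIVE_PARAMS  # rough decode FLOPs per generated token

def query_flops(answer_tokens, thinking_tokens=0):
    """Approximate decode-side FLOPs for one response (prefill ignored)."""
    return (answer_tokens + thinking_tokens) * FLOPS_PER_TOKEN

plain = query_flops(answer_tokens=500)                            # direct answer
reasoning = query_flops(answer_tokens=500, thinking_tokens=8000)  # long chain of thought

print(f"plain answer:     {plain:.2e} FLOPs")
print(f"reasoning answer: {reasoning:.2e} FLOPs")
print(f"ratio:            {reasoning / plain:.0f}x")
```

Under these assumed token counts, the reasoning-style response costs roughly 17x the compute of the direct answer, which is the mechanism behind the bullish “more inference demand” case.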
Costs, hardware, and sanctions
- DeepSeek-V3 training is widely cited as ~2.8M H800 GPU hours ≈ $5–6M at $2/hr, but multiple comments emphasize this excludes capex, experimentation, RL steps, data generation, and staff.
- Back-of-envelope estimates put a 2,048‑H800 cluster in the ~$100–200M range including infrastructure; the “$6M model” narrative is viewed as technically narrow but still notable for showing efficiency (see the sketch after this list).
- Export controls on advanced chips to China are discussed: DeepSeek was likely trained on Nvidia cards acquired before/around the sanctions, which undercuts the idea that restrictions would block Chinese AI progress.
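The figures above can be reproduced with a few lines of arithmetic; the sketch below separates the rented-compute view behind the “$6M” headline from a rough ownership view. The GPU-hour count and $2/hr rate come from the discussion, while the per-GPU price and overhead multiplier are assumptions for illustration only.

```python
# Back-of-envelope cost arithmetic. GPU hours and the $2/hr rate are the
# commonly cited figures; per-GPU price and the overhead multiplier are
# rough assumptions, not known DeepSeek numbers.

# 1) Rented-compute view: what the "~$6M training run" headline refers to.
gpu_hours = 2.8e6            # ~2.8M H800 GPU hours cited for DeepSeek-V3
rate_per_hour = 2.0          # assumed $/GPU-hour
run_cost = gpu_hours * rate_per_hour
print(f"training run at rental prices: ${run_cost / 1e6:.1f}M")

# 2) Ownership view: buying and operating the cluster itself.
num_gpus = 2048
price_per_h800 = 30_000      # assumed price per H800, USD
overhead = 2.0               # assumed multiplier for networking, hosts, facility, power
cluster_capex = num_gpus * price_per_h800 * overhead
print(f"rough cluster capex: ${cluster_capex / 1e6:.0f}M")
```

With an assumed $30k card and a 2x overhead multiplier the cluster lands near $123M, consistent with the ~$100–200M range above, while the rented-compute figure stays around $5.6M; the gap between the two views is most of the argument about what the “$6M model” headline does and does not include.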
Democratization and new opportunities
- Many see DeepSeek as enabling smaller or mid-stage companies to train competitive domain-specific models instead of paying incumbents’ API “rent.”
- Some foresee more on-prem deployments (for legal/privacy reasons), family or small-org “AI stations,” and induced demand for mid-range hardware.
Skills, education, and systems knowledge
- A strong thread argues DeepSeek’s success highlights the value of deep systems knowledge (OS, compilers, Mellanox/InfiniBand tuning, scheduling, concurrency) over “glue-code ML.”
- US CS programs are criticized for weakening OS/architecture requirements; this is contrasted with stronger systems pipelines in places like Israel and India.
- Concrete resources for leveling up: classic OS/systems courses (e.g., UIUC CS241, Berkeley CS162, MIT 6.1810), then HPC, then ML.
Media hype, bubble concerns, and future direction
- Several participants are uneasy with media exaggeration around DeepSeek and the sudden explosion of “instant experts” (e.g., on Jevons paradox).
- Some think this is a rational correction of an “obvious bubble”; others think markets still underprice the broader productivity impact of current LLMs.
- There is disagreement on whether we are at “the end of brute-force scaling” or at the beginning of “test-time scaling” and algorithmic refinement that will still demand massive compute over time.