We're in the brute force phase of AI – once it ends, demand for GPUs will too

Hardware evolution: GPUs, ASICs, and specialization

  • Many argue that current “GPUs” like the H100 are already quasi-ASICs: graphics features have been stripped out and the silicon is heavily optimized for matrix multiplication (the FLOP tally after this list shows why).
  • Some expect further specialization: transformer- or even model-specific ASICs, possibly with weights baked on-chip, enabling very high tokens/sec at the cost of flexibility.
  • Others warn that transformer-specific silicon is risky: architectures are still changing fast (Mamba, RWKV, hybrids), and ASIC design cycles are too long to keep up, favoring GPGPU/TPU/NPU-style flexibility.
  • Debate over programmability: some say inference (and even training) doesn’t need general programmability; others insist future algorithms will demand it.
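
A rough FLOP tally for a single transformer decoder layer, under assumed, illustrative dimensions, shows how thoroughly matrix multiplication dominates the workload and why matmul-first silicon is a defensible bet:

```python
# Rough FLOP accounting for one decoder layer. d_model and seq_len are
# assumed, illustrative values; each matmul of (m x k) @ (k x n) is counted
# as the standard 2*m*n*k FLOPs.

d_model, seq_len = 4096, 2048

attn_proj   = 4 * (2 * seq_len * d_model * d_model)   # Q, K, V, output projections
attn_scores = 2 * (2 * seq_len * seq_len * d_model)   # QK^T and attention @ V
mlp         = 2 * (2 * seq_len * d_model * 4 * d_model)  # up- and down-projection
matmul_flops = attn_proj + attn_scores + mlp

softmax_norm = 10 * seq_len * d_model                 # crude non-matmul estimate
total = matmul_flops + softmax_norm
print(f"matmul share of layer FLOPs: {matmul_flops / total:.4%}")  # ~99.99%
```

Under these assumptions essentially everything outside softmax and normalization is matmul, which is the case for stripping graphics hardware; the counterargument above is that the *shape* of those matmuls keeps changing with the architecture.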

Will GPU demand collapse, plateau, or keep rising?

  • Critics of the headline argue that “demand ends” overstates the case; at most, growth may slow or shift from new to used hardware.
  • One view: if GPU performance per dollar keeps improving while task demand stays flat, unit sales could fall sharply (a toy calculation after this list illustrates the inverse relationship).
  • Counterview: demand is not fixed; cheaper compute opens up new applications (the historical parallel being CPUs and software).
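
A toy sketch of the “flat demand” view, with purely hypothetical numbers: if aggregate compute demand is held constant while each GPU delivers more performance, the number of units sold falls in inverse proportion.

```python
# Toy model of the "flat demand" argument. All numbers are illustrative,
# not market data; the point is only the inverse relationship between
# performance per unit and unit sales when total demand is held constant.

demand = 100.0          # total compute needed, arbitrary units (held flat)
perf_per_unit = 1.0     # compute delivered per GPU, current generation

for gen in range(4):
    units_sold = demand / perf_per_unit
    print(f"gen {gen}: perf/unit = {perf_per_unit:.0f}x baseline, units sold = {units_sold:.1f}")
    perf_per_unit *= 2  # assume performance per unit doubles each generation
```

The counterview in the last bullet is that `demand` is itself a function of price; that feedback is exactly the Jevons-paradox arithmetic sketched at the end of this summary.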

Brute force vs smarter algorithms

  • Many agree we’re in a brute-force phase: giant models, huge datasets, and lots of parallel matmul.
  • Some think improved algorithms will reduce reliance on massive GPU fleets and could make today’s specialized hardware obsolete.
  • Others see brute force as inherent to ML (hyperparameter sweeps, large search spaces) and expect the appetite for compute to persist even with better methods (the sweep sketch below shows how quickly a modest grid multiplies).
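
A minimal sketch of the sweep argument, with hypothetical grid sizes and per-run costs: even a small hyperparameter grid, repeated across random seeds, multiplies into a large number of full training runs, each of which is itself a pile of parallel matmuls.

```python
# Minimal hyperparameter-sweep sketch. Grid values and the per-run cost are
# illustrative assumptions, not figures from the discussion.

from itertools import product

learning_rates = [1e-4, 3e-4, 1e-3]
batch_sizes    = [64, 256, 1024]
depths         = [12, 24, 48]
seeds          = range(5)           # repeat each config to average out noise

configs = list(product(learning_rates, batch_sizes, depths, seeds))
print(f"{len(configs)} independent training runs")        # 3*3*3*5 = 135

GPU_HOURS_PER_RUN = 200             # assumed cost of one full training run
print(f"~{len(configs) * GPU_HOURS_PER_RUN:,} GPU-hours for one sweep")  # ~27,000
```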

Use cases and “we’re just getting started”

  • Optimists claim we’re early: text, images, and audio are largely checked off; video, 3D, simulation, planning, reasoning, and more are still emerging and will require far more compute.
  • Predicted future uses include: high-fidelity video generation, holodeck-like XR experiences, pervasive small-scale LLMs in products, richer game NPCs, and expanded computer vision workloads.
  • Skeptics counter that generative quality gains are slowing, many applications are gimmicky, and previous tech hype cycles (crypto, metaverse, self-driving) overpromised.

Economics, efficiency, and induced demand

  • Strong thread on the Jevons paradox: greater efficiency often increases total resource use (analogies cited: road building, developer tooling, historical CPU gains); a constant-elasticity sketch after this list shows the arithmetic.
  • Others stress limits: energy costs, thermals, and opportunity cost mean not all idle compute should be used; some hardware (old GPUs/CPUs) becomes uneconomic to run, as the break-even sketch below illustrates.
  • It’s unclear how these forces net out, but most agree parallel compute remains fundamentally valuable beyond the current LLM boom.
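
A hedged sketch of the Jevons-paradox arithmetic, using a textbook constant-elasticity demand curve with assumed parameters: when demand is price-elastic (elasticity greater than 1), a fall in the effective price of compute raises total spend, so efficiency gains increase overall usage.

```python
# Constant-elasticity demand curve Q = k * p**(-eps). If eps > 1, total
# spend p * Q rises as the price p falls. k and eps are illustrative
# assumptions, not measured values.

def total_spend(price, k=100.0, eps=1.5):
    quantity = k * price ** -eps   # demand grows as price falls
    return price * quantity

for price in [1.0, 0.5, 0.25]:     # effective compute price falling with efficiency
    print(f"price {price:.2f} -> total spend {total_spend(price):.1f}")
# price 1.00 -> total spend 100.0
# price 0.50 -> total spend 141.4
# price 0.25 -> total spend 200.0
```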
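
And a back-of-envelope break-even for the limits argument, with every figure a hypothetical assumption: an already-paid-off GPU still costs energy to run, and it becomes uneconomic once its energy cost per unit of work exceeds the total (amortized plus energy) cost of a newer part.

```python
# Break-even sketch for old hardware. All figures are hypothetical
# assumptions for illustration only.

KWH_PRICE = 0.15  # $/kWh, assumed electricity price

def cost_per_pflop_hour(power_kw, pflops, amortized_per_hour=0.0):
    """Dollar cost to deliver one PFLOP-hour of work on this device."""
    energy_cost = power_kw * KWH_PRICE             # $/hour of electricity
    return (energy_cost + amortized_per_hour) / pflops

old = cost_per_pflop_hour(power_kw=0.30, pflops=0.1)   # paid off: energy only
new = cost_per_pflop_hour(power_kw=0.70, pflops=2.0, amortized_per_hour=0.50)
print(f"old: ${old:.3f}/PFLOP-h   new: ${new:.3f}/PFLOP-h")
# old: $0.450/PFLOP-h   new: $0.303/PFLOP-h -> the "free" old GPU still loses
```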