We're in the brute force phase of AI – once it ends, demand for GPUs will too
Hardware evolution: GPUs, ASICs, and specialization
- Many argue that current “GPUs” such as the H100 are already quasi-ASICs: graphics features largely stripped out, with the silicon heavily optimized for matrix multiplication (the sketch after this list illustrates why matmuls dominate the workload).
- Some expect further specialization: transformer-specific or even model-specific ASICs, possibly with weights stored on-chip, delivering very high tokens/sec at the cost of flexibility.
- Others warn that transformer-specific silicon is risky: architectures (Mamba, RWKV, hybrids) are still changing quickly and ASIC design cycles are too long to keep up, which favors GPGPU/TPU/NPU-style flexibility.
- Debate over programmability: some say inference (and even training) doesn’t need general programmability; others insist future algorithms will require it.
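A minimal numpy sketch of the “it’s mostly matmuls” point: in one transformer layer’s forward pass, nearly all of the arithmetic is dense matrix multiplication, with only cheap element-wise work in between. The shapes, and the use of ReLU in place of the usual GELU/SwiGLU, are illustrative rather than taken from any real model.

```python
# Illustrative only: one transformer layer's forward pass, showing that the
# heavy arithmetic is dense matmul (the "@" operations below).
import numpy as np

seq, d_model, d_ff = 512, 1024, 4096        # toy sizes; real models are far larger
rng = np.random.default_rng(0)
x = rng.standard_normal((seq, d_model), dtype=np.float32)

# Attention: four projection matmuls plus two matmuls against the attention matrix.
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model), dtype=np.float32) for _ in range(4))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = (q @ k.T) / np.sqrt(d_model)
scores -= scores.max(axis=-1, keepdims=True)   # softmax: element-wise, cheap
attn = np.exp(scores)
attn /= attn.sum(axis=-1, keepdims=True)
x = (attn @ v) @ Wo

# Feed-forward block: two large matmuls around a point-wise nonlinearity.
W1 = rng.standard_normal((d_model, d_ff), dtype=np.float32)
W2 = rng.standard_normal((d_ff, d_model), dtype=np.float32)
x = np.maximum(x @ W1, 0.0) @ W2               # ReLU stands in for GELU/SwiGLU

# Softmax, the activation, and the (omitted) residual adds and layer norms are
# the only non-matmul work, and they are a small fraction of the FLOPs.
```

Commenters on the ASIC side point to exactly this structure; those against note that architectures such as Mamba reshuffle which operations dominate.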
Will GPU demand collapse, plateau, or keep rising?
- Critics of the headline argue that demand will not simply “end”; at most, growth may slow or shift from new to used hardware.
- One view: if GPU performance per dollar keeps improving and task demand stays flat, unit sales could fall sharply.
- Counterview: demand is not fixed; cheaper compute opens new applications (historical parallel with CPUs and software).
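A toy back-of-the-envelope contrasting the two views above; the ~50%/yr perf-per-dollar gain and the yearly doubling of workloads are invented numbers, not forecasts.

```python
# Toy numbers only: contrast "flat demand + better perf/$" with induced demand.
flops_demand = 1.0        # normalize this year's total compute demand to 1
perf_per_dollar = 1.0     # normalize this year's GPU perf-per-dollar to 1

for year in range(1, 4):
    perf_per_dollar *= 1.5                      # assume ~50%/yr perf-per-dollar gains
    flat = flops_demand / perf_per_dollar       # spend needed if demand stays flat
    induced = (flops_demand * 2.0 ** year) / perf_per_dollar   # demand doubles yearly
    print(f"year {year}: flat-demand spend {flat:.2f}x, induced-demand spend {induced:.2f}x")

# With flat demand, GPU spend (a rough proxy for unit sales) falls every year.
# With demand growing faster than perf/$, spend keeps rising despite better chips.
```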
Brute force vs smarter algorithms
- Many agree we’re in a brute-force phase: giant models, huge datasets, lots of parallel matmul.
- Some think improved algorithms will reduce reliance on massive GPU fleets and could render today’s specialized hardware obsolete.
- Others see brute force as inherent to ML (hyperparameter sweeps, large search spaces) and expect compute appetite to persist even with better methods.
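As an illustration of the “brute force is inherent” view, even a modest hyperparameter grid search multiplies out to a large number of full training runs; the grid below and the per-run cost are invented for the sake of the arithmetic.

```python
# Invented grid and cost: a small sweep already implies tens of thousands of GPU-hours.
from itertools import product

grid = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [256, 512, 1024],
    "weight_decay": [0.0, 0.01, 0.1],
    "seed": [0, 1, 2],
}
runs = list(product(*grid.values()))
gpu_hours_per_run = 500                          # assumed cost of one full training run
print(f"{len(runs)} runs x {gpu_hours_per_run} GPU-hours = {len(runs) * gpu_hours_per_run:,} GPU-hours")
# 81 runs x 500 GPU-hours = 40,500 GPU-hours for one small sweep. Smarter search
# (Bayesian optimization, early stopping) shrinks the constant, but the work is
# embarrassingly parallel, which keeps the appetite for large GPU fleets alive.
```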
Use cases and “we’re just getting started”
- Optimists claim we’re still early: text, images, and audio are largely checked off, while video, 3D, simulation, planning, reasoning, and more are still emerging and will require far more compute (see the back-of-the-envelope sketch after this list).
- Predicted future uses include: high-fidelity video generation, holodeck-like XR experiences, pervasive small-scale LLMs in products, richer game NPCs, and expanded computer vision workloads.
- Skeptics counter that generative quality gains are slowing, many applications are gimmicky, and previous tech hype cycles (crypto, metaverse, self-driving) overpromised.
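A back-of-the-envelope behind the optimists’ “video needs far more compute” claim referenced above; the ViT-style 16x16-pixel patch tokenization and every count here are rough assumptions, not measurements.

```python
# Rough, illustrative counts only: one minute of video vs one long text reply,
# assuming each 16x16-pixel patch maps to roughly one token.
text_tokens = 1_000                               # a long chat reply, roughly
patches_per_frame = (1920 // 16) * (1080 // 16)   # 120 * 67 = 8,040 patches at 1080p
frames = 30 * 60                                  # one minute at 30 fps
video_tokens = patches_per_frame * frames         # ~14.5 million "tokens"
print(f"~{video_tokens:,} video tokens vs ~{text_tokens:,} text tokens "
      f"(~{video_tokens // text_tokens:,}x)")
# Temporal compression and latent-space modeling cut this down substantially,
# but the gap is large enough to explain the "far more compute" expectation.
```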
Economics, efficiency, and induced demand
- Strong thread on the Jevons paradox: greater efficiency often increases total resource use (analogies cited: roads, developer tools, historical CPU gains); a toy model after this list makes the condition explicit.
- Others stress limits: energy costs, thermals, and opportunity cost mean not all idle compute should be used; some hardware (old GPUs/CPUs) becomes uneconomic to run.
- Unclear how these forces net out, but most agree parallel compute remains fundamentally valuable beyond the current LLM boom.
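A toy constant-elasticity demand model makes the Jevons-paradox condition mentioned above explicit: total compute use rises with efficiency only when demand is sufficiently price-elastic. The 2x efficiency gain and the elasticity values are arbitrary.

```python
# Toy Jevons-paradox model (all numbers arbitrary). Efficiency makes each task
# cheaper; demand responds as quantity ~ price^(-elasticity).
def compute_used(efficiency_gain: float, elasticity: float, baseline: float = 1.0) -> float:
    price_per_task = 1.0 / efficiency_gain               # each task gets cheaper
    tasks = baseline * price_per_task ** (-elasticity)   # constant-elasticity demand
    return tasks / efficiency_gain                       # resource per task also falls

for elasticity in (0.5, 1.0, 2.0):
    used = compute_used(efficiency_gain=2.0, elasticity=elasticity)
    print(f"elasticity {elasticity}: total compute used = {used:.2f}x baseline")
# elasticity 0.5 -> 0.71x (efficiency saves compute), 1.0 -> 1.00x (a wash),
# 2.0 -> 2.00x (Jevons: efficiency increases total use). The open question in the
# thread is which regime AI compute demand is actually in.
```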