We're in the brute force phase of AI – once it ends, demand for GPUs will too

Hardware evolution: GPUs, ASICs, and specialization

  • Many argue that current “GPUs” like the H100 are already quasi-ASICs: graphics features have been stripped out and the silicon is heavily optimized for matrix multiplication (the FLOP tally after this list shows why).
  • Some expect further specialization: transformer- or even model-specific ASICs, possibly with weights baked on-chip, enabling very high tokens/sec at the cost of flexibility.
  • Others warn that transformer-specific silicon is risky: architectures are still changing fast (Mamba, RWKV, hybrids), and ASIC design cycles are too long to keep up, favoring GPGPU/TPU/NPU-style flexibility.
  • Debate over programmability: some say inference (and even training) doesn’t need general programmability; others insist future algorithms will demand it.
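
A rough FLOP tally for a single transformer decoder layer, under assumed, illustrative dimensions, shows how thoroughly matrix multiplication dominates the workload and why matmul-first silicon is a defensible bet:

```python
# Rough FLOP accounting for one decoder layer. d_model and seq_len are
# assumed, illustrative values; each matmul of (m x k) @ (k x n) is counted
# as the standard 2*m*n*k FLOPs.

d_model, seq_len = 4096, 2048

attn_proj   = 4 * (2 * seq_len * d_model * d_model)   # Q, K, V, output projections
attn_scores = 2 * (2 * seq_len * seq_len * d_model)   # QK^T and attention @ V
mlp         = 2 * (2 * seq_len * d_model * 4 * d_model)  # up- and down-projection
matmul_flops = attn_proj + attn_scores + mlp

softmax_norm = 10 * seq_len * d_model                 # crude non-matmul estimate
total = matmul_flops + softmax_norm
print(f"matmul share of layer FLOPs: {matmul_flops / total:.4%}")  # ~99.99%
```

Under these assumptions essentially everything outside softmax and normalization is matmul, which is the case for stripping graphics hardware; the counterargument above is that the *shape* of those matmuls keeps changing with the architecture.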

Will GPU demand collapse, plateau, or keep rising?

  • Critics of the headline argue that “demand ends” overstates the case; at most, growth may slow or shift from new to used hardware.
  • One view: if GPU performance per dollar keeps improving while task demand stays flat, unit sales could fall sharply (a toy calculation after this list illustrates the inverse relationship).
  • Counterview: demand is not fixed; cheaper compute opens up new applications (the historical parallel being CPUs and software).
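
A toy sketch of the “flat demand” view, with purely hypothetical numbers: if aggregate compute demand is held constant while each GPU delivers more performance, the number of units sold falls in inverse proportion.

```python
# Toy model of the "flat demand" argument. All numbers are illustrative,
# not market data; the point is only the inverse relationship between
# performance per unit and unit sales when total demand is held constant.

demand = 100.0          # total compute needed, arbitrary units (held flat)
perf_per_unit = 1.0     # compute delivered per GPU, current generation

for gen in range(4):
    units_sold = demand / perf_per_unit
    print(f"gen {gen}: perf/unit = {perf_per_unit:.0f}x baseline, units sold = {units_sold:.1f}")
    perf_per_unit *= 2  # assume performance per unit doubles each generation
```

The counterview in the last bullet is that `demand` is itself a function of price; that feedback is exactly the Jevons-paradox arithmetic sketched at the end of this summary.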

Brute force vs smarter algorithms

  • Many agree we’re in a brute-force phase: giant models, huge datasets, and lots of parallel matmul.
  • Some think improved algorithms will reduce reliance on massive GPU fleets and could make today’s specialized hardware obsolete.
  • Others see brute force as inherent to ML (hyperparameter sweeps, large search spaces) and expect the appetite for compute to persist even with better methods (the sweep sketch below shows how quickly a modest grid multiplies).
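
A minimal sketch of the sweep argument, with hypothetical grid sizes and per-run costs: even a small hyperparameter grid, repeated across random seeds, multiplies into a large number of full training runs, each of which is itself a pile of parallel matmuls.

```python
# Minimal hyperparameter-sweep sketch. Grid values and the per-run cost are
# illustrative assumptions, not figures from the discussion.

from itertools import product

learning_rates = [1e-4, 3e-4, 1e-3]
batch_sizes    = [64, 256, 1024]
depths         = [12, 24, 48]
seeds          = range(5)           # repeat each config to average out noise

configs = list(product(learning_rates, batch_sizes, depths, seeds))
print(f"{len(configs)} independent training runs")        # 3*3*3*5 = 135

GPU_HOURS_PER_RUN = 200             # assumed cost of one full training run
print(f"~{len(configs) * GPU_HOURS_PER_RUN:,} GPU-hours for one sweep")  # ~27,000
```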

Use cases and “we’re just getting started”

  • Optimists claim we’re early: text, images, and audio are largely checked off; video, 3D, simulation, planning, reasoning, and more are still emerging and will require far more compute.
  • Predicted future uses include: high-fidelity video generation, holodeck-like XR experiences, pervasive small-scale LLMs in products, richer game NPCs, and expanded computer vision workloads.
  • Skeptics counter that generative quality gains are slowing, many applications are gimmicky, and previous tech hype cycles (crypto, metaverse, self-driving) overpromised.

Economics, efficiency, and induced demand

  • Strong thread on the Jevons paradox: greater efficiency often increases total resource use (analogies cited: road building, developer tooling, historical CPU gains); a constant-elasticity sketch after this list shows the arithmetic.
  • Others stress limits: energy costs, thermals, and opportunity cost mean not all idle compute should be used; some hardware (old GPUs/CPUs) becomes uneconomic to run, as the break-even sketch below illustrates.
  • It’s unclear how these forces net out, but most agree parallel compute remains fundamentally valuable beyond the current LLM boom.
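
A hedged sketch of the Jevons-paradox arithmetic, using a textbook constant-elasticity demand curve with assumed parameters: when demand is price-elastic (elasticity greater than 1), a fall in the effective price of compute raises total spend, so efficiency gains increase overall usage.

```python
# Constant-elasticity demand curve Q = k * p**(-eps). If eps > 1, total
# spend p * Q rises as the price p falls. k and eps are illustrative
# assumptions, not measured values.

def total_spend(price, k=100.0, eps=1.5):
    quantity = k * price ** -eps   # demand grows as price falls
    return price * quantity

for price in [1.0, 0.5, 0.25]:     # effective compute price falling with efficiency
    print(f"price {price:.2f} -> total spend {total_spend(price):.1f}")
# price 1.00 -> total spend 100.0
# price 0.50 -> total spend 141.4
# price 0.25 -> total spend 200.0
```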
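
And a back-of-envelope break-even for the limits argument, with every figure a hypothetical assumption: an already-paid-off GPU still costs energy to run, and it becomes uneconomic once its energy cost per unit of work exceeds the total (amortized plus energy) cost of a newer part.

```python
# Break-even sketch for old hardware. All figures are hypothetical
# assumptions for illustration only.

KWH_PRICE = 0.15  # $/kWh, assumed electricity price

def cost_per_pflop_hour(power_kw, pflops, amortized_per_hour=0.0):
    """Dollar cost to deliver one PFLOP-hour of work on this device."""
    energy_cost = power_kw * KWH_PRICE             # $/hour of electricity
    return (energy_cost + amortized_per_hour) / pflops

old = cost_per_pflop_hour(power_kw=0.30, pflops=0.1)   # paid off: energy only
new = cost_per_pflop_hour(power_kw=0.70, pflops=2.0, amortized_per_hour=0.50)
print(f"old: ${old:.3f}/PFLOP-h   new: ${new:.3f}/PFLOP-h")
# old: $0.450/PFLOP-h   new: $0.303/PFLOP-h -> the "free" old GPU still loses
```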