The beginning of scarcity in AI

Hardware and energy bottlenecks

  • Many argue the bottleneck is manufacturing capacity, especially EUV lithography tools and the complex fab supply chain; scaling is slow and risky because past boom–bust cycles have made the industry cautious about expansion.
  • Others point to power limits: turbine manufacturing backlogs and grid-interconnection constraints make it hard to bring new datacenters online.
  • There is debate over whether ASML-like tooling is the global bottleneck, versus the difficulty and cost of building full fabs and supporting infrastructure.
  • Some note energy constraints are asymmetric: the US is seen as grid-limited; China as compute-limited but rapidly scaling wind/solar.

Is compute scarcity real or artificial?

  • One camp sees genuine, multi‑year compute scarcity, pointing to rising GPU prices and sustained high utilization.
  • Another sees “artificial scarcity,” driven by hype, subsidized pricing, and investors chasing a bubble that may end in oversupply and cheap compute.
  • There is disagreement over whether current AI demand is durable or more like past tech bubbles and crypto GPU spikes.

Model architectures, ASICs, and efficiency

  • The O(n²) scaling of transformer attention (compute grows quadratically with context length) is seen as a fundamental limit; some expect new architectures (e.g., state-space hybrids) to reduce compute needs.
  • Skepticism that ASIC inference will dominate soon: by the time an ASIC ships, models may be several generations ahead. ASICs likely make sense only once architectures stabilize.
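The quadratic attention cost above can be made concrete with a back-of-the-envelope FLOP estimate. The approximation used here (the QKᵀ score matrix and the weighted sum over V each cost roughly n²·d multiply-adds per layer) is a standard rule of thumb; the model dimensions are illustrative assumptions, not figures from the discussion.

```python
# Back-of-the-envelope FLOPs for self-attention vs. context length n.
# Assumes the common approximation: computing the n x n score matrix
# and the weighted sum over V each take ~n^2 * d multiply-adds per layer.
# d and layer count are illustrative, not tied to any specific model.

def attention_flops(n: int, d: int = 4096, layers: int = 32) -> float:
    """Rough FLOPs spent on attention alone at context length n."""
    per_layer = 2 * (n ** 2) * d   # scores (n^2*d) + output mix (n^2*d)
    return 2 * per_layer * layers  # x2: each multiply-add is 2 FLOPs

for n in (4_096, 32_768, 131_072):
    print(f"n={n:>7}: {attention_flops(n):.2e} FLOPs")
```

The point of the sketch: doubling the context multiplies attention cost by four, which is why sub-quadratic architectures are seen as a way to cut compute demand rather than just shifting it.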

Local and open models

  • Many stress that open-weight models lag frontier systems by ~6–12 months but are already “good enough” for many business tasks.
  • Local inference is seen as a way to bypass cloud compute scarcity and future price hikes, at the cost of weaker models and hardware constraints.
  • Others counter that local models still lack nuance and remain closer to older frontier performance (e.g., GPT‑3.5).
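The hardware-constraints point can be made concrete with a rough memory estimate for local inference: weights-only footprint is parameter count times bytes per weight, plus some overhead for KV cache and activations. The parameter counts, quantization levels, and overhead factor below are illustrative assumptions, not benchmarks from the thread.

```python
# Rough memory footprint for running a model locally, by quantization.
# Parameter counts, bit widths, and the overhead factor are illustrative
# assumptions (overhead loosely covers KV cache and activations).

def model_memory_gb(params_billion: float, bits_per_weight: int,
                    overhead: float = 1.2) -> float:
    """Weights-only estimate times a fudge factor for runtime overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for params in (7, 70):
    for bits in (16, 8, 4):
        gb = model_memory_gb(params, bits)
        print(f"{params}B params @ {bits}-bit: ~{gb:.0f} GB")
```

The sketch shows why quantization matters so much for bypassing cloud scarcity: a 4-bit 70B model fits on a workstation-class GPU setup where the 16-bit version would not.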

Economics, pricing, and dependency risk

  • Discussion of labs selling inference below cost (“buying oranges at $1 and selling them at $0.50”) to gain market share, with hopes that compute prices fall or margins improve later.
  • Strong concern about depending on proprietary LLM APIs: AI-first products may face rising COGS and forced price hikes if token prices increase.
  • Some predict a dot‑com–style cycle: massive overbuild of AI infra, followed by bankruptcies and cheap surplus compute; others think high margins and demand might persist.
  • Valuations of frontier labs are widely viewed as stretched; profitability and true margins are seen as opaque and possibly overstated.
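The “oranges” metaphor is just a negative gross margin, and a toy calculation shows why volume alone cannot fix it. All numbers below are hypothetical illustrations of the dynamic, not actual lab figures.

```python
# Toy unit economics for below-cost inference pricing.
# All prices are hypothetical illustrations of the "buy at $1,
# sell at $0.50" dynamic, not actual figures from any lab.

def gross_margin(price_per_mtok: float, cost_per_mtok: float) -> float:
    """Gross margin as a fraction; negative means every sale loses money."""
    return (price_per_mtok - cost_per_mtok) / price_per_mtok

# Selling at $0.50 what costs $1.00: more volume only scales the loss.
print(f"margin today: {gross_margin(0.50, 1.00):.0%}")
# The bet is that compute costs fall (or prices rise) before cash runs out.
print(f"after a cost drop: {gross_margin(0.50, 0.30):.0%}")
```

This is the crux of the dependency-risk worry: an AI-first product built on these subsidized prices inherits the correction when the subsidy ends.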

Innovation under constraints

  • Scarcity is expected to drive:
    • Better “harnesses” (wrappers, tools, and orchestration layers around models).
    • Smaller, specialized models tailored to specific tasks and hardware constraints.
  • Examples from China and constrained teams show that limited GPUs have already led to influential efficiency techniques.
  • Some argue the real bottleneck isn’t compute but robust evaluation: without good measurement, cheaper or better models just let you make mistakes faster.
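A “harness” in the sense above is mostly validation and retry logic wrapped around a model call. The sketch below is a minimal illustration; `call_model` is a hypothetical stand-in (stubbed so the example runs) for any LLM API or local inference function, not a real library call.

```python
# Minimal sketch of a "harness": validate model output and retry on
# failure instead of reaching for a bigger model. `call_model` is a
# hypothetical stand-in, stubbed here so the sketch is runnable.

import json

def call_model(prompt: str) -> str:
    """Hypothetical model call; replace with a real API or local model."""
    return '{"answer": 42}'

def harnessed_call(prompt: str, retries: int = 3) -> dict:
    """Ask for JSON, validate it, and re-ask on malformed output."""
    for _ in range(retries):
        raw = call_model(prompt + "\nRespond with valid JSON only.")
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue  # a cheap re-ask often beats more compute
    raise ValueError("model never produced valid JSON")

print(harnessed_call("What is six times seven?"))
```

The design point matches the evaluation bullet: a harness like this only helps if the validation step actually measures what you care about; otherwise retries just produce confidently wrong answers faster.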

Broader outlook and skepticism

  • Several commenters doubt AI will deliver the transformative productivity needed to justify current spend.
  • Others expect that as mid‑tier models rapidly improve, many use cases will move off frontier APIs to cheaper local or open alternatives.
  • Unclear how long the current “scarcity era” will last; many expect a familiar boom‑bust pattern, but timing and magnitude remain uncertain.