The beginning of scarcity in AI

Hardware and energy bottlenecks

  • Many argue the bottleneck is manufacturing capacity, especially EUV lithography tools and the complex fab supply chain; scaling is slow and risky because past boom–bust cycles have made the industry cautious about expansion.
  • Others point to power limits: turbine manufacturing backlogs and grid-interconnection constraints make it hard to bring new datacenters online.
  • There is debate over whether ASML-like tooling is the global bottleneck, versus the difficulty and cost of building full fabs and supporting infrastructure.
  • Some note energy constraints are asymmetric: the US is seen as grid-limited; China as compute-limited but rapidly scaling wind/solar.

Is compute scarcity real or artificial?

  • One camp sees genuine, multi‑year compute scarcity, pointing to rising GPU prices and sustained high utilization.
  • Another sees “artificial scarcity,” driven by hype, subsidized pricing, and investors chasing a bubble that may end in oversupply and cheap compute.
  • There is disagreement over whether current AI demand is durable or more like past tech bubbles and crypto GPU spikes.

Model architectures, ASICs, and efficiency

  • The O(n²) scaling of transformer attention (compute grows quadratically with context length) is seen as a fundamental limit; some expect new architectures (e.g., state-space hybrids) to reduce compute needs.
  • Skepticism that ASIC inference will dominate soon: by the time an ASIC ships, models may be several generations ahead. ASICs likely make sense only once architectures stabilize.
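The quadratic attention cost above can be made concrete with a back-of-the-envelope FLOP estimate. The approximation used here (the QKᵀ score matrix and the weighted sum over V each cost roughly n²·d multiply-adds per layer) is a standard rule of thumb; the model dimensions are illustrative assumptions, not figures from the discussion.

```python
# Back-of-the-envelope FLOPs for self-attention vs. context length n.
# Assumes the common approximation: computing the n x n score matrix
# and the weighted sum over V each take ~n^2 * d multiply-adds per layer.
# d and layer count are illustrative, not tied to any specific model.

def attention_flops(n: int, d: int = 4096, layers: int = 32) -> float:
    """Rough FLOPs spent on attention alone at context length n."""
    per_layer = 2 * (n ** 2) * d   # scores (n^2*d) + output mix (n^2*d)
    return 2 * per_layer * layers  # x2: each multiply-add is 2 FLOPs

for n in (4_096, 32_768, 131_072):
    print(f"n={n:>7}: {attention_flops(n):.2e} FLOPs")
```

The point of the sketch: doubling the context multiplies attention cost by four, which is why sub-quadratic architectures are seen as a way to cut compute demand rather than just shifting it.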

Local and open models

  • Many stress that open-weight models lag frontier systems by ~6–12 months but are already “good enough” for many business tasks.
  • Local inference is seen as a way to bypass cloud compute scarcity and future price hikes, at the cost of weaker models and hardware constraints.
  • Others counter that local models still lack nuance and remain closer to older frontier performance (e.g., GPT‑3.5).
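The hardware-constraints point can be made concrete with a rough memory estimate for local inference: weights-only footprint is parameter count times bytes per weight, plus some overhead for KV cache and activations. The parameter counts, quantization levels, and overhead factor below are illustrative assumptions, not benchmarks from the thread.

```python
# Rough memory footprint for running a model locally, by quantization.
# Parameter counts, bit widths, and the overhead factor are illustrative
# assumptions (overhead loosely covers KV cache and activations).

def model_memory_gb(params_billion: float, bits_per_weight: int,
                    overhead: float = 1.2) -> float:
    """Weights-only estimate times a fudge factor for runtime overhead."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

for params in (7, 70):
    for bits in (16, 8, 4):
        gb = model_memory_gb(params, bits)
        print(f"{params}B params @ {bits}-bit: ~{gb:.0f} GB")
```

The sketch shows why quantization matters so much for bypassing cloud scarcity: a 4-bit 70B model fits on a workstation-class GPU setup where the 16-bit version would not.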

Economics, pricing, and dependency risk

  • Discussion of labs selling inference below cost (“buying oranges at $1 and selling them at $0.50”) to gain market share, with hopes that compute prices fall or margins improve later.
  • Strong concern about depending on proprietary LLM APIs: AI-first products may face rising COGS and forced price hikes if token prices increase.
  • Some predict a dot‑com–style cycle: massive overbuild of AI infra, followed by bankruptcies and cheap surplus compute; others think high margins and demand might persist.
  • Valuations of frontier labs are widely viewed as stretched; profitability and true margins are seen as opaque and possibly overstated.
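The “oranges” metaphor is just a negative gross margin, and a toy calculation shows why volume alone cannot fix it. All numbers below are hypothetical illustrations of the dynamic, not actual lab figures.

```python
# Toy unit economics for below-cost inference pricing.
# All prices are hypothetical illustrations of the "buy at $1,
# sell at $0.50" dynamic, not actual figures from any lab.

def gross_margin(price_per_mtok: float, cost_per_mtok: float) -> float:
    """Gross margin as a fraction; negative means every sale loses money."""
    return (price_per_mtok - cost_per_mtok) / price_per_mtok

# Selling at $0.50 what costs $1.00: more volume only scales the loss.
print(f"margin today: {gross_margin(0.50, 1.00):.0%}")
# The bet is that compute costs fall (or prices rise) before cash runs out.
print(f"after a cost drop: {gross_margin(0.50, 0.30):.0%}")
```

This is the crux of the dependency-risk worry: an AI-first product built on these subsidized prices inherits the correction when the subsidy ends.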

Innovation under constraints

  • Scarcity is expected to drive:
    • Better “harnesses” (wrappers, tools, and orchestration layers around models).
    • Smaller, specialized models tailored to specific tasks and hardware constraints.
  • Examples from China and constrained teams show that limited GPUs have already led to influential efficiency techniques.
  • Some argue the real bottleneck isn’t compute but robust evaluation: without good measurement, cheaper or better models just let you make mistakes faster.
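A “harness” in the sense above is mostly validation and retry logic wrapped around a model call. The sketch below is a minimal illustration; `call_model` is a hypothetical stand-in (stubbed so the example runs) for any LLM API or local inference function, not a real library call.

```python
# Minimal sketch of a "harness": validate model output and retry on
# failure instead of reaching for a bigger model. `call_model` is a
# hypothetical stand-in, stubbed here so the sketch is runnable.

import json

def call_model(prompt: str) -> str:
    """Hypothetical model call; replace with a real API or local model."""
    return '{"answer": 42}'

def harnessed_call(prompt: str, retries: int = 3) -> dict:
    """Ask for JSON, validate it, and re-ask on malformed output."""
    for _ in range(retries):
        raw = call_model(prompt + "\nRespond with valid JSON only.")
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue  # a cheap re-ask often beats more compute
    raise ValueError("model never produced valid JSON")

print(harnessed_call("What is six times seven?"))
```

The design point matches the evaluation bullet: a harness like this only helps if the validation step actually measures what you care about; otherwise retries just produce confidently wrong answers faster.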

Broader outlook and skepticism

  • Several commenters doubt AI will deliver the transformative productivity needed to justify current spend.
  • Others expect that as mid‑tier models rapidly improve, many use cases will move off frontier APIs to cheaper local or open alternatives.
  • Unclear how long the current “scarcity era” will last; many expect a familiar boom‑bust pattern, but timing and magnitude remain uncertain.