Why current LLM costs are not sustainable

Cost structure & sustainability

  • Many argue current frontier LLM economics are shaky: labs lose tens of billions of dollars while charging per-token rates that must also cover training, infrastructure, staff, and marketing.
  • Inference itself may already be profitable, but the overall businesses are not; commenters disagree on whether this means subscriptions are “heavily subsidized” or whether enterprise API tokens simply carry big margins.
  • Some think valuations assume unrealistically smooth progress and demand through 2030; others note inference revenue already exceeds inference cost for at least one major lab.

Subscriptions, tokens & usage patterns

  • Several anecdotes describe individual monthly API-equivalent usage in the thousands of dollars, driven by agents scanning large codebases, long loops, and huge contexts with poor cache utilization.
  • Others say they can’t hit subscription limits even with daily professional use, suggesting that overuse is often an orchestration/“skill issue” rather than necessity.
  • Strong consensus that people overuse top-tier models for trivial tasks because subscriptions hide the true cost.
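
To make the "API-equivalent usage" and cache-utilization points concrete, here is a minimal sketch of the arithmetic. The function name, the token volumes, and all three per-million-token prices are hypothetical placeholders, not any provider's actual rates; the only idea taken from the discussion is that a poor cache hit rate inflates the bill.

```python
def api_equivalent_cost(input_tokens, output_tokens, cache_hit_rate,
                        price_in, price_cached_in, price_out):
    """Return the dollar cost of a month of usage.

    All prices are USD per million tokens. cache_hit_rate is the fraction
    of input tokens served from the prompt cache (0.0 to 1.0).
    """
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    total = (uncached * price_in
             + cached * price_cached_in
             + output_tokens * price_out)
    return total / 1_000_000


# Hypothetical month of heavy agentic use: 1B input tokens, 20M output tokens.
# The prices below are illustrative assumptions only.
INPUT_TOKENS = 1_000_000_000
OUTPUT_TOKENS = 20_000_000
PRICE_IN, PRICE_CACHED_IN, PRICE_OUT = 3.0, 0.3, 15.0

poor_cache = api_equivalent_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.1,
                                 PRICE_IN, PRICE_CACHED_IN, PRICE_OUT)
good_cache = api_equivalent_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.9,
                                 PRICE_IN, PRICE_CACHED_IN, PRICE_OUT)

print(f"10% cache hits: ${poor_cache:,.0f}")
print(f"90% cache hits: ${good_cache:,.0f}")
```

Under these made-up inputs the same workload costs roughly $3,030 with 10% cache hits and roughly $870 with 90%, which is one way the "thousands of dollars" anecdotes and the "can't hit limits" reports could both be true.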

Cloud vs local & hosting future

  • Split views: some large enterprises insist on running models in their own datacenters for security; others are comfortable outsourcing to hyperscalers and see them as the de facto winners.
  • Many expect an ecosystem mirroring web hosting: bare-metal GPU providers and higher-level “LLM platforms,” with frontier labs competing on quality and integrations rather than raw hosting.
  • Others foresee most AI becoming local/on-device by ~2030, eroding the business of large hosted frontier models except for niche high-end use.

Open models and geopolitics

  • Open-weight Chinese models (DeepSeek, GLM, Qwen) are widely seen as dramatically cheaper and “good enough” for many workloads; some suggest global spend could drop 10× by switching.
  • Counterpoint: some raise concerns about relying on a “hostile superpower” and about possible US sanctions that could force US and allied providers to drop such models, pushing users back to domestic labs.

Hardware, optimization & future direction

  • Back-of-envelope costs for self-hosting large models on 8×B200 plus dedicated solar suggest nontrivial per-token costs even at scale.
  • Many believe major savings lie in better routing to smaller models, improved agents/harnesses, and hardware specialization (e.g., “LLM-on-a-chip”), not just cheaper frontier models.
  • Debate continues over whether capabilities are plateauing; some see diminishing returns, others say recent releases and agentic workflows still show rapid progress.
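
The self-hosting back-of-envelope estimate mentioned above can be sketched as follows. The thread's actual figures are not reproduced here, so every input (capital cost, amortization period, power draw, energy price, throughput, utilization) is a hypothetical placeholder to be replaced with real quotes. The sketch also assumes the machine draws full power around the clock and ignores staff, networking, and cooling overhead.

```python
HOURS_PER_YEAR = 24 * 365


def self_host_cost_per_million_tokens(capex_usd, amortization_years, power_kw,
                                      energy_usd_per_kwh, tokens_per_second,
                                      utilization):
    """Return an estimated USD cost per million generated tokens.

    capex_usd is spread evenly over amortization_years; energy is charged
    for every hour regardless of load; utilization (0.0 to 1.0) scales the
    tokens actually served.
    """
    hourly_capex = capex_usd / (amortization_years * HOURS_PER_YEAR)
    hourly_energy = power_kw * energy_usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (hourly_capex + hourly_energy) / tokens_per_hour * 1_000_000


# Placeholder inputs for one 8-GPU server; none of these are measured values.
cost = self_host_cost_per_million_tokens(
    capex_usd=500_000,         # assumed purchase price
    amortization_years=4,      # assumed useful life
    power_kw=14,               # assumed full-load draw
    energy_usd_per_kwh=0.05,   # assumed amortized solar cost
    tokens_per_second=10_000,  # assumed aggregate batched throughput
    utilization=0.5,           # assumed fraction of time serving requests
)
print(f"Estimated cost: ${cost:.2f} per million tokens")
```

With these placeholders the hardware term dominates the energy term by about 20 to 1, and halving utilization doubles the per-token cost, which fits the observation that cheap power alone does not make self-hosting cheap.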