Why current LLM costs are not sustainable
Cost structure & sustainability
- Many argue that the economics of current frontier LLMs are shaky: labs are losing tens of billions of dollars, yet per-token prices have to cover training, infrastructure, staff, and marketing on top of inference.
- Inference itself may already be profitable, but the businesses as a whole are not. Commenters disagree on whether this means subscriptions are “heavily subsidized” or whether enterprise API tokens simply carry large margins.
- Some think valuations assume unrealistically smooth progress and demand through 2030; others note inference revenue already exceeds inference cost for at least one major lab.
Subscriptions, tokens & usage patterns
- Several anecdotes describe individual users whose monthly usage would cost thousands of dollars at API rates, driven by agents scanning large codebases, long-running loops, and huge contexts with poor cache utilization.
- Others say they can’t hit subscription limits even with daily professional use, suggesting that overuse is often an orchestration problem or “skill issue” rather than a necessity.
- Strong consensus that people overuse top-tier models for trivial tasks because subscriptions hide the true cost.
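To make the "API-equivalent usage" figures above concrete, here is a minimal sketch of how cache utilization changes the bill for the same workload. The `Pricing` class, the `api_equivalent_cost` function, and every price and token count below are hypothetical placeholders chosen for illustration; they are not any provider's real rates and do not come from the discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens (not real vendor rates)."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def api_equivalent_cost(input_tokens: int, output_tokens: int,
                        cache_hit_ratio: float, pricing: Pricing) -> float:
    """Estimate what a workload would cost if billed per token.

    cache_hit_ratio is the fraction of input tokens served from a prompt
    cache and therefore billed at the cheaper cached rate.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    total = (uncached * pricing.input_per_mtok
             + cached * pricing.cached_input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


# Assumed numbers: a month of heavy agent use with 500M input tokens
# and 10M output tokens, priced with made-up rates.
EXAMPLE_PRICING = Pricing(input_per_mtok=10.0,
                          cached_input_per_mtok=1.0,
                          output_per_mtok=50.0)
poor_cache = api_equivalent_cost(500_000_000, 10_000_000, 0.1, EXAMPLE_PRICING)
good_cache = api_equivalent_cost(500_000_000, 10_000_000, 0.9, EXAMPLE_PRICING)
print(f"poor cache use: ${poor_cache:,.0f}  good cache use: ${good_cache:,.0f}")
```

With these assumed inputs the same workload comes to about $5,050 at a 10% cache hit rate and about $1,450 at 90%, which is the kind of gap the "orchestration" argument points at. A flat subscription hides both numbers from the user.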
Cloud vs local & hosting future
- Split views: some large enterprises insist on running models in their own datacenters for security; others are comfortable outsourcing to hyperscalers and see them as the de facto winners.
- Many expect an ecosystem mirroring web hosting: bare-metal GPU providers and higher-level “LLM platforms,” with frontier labs competing on quality and integrations rather than raw hosting.
- Others foresee most AI becoming local/on-device by ~2030, eroding the business of large hosted frontier models except for niche high-end use.
Open models and geopolitics
- Open-weight Chinese models (DeepSeek, GLM, Qwen) are widely seen as dramatically cheaper and “good enough” for many workloads; some suggest global spend could drop 10× by switching.
- Counterpoint: some worry about relying on a “hostile superpower” and about possible US sanctions that could force US and allied providers to drop such models, pushing users back to domestic labs.
Hardware, optimization & future direction
- Back-of-envelope estimates for self-hosting large models on an 8×B200 setup with dedicated solar power suggest that per-token costs remain nontrivial even at scale.
- Many believe major savings lie in better routing to smaller models, improved agents/harnesses, and hardware specialization (e.g., “LLM-on-a-chip”), not just cheaper frontier models.
- Debate continues over whether capabilities are plateauing; some see diminishing returns, others say recent releases and agentic workflows still show rapid progress.
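The self-hosting estimate mentioned above can be sketched as a simple amortization formula. This is an illustrative model, not the calculation any commenter actually used: the function name `self_host_cost_per_mtok`, its parameters, and all example values are assumptions. It also simplifies by assuming the machine draws full power around the clock regardless of load, and it ignores staff, networking, cooling, and financing costs.

```python
def self_host_cost_per_mtok(hardware_usd: float,
                            amortization_years: float,
                            power_kw: float,
                            energy_usd_per_kwh: float,
                            tokens_per_second: float,
                            utilization: float) -> float:
    """Rough USD cost per million generated tokens for a self-hosted server.

    energy_usd_per_kwh can stand in for the amortized cost of dedicated
    solar capacity. utilization is the fraction of wall-clock time the
    server spends producing tokens at tokens_per_second.
    """
    if amortization_years <= 0 or tokens_per_second <= 0:
        raise ValueError("amortization_years and tokens_per_second must be positive")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")

    hours = amortization_years * 365 * 24
    hardware_per_hour = hardware_usd / hours
    # Simplification: full power draw whether or not the server is busy.
    energy_per_hour = power_kw * energy_usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (hardware_per_hour + energy_per_hour) / tokens_per_hour * 1_000_000


# All inputs below are made-up placeholders, not measured figures.
estimate = self_host_cost_per_mtok(hardware_usd=400_000,
                                   amortization_years=4,
                                   power_kw=10,
                                   energy_usd_per_kwh=0.05,
                                   tokens_per_second=5_000,
                                   utilization=0.5)
print(f"estimated cost: ${estimate:.2f} per million tokens")
```

The formula makes one point from the discussion visible: cost per token scales inversely with utilization, so a self-hosted server that sits idle half the time costs twice as much per token as a fully loaded one, however cheap the electricity is.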