2026-06-26

Why current LLM costs are not sustainable

Cost structure & sustainability

Many argue current frontier LLM economics are shaky: labs lose tens of billions while charging per-token rates that must also cover training, infra, staff, and marketing.
Inference itself may already be profitable even though the businesses as a whole are not; commenters disagree on whether this means subscriptions are “heavily subsidized” or whether enterprise API tokens simply carry large margins.
Some think valuations assume unrealistically smooth progress and demand through 2030; others note inference revenue already exceeds inference cost for at least one major lab.

Subscriptions, tokens & usage patterns

Several commenters report individual monthly API-equivalent usage in the thousands of dollars, driven by agents scanning large codebases, running long loops, and sending huge contexts with poor cache utilization.
Others say they can’t hit subscription limits even with daily professional use, suggesting that overuse is often an orchestration problem or “skill issue” rather than a necessity.
Strong consensus that people overuse top-tier models for trivial tasks because subscriptions hide the true cost.
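
To make the cache-utilization point concrete, here is a minimal sketch of how the API-equivalent cost of an agent loop could be estimated. The `Pricing` class, the `loop_cost` helper, and every price below are hypothetical placeholders chosen for easy arithmetic, not any provider's actual rates. The model also assumes the full context is resent on every turn, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def loop_cost(pricing, context_tokens, output_tokens, turns, cache_hit_rate):
    """Estimate the cost of an agent loop that resends its context each turn.

    cache_hit_rate is the fraction of input tokens billed at the cached rate.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    total_input = context_tokens * turns
    cached = total_input * cache_hit_rate
    uncached = total_input - cached
    total_output = output_tokens * turns
    total_usd_times_1m = (
        uncached * pricing.input_per_mtok
        + cached * pricing.cached_input_per_mtok
        + total_output * pricing.output_per_mtok
    )
    return total_usd_times_1m / 1_000_000


# Illustrative numbers only: a 100k-token context resent over 50 turns.
pricing = Pricing(input_per_mtok=10.0, cached_input_per_mtok=1.0, output_per_mtok=40.0)
poor = loop_cost(pricing, context_tokens=100_000, output_tokens=1_000,
                 turns=50, cache_hit_rate=0.0)
good = loop_cost(pricing, context_tokens=100_000, output_tokens=1_000,
                 turns=50, cache_hit_rate=0.9)
print(f"no cache: ${poor:.2f}, 90% cached: ${good:.2f}")
```

Under these made-up prices, input tokens dominate the bill, so the cache hit rate moves the total far more than the output length does. That is consistent with the claim that orchestration choices, not just volume of work, drive the largest bills.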

Cloud vs local & hosting future

Split views: some large enterprises insist on running models in their own datacenters for security; others are comfortable outsourcing to hyperscalers and see them as the de facto winners.
Many expect an ecosystem mirroring web hosting: bare-metal GPU providers and higher-level “LLM platforms,” with frontier labs competing on quality and integrations rather than raw hosting.
Others foresee most AI becoming local/on-device by ~2030, eroding the business of large hosted frontier models except for niche high-end use.

Open models and geopolitics

Open-weight Chinese models (DeepSeek, GLM, Qwen) are widely seen as dramatically cheaper and “good enough” for many workloads; some suggest global spend could drop 10× by switching.
Counterpoint: some raise concerns about relying on a “hostile superpower” and about possible US sanctions that could force US and allied providers to drop such models, pushing users back to domestic labs.

Hardware, optimization & future direction

Back-of-envelope estimates for self-hosting large models on an 8×B200 server with dedicated solar power suggest nontrivial per-token costs even at scale.
Many believe major savings lie in better routing to smaller models, improved agents/harnesses, and hardware specialization (e.g., “LLM-on-a-chip”), not just cheaper frontier models.
Debate continues over whether capabilities are plateauing; some see diminishing returns, others say recent releases and agentic workflows still show rapid progress.
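
The self-hosting estimate mentioned above can be sketched as a small calculation. The function name, its parameters, and all input values here are assumptions for illustration. They are not measured figures for 8×B200 hardware or for any solar installation, and the model ignores staff, networking, cooling, and financing costs.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_million_tokens(capex_usd, amortization_years, power_kw,
                            energy_usd_per_kwh, tokens_per_second, utilization):
    """Rough self-hosting cost per million generated tokens, in USD.

    Simplifications: hardware is amortized linearly, power is drawn at full
    load around the clock, and throughput scales linearly with utilization.
    """
    if amortization_years <= 0 or tokens_per_second <= 0:
        raise ValueError("amortization_years and tokens_per_second must be positive")
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    hourly_capex = capex_usd / (amortization_years * HOURS_PER_YEAR)
    hourly_energy = power_kw * energy_usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (hourly_capex + hourly_energy) / tokens_per_hour * 1_000_000


# Placeholder inputs only; substitute real quotes and benchmarks.
estimate = cost_per_million_tokens(
    capex_usd=438_000,        # hypothetical server price
    amortization_years=5,     # hypothetical depreciation period
    power_kw=10,              # hypothetical full-load draw
    energy_usd_per_kwh=0.10,  # hypothetical amortized energy cost
    tokens_per_second=5_000,  # hypothetical aggregate throughput
    utilization=0.5,          # fraction of hours serving real traffic
)
print(f"~${estimate:.2f} per million tokens")
```

In this sketch, utilization sits in the denominator, so halving it doubles the per-token cost. That is one reason a lightly loaded private cluster can end up costing more per token than a shared hosted service.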
