LLMs are cheap

Cost, Profitability, and Subsidies

  • Many argue inference is already cheap and profitable: GPU efficiency has improved dramatically; power per token can be tiny at scale; and third‑party hosts serving open‑weight models reportedly enjoy large gross margins.
  • Others are skeptical: frontier companies report multi‑billion‑dollar losses, spend heavily on GPUs and salaries, and may be shifting costs between COGS and R&D, which makes reported margins hard to read. Some APIs (e.g., high‑end “reasoning” models) are clearly pricey.
  • Debate over capex vs. opex: training is framed as capex (creating an asset, the weights) that depreciates, while inference is opex. But frequent retraining and rapid obsolescence make the “asset” framing questionable (see the amortization sketch under “Arms Race” below).
  • Self‑hosting appears expensive without large‑scale batching; commenters who tried it report GPU and energy costs well above hosted API prices (a rough per‑token comparison follows this list).
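
One way to sanity‑check the self‑hosting anecdotes is to divide an hourly GPU cost by realized token throughput. A minimal sketch follows; every constant (rental price, throughput with and without batching, the hosted API price) is an illustrative assumption to plug your own numbers into, not a measurement.

```python
# Back-of-envelope: self-hosted cost per million output tokens vs. a hosted API.
# All constants below are illustrative assumptions -- substitute your own.

GPU_COST_PER_HOUR = 2.50          # $/hr for a rented GPU (assumed)
TOKENS_PER_SECOND = 40            # single-request decode speed, no batching (assumed)
BATCHED_TOKENS_PER_SECOND = 800   # aggregate throughput with heavy batching (assumed)
HOSTED_PRICE_PER_MTOK = 0.50      # $/1M output tokens from a hosted API (assumed)

def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Dollars to generate 1M tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

print(f"unbatched:  ${cost_per_million_tokens(GPU_COST_PER_HOUR, TOKENS_PER_SECOND):.2f}/Mtok")
print(f"batched:    ${cost_per_million_tokens(GPU_COST_PER_HOUR, BATCHED_TOKENS_PER_SECOND):.2f}/Mtok")
print(f"hosted API: ${HOSTED_PRICE_PER_MTOK:.2f}/Mtok")
```

Under these assumptions, unbatched self‑hosting comes out around $17/Mtok, roughly 35x the hosted price, while heavy batching closes most of the gap. That matches the anecdotes: batching, not raw hardware, is where the economics live.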

Lock‑In, Competition, and Moats

  • Several commenters note that LLM inference APIs are easy to switch between: text‑in/text‑out, similar endpoints, OpenAI‑compatible adapters, and minimal prompt changes (see the sketch after this list).
  • Others counter that integration into products, “projects,” and enterprise workflows creates soft switching costs and future room for price hikes—more like cloud services than pure commodities.
  • Weak moats plus many providers suggest sustained price pressure, though big players retain brand and distribution advantages.
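
To illustrate the low‑switching‑cost argument: many providers expose OpenAI‑compatible endpoints, so moving traffic can reduce to changing a base URL and model name. The endpoint URLs and model names below are placeholders (assumed), not specific providers, and not a guarantee that any given provider is fully compatible.

```python
# Switching providers via OpenAI-compatible endpoints often reduces to
# swapping base_url + model. URLs and model names are placeholders (assumed).
from openai import OpenAI

PROVIDERS = {
    "provider_a": {"base_url": "https://api.provider-a.example/v1", "model": "model-a"},
    "provider_b": {"base_url": "https://api.provider-b.example/v1", "model": "model-b"},
}

def ask(provider: str, prompt: str) -> str:
    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key="...")  # per-provider key
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Same call, different backend: the text-in/text-out contract is the commodity.
print(ask("provider_a", "Summarize the tradeoffs of self-hosting an LLM."))
```

The counterpoint above still stands: the HTTP call is the easy part, while prompts tuned to one model’s quirks, evals, and enterprise integrations are where soft lock‑in accumulates.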

Monetization, Ads, and Future Pricing

  • Widespread view: current prices are influenced by VC/strategic subsidies; once expansion slows, prices or ad load will rise (Netflix/Uber/dot‑com analogies).
  • Ads are seen as the obvious path: contextual recommendations inside answers, system‑prompt ad injection, affiliate links, and behavioral targeting based on prompts.
  • Some see this as “ultimate propaganda” and worry about agents quietly favoring sponsors or omitting non‑paying options; others argue contextual ads can be transparent and aligned with user interests.
  • On free MAUs (e.g., hundreds of millions for ChatGPT), opinions split: some say an extra $1/year of ad ARPU is trivial to capture; others stress how hard it is to move free users to paying even $1 (the arithmetic is sketched after this list).
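
The free‑user debate reduces to one line of arithmetic. The MAU figure below is illustrative of the “hundreds of millions” cited above, and the serving cost per free user is a pure assumption, since that number is exactly what the two camps disagree about.

```python
# The ad-monetization arithmetic both camps are arguing over.
# MAU and cost figures are illustrative assumptions, not reported numbers.

free_mau = 500_000_000             # "hundreds of millions" of free users (assumed)
ad_arpu_per_year = 1.00            # $1/year per user from ads
serving_cost_per_user_year = 2.00  # hypothetical serving cost per free user (assumed)

ad_revenue = free_mau * ad_arpu_per_year
serving_cost = free_mau * serving_cost_per_user_year

print(f"ad revenue:   ${ad_revenue / 1e9:.1f}B/yr")    # $0.5B/yr
print(f"serving cost: ${serving_cost / 1e9:.1f}B/yr")  # $1.0B/yr at the assumed cost
```

Even a dollar of ad ARPU is real money at this scale, but whether it covers the serving bill depends entirely on the assumed cost side, which is the crux of the dispute.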

Comparison with Search and Usage Patterns

  • Supporters: on a per‑unit basis, mid‑range LLMs are already cheaper than commercial search APIs, especially for simple Q&A, and don’t need crawling/indexing.
  • Critics: realistic LLM use often involves web grounding/RAG and long iterative contexts, exploding token counts and undermining the “cheap” comparison (the per‑query sketch after this list illustrates the effect).
  • Many point out that search UX is now clogged with SEO spam, cookie walls, and ads; LLMs currently give cleaner, faster answers with links, which explains user preference, even if that UX may converge with search once ads arrive.
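
A sketch of the per‑query comparison, with all prices and token counts as labeled assumptions; the point is how grounding/RAG multiplies input tokens, not the specific figures.

```python
# Per-query cost: plain LLM Q&A vs. web-grounded LLM vs. a paid search API.
# All prices and token counts are illustrative assumptions.

PRICE_IN_PER_MTOK = 0.15      # $/1M input tokens, mid-range model (assumed)
PRICE_OUT_PER_MTOK = 0.60     # $/1M output tokens (assumed)
SEARCH_API_PER_QUERY = 0.005  # $/query for a commercial search API (assumed)

def llm_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one LLM call at the assumed per-token prices."""
    return (input_tokens * PRICE_IN_PER_MTOK
            + output_tokens * PRICE_OUT_PER_MTOK) / 1_000_000

plain = llm_cost(input_tokens=200, output_tokens=300)        # short Q&A
grounded = llm_cost(input_tokens=12_000, output_tokens=500)  # RAG: retrieved pages in context
grounded += SEARCH_API_PER_QUERY                             # plus the search call itself

print(f"plain Q&A:  ${plain:.5f}/query")
print(f"grounded:   ${grounded:.5f}/query")
print(f"search API: ${SEARCH_API_PER_QUERY:.5f}/query")
```

Under these assumptions the simple‑Q&A claim holds (the plain query is roughly 24x cheaper than the search API call), while a grounded query that stuffs retrieved pages into context already costs more than the search call it replaced.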

Externalities: Environment and Information Quality

  • Some warn that focusing only on retail price ignores energy use, water, carbon, and broader ecological costs, as well as IP/copyright issues and labor impacts.
  • Others counter that LLM energy use is “reasonable” relative to other digital activities and can be powered by low‑carbon electricity (a parametric estimate follows this list).
  • There’s concern that LLM‑generated content is degrading the open web, making both search and future LLM training worse—an unaccounted cost in “LLMs are cheap.”
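
The “reasonable energy use” claim can at least be framed parametrically. The power draw, throughput, and tokens‑per‑query figures below are assumptions for illustration, and the estimate counts only accelerator draw.

```python
# Parametric energy-per-query estimate. Power draw, throughput, and tokens
# per query are all illustrative assumptions, not measurements.

GPU_POWER_WATTS = 700            # accelerator board power (assumed)
BATCHED_TOKENS_PER_SECOND = 800  # aggregate throughput across the batch (assumed)
TOKENS_PER_QUERY = 500           # typical answer length (assumed)

joules_per_token = GPU_POWER_WATTS / BATCHED_TOKENS_PER_SECOND
wh_per_query = joules_per_token * TOKENS_PER_QUERY / 3600

print(f"{joules_per_token:.2f} J/token, {wh_per_query:.3f} Wh/query")
# ~0.12 Wh/query under these assumptions -- small next to, say, minutes of
# video streaming, but the figure scales directly with tokens per query.
```

This omits datacenter overhead (PUE), cooling water, and embodied hardware carbon, all of which sit outside the retail price either way, which is the critics’ point.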

Arms Race, Depreciation, and Sustainability

  • Commenters note that models depreciate fast: new releases quickly displace old ones, forcing continuous, expensive R&D and retraining (the amortization sketch below makes this concrete).
  • Some doubt any provider can “flip a switch” to profitability soon, given hardware scarcity and the ongoing model race; others think inference economics are already solid and only the training burn needs to stabilize.
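
The capex‑vs‑opex and depreciation arguments reduce to one amortization question: how many tokens does a model serve before a successor displaces it? The training cost, useful life, and serving volume below are assumptions chosen only to show the shape of the curve.

```python
# Amortizing training "capex" over a model's useful life.
# Training cost, lifetime, and serving volume are illustrative assumptions.

TRAINING_COST = 100_000_000                   # $ to train a frontier model (assumed)
USEFUL_LIFE_MONTHS = 9                        # months before a successor displaces it (assumed)
TOKENS_SERVED_PER_MONTH = 10_000_000_000_000  # 10T tokens/month across all users (assumed)

lifetime_tokens = USEFUL_LIFE_MONTHS * TOKENS_SERVED_PER_MONTH
amortized_per_mtok = TRAINING_COST / lifetime_tokens * 1_000_000

print(f"amortized training cost: ${amortized_per_mtok:.3f}/Mtok over {USEFUL_LIFE_MONTHS} months")
# Halve the useful life and the per-token training burden doubles -- which is
# why a fast model race can break inference economics that look fine in isolation.
```

Under these assumptions the amortized training cost (about $1.1/Mtok) already exceeds typical mid‑range inference prices, which is the skeptics’ case in one number: marginal inference can be profitable while the race itself is not.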