LLMs are cheap

Cost, Profitability, and Subsidies

  • Many argue inference is already cheap and profitable: GPU efficiency has improved dramatically; power per token can be tiny at scale; and third‑party hosts serving open‑weight models reportedly enjoy large gross margins.
  • Others are skeptical: frontier companies report multi‑billion‑dollar losses, spend heavily on GPUs and salaries, and may be shifting costs between COGS and R&D, which makes reported margins hard to read. Some APIs (e.g., high‑end “reasoning” models) are clearly pricey.
  • Debate over capex vs. opex: training is framed as capex (creating an asset, the weights) that depreciates, while inference is opex. But frequent retraining and rapid obsolescence make the “asset” framing questionable (see the amortization sketch under “Arms Race” below).
  • Self‑hosting appears expensive without large‑scale batching; commenters who tried it report GPU and energy costs well above hosted API prices (a rough per‑token comparison follows this list).
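
One way to sanity‑check the self‑hosting anecdotes is to divide an hourly GPU cost by realized token throughput. A minimal sketch follows; every constant (rental price, throughput with and without batching, the hosted API price) is an illustrative assumption to plug your own numbers into, not a measurement.

```python
# Back-of-envelope: self-hosted cost per million output tokens vs. a hosted API.
# All constants below are illustrative assumptions -- substitute your own.

GPU_COST_PER_HOUR = 2.50          # $/hr for a rented GPU (assumed)
TOKENS_PER_SECOND = 40            # single-request decode speed, no batching (assumed)
BATCHED_TOKENS_PER_SECOND = 800   # aggregate throughput with heavy batching (assumed)
HOSTED_PRICE_PER_MTOK = 0.50      # $/1M output tokens from a hosted API (assumed)

def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Dollars to generate 1M tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

print(f"unbatched:  ${cost_per_million_tokens(GPU_COST_PER_HOUR, TOKENS_PER_SECOND):.2f}/Mtok")
print(f"batched:    ${cost_per_million_tokens(GPU_COST_PER_HOUR, BATCHED_TOKENS_PER_SECOND):.2f}/Mtok")
print(f"hosted API: ${HOSTED_PRICE_PER_MTOK:.2f}/Mtok")
```

Under these assumptions, unbatched self‑hosting comes out around $17/Mtok, roughly 35x the hosted price, while heavy batching closes most of the gap. That matches the anecdotes: batching, not raw hardware, is where the economics live.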

Lock‑In, Competition, and Moats

  • Several commenters note that LLM inference APIs are easy to switch between: text‑in/text‑out, similar endpoints, OpenAI‑compatible adapters, and minimal prompt changes (see the sketch after this list).
  • Others counter that integration into products, “projects,” and enterprise workflows creates soft switching costs and future room for price hikes—more like cloud services than pure commodities.
  • Weak moats plus many providers suggest sustained price pressure, though big players retain brand and distribution advantages.
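
To illustrate the low‑switching‑cost argument: many providers expose OpenAI‑compatible endpoints, so moving traffic can reduce to changing a base URL and model name. The endpoint URLs and model names below are placeholders (assumed), not specific providers, and not a guarantee that any given provider is fully compatible.

```python
# Switching providers via OpenAI-compatible endpoints often reduces to
# swapping base_url + model. URLs and model names are placeholders (assumed).
from openai import OpenAI

PROVIDERS = {
    "provider_a": {"base_url": "https://api.provider-a.example/v1", "model": "model-a"},
    "provider_b": {"base_url": "https://api.provider-b.example/v1", "model": "model-b"},
}

def ask(provider: str, prompt: str) -> str:
    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key="...")  # per-provider key
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Same call, different backend: the text-in/text-out contract is the commodity.
print(ask("provider_a", "Summarize the tradeoffs of self-hosting an LLM."))
```

The counterpoint above still stands: the HTTP call is the easy part, while prompts tuned to one model’s quirks, evals, and enterprise integrations are where soft lock‑in accumulates.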

Monetization, Ads, and Future Pricing

  • Widespread view: current prices are influenced by VC/strategic subsidies; once expansion slows, prices or ad load will rise (Netflix/Uber/dot‑com analogies).
  • Ads are seen as the obvious path: contextual recommendations inside answers, system‑prompt ad injection, affiliate links, and behavioral targeting based on prompts.
  • Some see this as “ultimate propaganda” and worry about agents quietly favoring sponsors or omitting non‑paying options; others argue contextual ads can be transparent and aligned with user interests.
  • On free MAUs (e.g., hundreds of millions for ChatGPT), opinions split: some say an extra $1/year of ad ARPU is trivial to capture; others stress how hard it is to move free users to paying even $1 (the arithmetic is sketched after this list).
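
The free‑user debate reduces to one line of arithmetic. The MAU figure below is illustrative of the “hundreds of millions” cited above, and the serving cost per free user is a pure assumption, since that number is exactly what the two camps disagree about.

```python
# The ad-monetization arithmetic both camps are arguing over.
# MAU and cost figures are illustrative assumptions, not reported numbers.

free_mau = 500_000_000             # "hundreds of millions" of free users (assumed)
ad_arpu_per_year = 1.00            # $1/year per user from ads
serving_cost_per_user_year = 2.00  # hypothetical serving cost per free user (assumed)

ad_revenue = free_mau * ad_arpu_per_year
serving_cost = free_mau * serving_cost_per_user_year

print(f"ad revenue:   ${ad_revenue / 1e9:.1f}B/yr")    # $0.5B/yr
print(f"serving cost: ${serving_cost / 1e9:.1f}B/yr")  # $1.0B/yr at the assumed cost
```

Even a dollar of ad ARPU is real money at this scale, but whether it covers the serving bill depends entirely on the assumed cost side, which is the crux of the dispute.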

Comparison with Search and Usage Patterns

  • Supporters: on a per‑unit basis, mid‑range LLMs are already cheaper than commercial search APIs, especially for simple Q&A, and don’t need crawling/indexing.
  • Critics: realistic LLM use often involves web grounding/RAG and long iterative contexts, exploding token counts and undermining the “cheap” comparison (the per‑query sketch after this list illustrates the effect).
  • Many point out that search UX is now clogged with SEO spam, cookie walls, and ads; LLMs currently give cleaner, faster answers with links, which explains user preference, even if that UX may converge with search once ads arrive.
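
A sketch of the per‑query comparison, with all prices and token counts as labeled assumptions; the point is how grounding/RAG multiplies input tokens, not the specific figures.

```python
# Per-query cost: plain LLM Q&A vs. web-grounded LLM vs. a paid search API.
# All prices and token counts are illustrative assumptions.

PRICE_IN_PER_MTOK = 0.15      # $/1M input tokens, mid-range model (assumed)
PRICE_OUT_PER_MTOK = 0.60     # $/1M output tokens (assumed)
SEARCH_API_PER_QUERY = 0.005  # $/query for a commercial search API (assumed)

def llm_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one LLM call at the assumed per-token prices."""
    return (input_tokens * PRICE_IN_PER_MTOK
            + output_tokens * PRICE_OUT_PER_MTOK) / 1_000_000

plain = llm_cost(input_tokens=200, output_tokens=300)        # short Q&A
grounded = llm_cost(input_tokens=12_000, output_tokens=500)  # RAG: retrieved pages in context
grounded += SEARCH_API_PER_QUERY                             # plus the search call itself

print(f"plain Q&A:  ${plain:.5f}/query")
print(f"grounded:   ${grounded:.5f}/query")
print(f"search API: ${SEARCH_API_PER_QUERY:.5f}/query")
```

Under these assumptions the simple‑Q&A claim holds (the plain query is roughly 24x cheaper than the search API call), while a grounded query that stuffs retrieved pages into context already costs more than the search call it replaced.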

Externalities: Environment and Information Quality

  • Some warn that focusing only on retail price ignores energy use, water, carbon, and broader ecological costs, as well as IP/copyright issues and labor impacts.
  • Others counter that LLM energy use is “reasonable” relative to other digital activities and can be powered by low‑carbon electricity (a parametric estimate follows this list).
  • There’s concern that LLM‑generated content is degrading the open web, making both search and future LLM training worse—an unaccounted cost in “LLMs are cheap.”
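
The “reasonable energy use” claim can at least be framed parametrically. The power draw, throughput, and tokens‑per‑query figures below are assumptions for illustration, and the estimate counts only accelerator draw.

```python
# Parametric energy-per-query estimate. Power draw, throughput, and tokens
# per query are all illustrative assumptions, not measurements.

GPU_POWER_WATTS = 700            # accelerator board power (assumed)
BATCHED_TOKENS_PER_SECOND = 800  # aggregate throughput across the batch (assumed)
TOKENS_PER_QUERY = 500           # typical answer length (assumed)

joules_per_token = GPU_POWER_WATTS / BATCHED_TOKENS_PER_SECOND
wh_per_query = joules_per_token * TOKENS_PER_QUERY / 3600

print(f"{joules_per_token:.2f} J/token, {wh_per_query:.3f} Wh/query")
# ~0.12 Wh/query under these assumptions -- small next to, say, minutes of
# video streaming, but the figure scales directly with tokens per query.
```

This omits datacenter overhead (PUE), cooling water, and embodied hardware carbon, all of which sit outside the retail price either way, which is the critics’ point.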

Arms Race, Depreciation, and Sustainability

  • Commenters note that models depreciate fast: new releases quickly displace old ones, forcing continuous, expensive R&D and retraining (the amortization sketch below makes this concrete).
  • Some doubt any provider can “flip a switch” to profitability soon, given hardware scarcity and the ongoing model race; others think inference economics are already solid and only the training burn needs to stabilize.
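
The capex‑vs‑opex and depreciation arguments reduce to one amortization question: how many tokens does a model serve before a successor displaces it? The training cost, useful life, and serving volume below are assumptions chosen only to show the shape of the curve.

```python
# Amortizing training "capex" over a model's useful life.
# Training cost, lifetime, and serving volume are illustrative assumptions.

TRAINING_COST = 100_000_000                   # $ to train a frontier model (assumed)
USEFUL_LIFE_MONTHS = 9                        # months before a successor displaces it (assumed)
TOKENS_SERVED_PER_MONTH = 10_000_000_000_000  # 10T tokens/month across all users (assumed)

lifetime_tokens = USEFUL_LIFE_MONTHS * TOKENS_SERVED_PER_MONTH
amortized_per_mtok = TRAINING_COST / lifetime_tokens * 1_000_000

print(f"amortized training cost: ${amortized_per_mtok:.3f}/Mtok over {USEFUL_LIFE_MONTHS} months")
# Halve the useful life and the per-token training burden doubles -- which is
# why a fast model race can break inference economics that look fine in isolation.
```

Under these assumptions the amortized training cost (about $1.1/Mtok) already exceeds typical mid‑range inference prices, which is the skeptics’ case in one number: marginal inference can be profitable while the race itself is not.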