73% of AI startups are just prompt engineering

Article Credibility and Methodology

  • Many commenters doubt the piece’s core claim (e.g., “73%”) and even whether the investigation happened as described.
  • Main objection: the author appears able to see only frontend network traffic (via DevTools / Playwright), not calls from a startup’s backend to OpenAI/Anthropic/etc., so the numbers look extrapolated or invented.
  • People question how the author could infer internal RAG architectures, Redis usage, or backend latency patterns, and note that only a minority of companies are naive enough to expose API keys in frontend code.
  • Several call the article “LLM slop,” citing its writing style, the author’s missing LinkedIn profile, the paywall/PII prompt, and the lack of released code or data, and flag it as likely AI-generated or otherwise untrustworthy.
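The methodological gap the commenters point to can be made concrete with a small sketch. This is purely illustrative, not the article's actual method: the host list and function names are assumptions, and the point is what such a check can and cannot see.

```python
# Hypothetical sketch of the "watch the frontend" method commenters describe:
# flag any browser-originated request that goes straight to a model
# provider's API. The host list and helper name are illustrative.
from urllib.parse import urlparse

LLM_API_HOSTS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}

def is_direct_llm_call(url: str) -> bool:
    """True if a request URL targets a known model-provider API host."""
    return urlparse(url).hostname in LLM_API_HOSTS

# With Playwright, one could hook this into the page's request events, e.g.:
#   page.on("request", lambda r: hits.append(r.url) if is_direct_llm_call(r.url) else None)
```

Crucially, such a check only sees browser-originated traffic: calls made from a startup's own backend to OpenAI or Anthropic never pass through DevTools, so a negative result proves nothing, which is exactly the commenters' objection to the article's numbers.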

Wrappers, Moats, and Platform Risk

  • Broad agreement that many “AI startups” are thin wrappers or workflows around commodity LLM APIs—analogized to CRUD apps, DB connectors, or 90s HTML startups.
  • Core concern: no defensible moat if the main value is a prompt on a third‑party model; hyperscalers can replicate features and undercut them.
  • Others argue this layering is how software always works: almost everyone builds on someone else’s “kingdom” (LLMs, clouds, GPUs, fabs).

Prompt Engineering: How Trivial Is It?

  • One camp: prompt engineering isn’t “engineering” but trial-and-error copywriting/SEO; calling it engineering inflates its importance.
  • Another camp: serious systems require substantial scaffolding—data selection and ETL, RAG, hybrid search, rerankers, eval pipelines, moderation, tool usage, and constant tuning—taking weeks of focused work even on open models.
  • Debate over “prompt is code” vs “prompt is specification”; some see future directions where prompts act as reliably executable specs, others call this a pipe dream given natural language ambiguity.
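The "substantial scaffolding" camp's point above can be sketched minimally: even the simplest RAG step is retrieval plus prompt assembly, before any hybrid search, reranking, moderation, or eval pipeline is layered on. All names here are illustrative stand-ins, not from the thread.

```python
# Minimal RAG scaffolding sketch: naive keyword-overlap retrieval as a
# stand-in for real hybrid search + reranking, followed by prompt assembly.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by keyword overlap with the query; return the top k."""
    q_terms = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble retrieved context into a grounded prompt for the model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Each stage here (the retriever, the ranking function, the prompt template) is exactly the kind of component that the second camp says takes weeks to select, tune, and evaluate on real data.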

Models, Specialization, and Lifecycle

  • Many say it’s rational to start with GPT/Claude/etc. to prove demand, then move to smaller, specialized models for cost and control.
  • Counterpoint: fine‑tuning on today’s open models risks being stranded when new base models arrive; the value may lie more in data, workflows, and evals than custom weights.
  • Discussion on specialized vs general models: parallels drawn to CPUs vs GPUs/microcontrollers; expectation that complex products will mix a general model with task-specific ones.
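The mixed-model setup the thread anticipates could be as simple as a task router: send each task to a small specialized model when one exists, and fall back to a general model otherwise. This is a hypothetical sketch; the model names are placeholders, not real products.

```python
# Hypothetical model router for the general + task-specific mix described
# above. Model names are placeholders.

SPECIALIZED = {
    "classify": "small-classifier-v1",  # cheap fine-tuned model
    "extract": "small-extractor-v1",
}
GENERAL = "general-llm-v1"              # frontier-style fallback

def route(task: str) -> str:
    """Pick a model for a task: specialized where available, general otherwise."""
    return SPECIALIZED.get(task, GENERAL)
```

The economics argument in the thread maps directly onto this structure: the specialized entries cut cost and add control, while the general fallback preserves coverage for open-ended requests.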

Economics, Bubble, and VC Behavior

  • Some see a bubble specifically in wrapper startups: unprofitable infra, token‑price exposure, dependence on someone else’s unprofitable models, and “smash‑and‑grab” strategies.
  • Others argue the real bubble is further down the stack (massive data centers, fabs, power), not in relatively cheap startups.
  • Commenters note VCs may fund many shallow “ecosystem” products to manufacture traction for flagship model providers.

Product Value and “Real” AI Applications

  • Frustration that most “AI apps” are just chat boxes bolted onto existing UIs, with little automation or workflow redesign.
  • Some argue LLMs are best used as development accelerators, not as core, non‑deterministic components in production workflows, except in carefully chosen niches.
  • Several note that for many users, even basic prompt expertise is scarce, so simple wrappers can still deliver practical value despite their fragility.

What Counts as an AI Company?

  • Disagreement over whether wrappers “deserve” to call themselves AI companies.
  • One view: if the core value is outsourced to a foundation model, presenting as deep‑tech AI is misleading.
  • Opposing view: using higher‑level abstractions is normal; just as most firms don’t build databases or browsers, “doing AI” can legitimately mean assembling models, data, and UX into something users pay for.