73% of AI startups are just prompt engineering
Article Credibility and Methodology
- Many commenters doubt the piece’s core claim (e.g., “73%”) and even whether the investigation happened as described.
- Main objection: the author appears able to observe only frontend network traffic (via DevTools/Playwright), not server-side calls from a startup's backend to OpenAI/Anthropic/etc., so the headline numbers look extrapolated or invented; the sketch after this list illustrates the limitation.
- People question how the author could infer internal RAG architectures, Redis usage, or backend latency patterns, and note that only a minority of companies are naive enough to expose API keys in frontend code.
- Several call the article “LLM slop,” note the writing style and missing LinkedIn profile, and flag it as likely AI-generated or otherwise untrustworthy, especially given the paywall/PII prompt and lack of released code or data.
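By way of illustration only (not a reconstruction of the author's actual method), a frontend-only probe of the kind commenters describe might look like the Playwright sketch below. The target URL is hypothetical, and the limitation is visible in the code itself: the handler fires only for requests the browser makes, so a startup that proxies model calls through its own backend never shows up here.

```python
# A minimal sketch of frontend-only network inspection, assuming Playwright.
# It can reveal a page calling an LLM API directly, but a backend-to-provider
# call is invisible to it. The target site is hypothetical.
from playwright.sync_api import sync_playwright

LLM_HOSTS = ("api.openai.com", "api.anthropic.com")

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    def log_llm_calls(request):
        # Only fires for traffic originating in the page itself.
        if any(host in request.url for host in LLM_HOSTS):
            print(f"direct frontend call: {request.method} {request.url}")

    page.on("request", log_llm_calls)
    page.goto("https://example-ai-startup.test")  # hypothetical target
    page.wait_for_timeout(5000)  # let the page issue its requests
    browser.close()
```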
Wrappers, Moats, and Platform Risk
- Broad agreement that many “AI startups” are thin wrappers or workflows around commodity LLM APIs—analogized to CRUD apps, DB connectors, or 90s HTML startups.
- Core concern: no defensible moat if the main value is a prompt on a third‑party model; hyperscalers can replicate features and undercut them.
- Others argue this layering is how software always works: almost everyone builds on someone else’s “kingdom” (LLMs, clouds, GPUs, fabs).
Prompt Engineering: How Trivial Is It?
- One camp: prompt engineering isn’t “engineering” but trial-and-error copywriting/SEO; calling it engineering inflates its importance.
- Another camp: serious systems require substantial scaffolding (data selection and ETL, RAG, hybrid search, rerankers, eval pipelines, moderation, tool use, and constant tuning) and take weeks of focused work even on open models; a toy sketch of such a pipeline follows this list.
- Debate over "prompt is code" vs. "prompt is specification": some foresee prompts acting as reliably executable specs, while others call that a pipe dream given natural-language ambiguity.
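To make the scaffolding camp's point concrete, here is a toy, dependency-free sketch of the retrieval-and-prompt-assembly layer that sits in front of any model call. The corpus, the scoring function (bag-of-words overlap standing in for a real embedding model), and the prompt template are all hypothetical; a production pipeline would add chunking, hybrid search, a reranker, and evals on top.

```python
# Toy RAG pipeline: retrieve relevant chunks, then assemble a grounded prompt.
# Bag-of-words cosine similarity stands in for real embeddings.
from collections import Counter
import math

CORPUS = [
    "Redis can cache embeddings to cut retrieval latency.",
    "Hybrid search combines keyword matching with vector similarity.",
    "Rerankers reorder retrieved chunks before prompt assembly.",
]

def embed(text: str) -> Counter:
    # Stand-in for an embedding model: lowercase bag of words.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How do rerankers fit into retrieval?"))
```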
Models, Specialization, and Lifecycle
- Many say it's rational to start with GPT/Claude/etc. to prove demand, then move to smaller, specialized models for cost and control; one common way to keep that switch cheap is sketched after this list.
- Counterpoint: fine‑tuning on today's open models risks being stranded when new base models arrive; the value may lie more in data, workflows, and evals than in custom weights.
- Discussion on specialized vs general models: parallels drawn to CPUs vs GPUs/microcontrollers; expectation that complex products will mix a general model with task-specific ones.
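A minimal sketch of the migration strategy above, assuming the usual pattern of hiding the model behind a narrow interface so a hosted frontier model can later be swapped for a smaller specialized one. The class names and placeholder responses are hypothetical; real implementations would wire in a vendor SDK and a local inference runtime respectively.

```python
# Application code depends only on a narrow interface, not on a vendor,
# so the phase-1 -> phase-2 swap touches a single constructor call.
from typing import Protocol

class CompletionModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class HostedFrontierModel:
    """Phase 1: prove demand on a commodity hosted API."""
    def complete(self, prompt: str) -> str:
        # Placeholder for a vendor SDK call (OpenAI, Anthropic, etc.).
        return f"[hosted model response to: {prompt[:40]}...]"

class LocalSpecializedModel:
    """Phase 2: smaller task-specific model for cost and control."""
    def complete(self, prompt: str) -> str:
        # Placeholder for local inference on a fine-tuned open model.
        return f"[local model response to: {prompt[:40]}...]"

def summarize(model: CompletionModel, document: str) -> str:
    return model.complete(f"Summarize in three bullets:\n{document}")

print(summarize(HostedFrontierModel(), "Quarterly revenue grew 12%..."))
```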
Economics, Bubble, and VC Behavior
- Some see a bubble specifically in wrapper startups: unprofitable infra, token‑price exposure (a back-of-envelope illustration follows this list), dependence on someone else's unprofitable models, and "smash‑and‑grab" strategies.
- Others argue the real bubble is further down the stack (massive data centers, fabs, power), not in relatively cheap startups.
- Commenters note VCs may fund many shallow “ecosystem” products to manufacture traction for flagship model providers.
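The token-price-exposure argument reduces to simple arithmetic, sketched below with entirely hypothetical prices: a wrapper's per-request margin moves one-for-one with its upstream token price, and a price increase it doesn't control can push the margin negative.

```python
# Back-of-envelope margin under upstream token-price scenarios.
# All numbers are hypothetical.
price_per_request = 0.02        # what the wrapper charges its user, USD
tokens_per_request = 3_000      # prompt + completion tokens

for cost_per_1k in (0.002, 0.004, 0.008):  # upstream USD per 1k tokens
    upstream_cost = tokens_per_request / 1_000 * cost_per_1k
    margin = price_per_request - upstream_cost
    print(f"${cost_per_1k}/1k tokens -> margin ${margin:.4f} per request")
```

At the last scenario the margin is negative, which is the exposure commenters describe.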
Product Value and “Real” AI Applications
- Frustration that most “AI apps” are just chat boxes bolted onto existing UIs, with little automation or workflow redesign.
- Some argue LLMs are best used as development accelerators rather than as core, non‑deterministic components in production workflows, except in carefully chosen niches; the guardrail sketch after this list shows one common way to contain that non-determinism.
- Several note that for many users, even basic prompt expertise is scarce, so simple wrappers can still deliver practical value despite their fragility.
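One common pattern for using an LLM in production despite its non-determinism, sketched below with a hypothetical call_model stub: treat the model as an untrusted component, validate every output against a fixed schema, and fall back deterministically when validation fails.

```python
# Guardrail pattern: validate LLM output before it enters the workflow.
# call_model is a hypothetical stub standing in for a real LLM call.
import json

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM call; returns JSON text in this toy example.
    return '{"category": "billing", "confidence": 0.91}'

def classify_ticket(text: str, retries: int = 2) -> dict:
    for _ in range(retries + 1):
        raw = call_model(f"Classify this support ticket as JSON: {text}")
        try:
            result = json.loads(raw)
            if result.get("category") in {"billing", "bug", "other"}:
                return result  # passed validation, safe to hand downstream
        except json.JSONDecodeError:
            pass  # malformed output: retry rather than propagate
    return {"category": "other", "confidence": 0.0}  # deterministic fallback

print(classify_ticket("I was charged twice this month."))
```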
What Counts as an AI Company?
- Disagreement over whether wrappers “deserve” to call themselves AI companies.
- One view: if the core value is outsourced to a foundation model, presenting as deep‑tech AI is misleading.
- Opposing view: using higher‑level abstractions is normal; just as most firms don’t build databases or browsers, “doing AI” can legitimately mean assembling models, data, and UX into something users pay for.