73% of AI startups are just prompt engineering
Article Credibility and Methodology
- Many commenters doubt the piece’s core claim (e.g., “73%”) and even whether the investigation happened as described.
- Main objection: the author appears able to observe only frontend network traffic (via DevTools/Playwright), not server-side calls from a startup's backend to OpenAI/Anthropic/etc., so the headline numbers look extrapolated or invented; the sketch after this list illustrates the limitation.
- People question how the author could infer internal RAG architectures, Redis usage, or backend latency patterns, and note that only a minority of companies are naive enough to expose API keys in frontend code.
- Several call the article “LLM slop,” note the writing style and missing LinkedIn profile, and flag it as likely AI-generated or otherwise untrustworthy, especially given the paywall/PII prompt and lack of released code or data.
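By way of illustration only (not a reconstruction of the author's actual method), a frontend-only probe of the kind commenters describe might look like the Playwright sketch below. The target URL is hypothetical, and the limitation is visible in the code itself: the handler fires only for requests the browser makes, so a startup that proxies model calls through its own backend never shows up here.

```python
# A minimal sketch of frontend-only network inspection, assuming Playwright.
# It can reveal a page calling an LLM API directly, but a backend-to-provider
# call is invisible to it. The target site is hypothetical.
from playwright.sync_api import sync_playwright

LLM_HOSTS = ("api.openai.com", "api.anthropic.com")

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    def log_llm_calls(request):
        # Only fires for traffic originating in the page itself.
        if any(host in request.url for host in LLM_HOSTS):
            print(f"direct frontend call: {request.method} {request.url}")

    page.on("request", log_llm_calls)
    page.goto("https://example-ai-startup.test")  # hypothetical target
    page.wait_for_timeout(5000)  # let the page issue its requests
    browser.close()
```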
Wrappers, Moats, and Platform Risk
- Broad agreement that many “AI startups” are thin wrappers or workflows around commodity LLM APIs—analogized to CRUD apps, DB connectors, or 90s HTML startups.
- Core concern: no defensible moat if the main value is a prompt on a third‑party model; hyperscalers can replicate features and undercut them.
- Others argue this layering is how software always works: almost everyone builds on someone else’s “kingdom” (LLMs, clouds, GPUs, fabs).
Prompt Engineering: How Trivial Is It?
- One camp: prompt engineering isn’t “engineering” but trial-and-error copywriting/SEO; calling it engineering inflates its importance.
- Another camp: serious systems require substantial scaffolding (data selection and ETL, RAG, hybrid search, rerankers, eval pipelines, moderation, tool use, and constant tuning) and take weeks of focused work even on open models; a toy sketch of such a pipeline follows this list.
- Debate over "prompt is code" vs. "prompt is specification": some foresee prompts acting as reliably executable specs, while others call that a pipe dream given natural-language ambiguity.
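To make the scaffolding camp's point concrete, here is a toy, dependency-free sketch of the retrieval-and-prompt-assembly layer that sits in front of any model call. The corpus, the scoring function (bag-of-words overlap standing in for a real embedding model), and the prompt template are all hypothetical; a production pipeline would add chunking, hybrid search, a reranker, and evals on top.

```python
# Toy RAG pipeline: retrieve relevant chunks, then assemble a grounded prompt.
# Bag-of-words cosine similarity stands in for real embeddings.
from collections import Counter
import math

CORPUS = [
    "Redis can cache embeddings to cut retrieval latency.",
    "Hybrid search combines keyword matching with vector similarity.",
    "Rerankers reorder retrieved chunks before prompt assembly.",
]

def embed(text: str) -> Counter:
    # Stand-in for an embedding model: lowercase bag of words.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(CORPUS, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How do rerankers fit into retrieval?"))
```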
Models, Specialization, and Lifecycle
- Many say it's rational to start with GPT/Claude/etc. to prove demand, then move to smaller, specialized models for cost and control; one common way to keep that switch cheap is sketched after this list.
- Counterpoint: fine‑tuning on today's open models risks being stranded when new base models arrive; the value may lie more in data, workflows, and evals than in custom weights.
- Discussion on specialized vs general models: parallels drawn to CPUs vs GPUs/microcontrollers; expectation that complex products will mix a general model with task-specific ones.
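A minimal sketch of the migration strategy above, assuming the usual pattern of hiding the model behind a narrow interface so a hosted frontier model can later be swapped for a smaller specialized one. The class names and placeholder responses are hypothetical; real implementations would wire in a vendor SDK and a local inference runtime respectively.

```python
# Application code depends only on a narrow interface, not on a vendor,
# so the phase-1 -> phase-2 swap touches a single constructor call.
from typing import Protocol

class CompletionModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class HostedFrontierModel:
    """Phase 1: prove demand on a commodity hosted API."""
    def complete(self, prompt: str) -> str:
        # Placeholder for a vendor SDK call (OpenAI, Anthropic, etc.).
        return f"[hosted model response to: {prompt[:40]}...]"

class LocalSpecializedModel:
    """Phase 2: smaller task-specific model for cost and control."""
    def complete(self, prompt: str) -> str:
        # Placeholder for local inference on a fine-tuned open model.
        return f"[local model response to: {prompt[:40]}...]"

def summarize(model: CompletionModel, document: str) -> str:
    return model.complete(f"Summarize in three bullets:\n{document}")

print(summarize(HostedFrontierModel(), "Quarterly revenue grew 12%..."))
```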
Economics, Bubble, and VC Behavior
- Some see a bubble specifically in wrapper startups: unprofitable infra, token‑price exposure (a back-of-envelope illustration follows this list), dependence on someone else's unprofitable models, and "smash‑and‑grab" strategies.
- Others argue the real bubble is further down the stack (massive data centers, fabs, power), not in relatively cheap startups.
- Commenters note VCs may fund many shallow “ecosystem” products to manufacture traction for flagship model providers.
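The token-price-exposure argument reduces to simple arithmetic, sketched below with entirely hypothetical prices: a wrapper's per-request margin moves one-for-one with its upstream token price, and a price increase it doesn't control can push the margin negative.

```python
# Back-of-envelope margin under upstream token-price scenarios.
# All numbers are hypothetical.
price_per_request = 0.02        # what the wrapper charges its user, USD
tokens_per_request = 3_000      # prompt + completion tokens

for cost_per_1k in (0.002, 0.004, 0.008):  # upstream USD per 1k tokens
    upstream_cost = tokens_per_request / 1_000 * cost_per_1k
    margin = price_per_request - upstream_cost
    print(f"${cost_per_1k}/1k tokens -> margin ${margin:.4f} per request")
```

At the last scenario the margin is negative, which is the exposure commenters describe.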
Product Value and “Real” AI Applications
- Frustration that most “AI apps” are just chat boxes bolted onto existing UIs, with little automation or workflow redesign.
- Some argue LLMs are best used as development accelerators rather than as core, non‑deterministic components in production workflows, except in carefully chosen niches; the guardrail sketch after this list shows one common way to contain that non-determinism.
- Several note that for many users, even basic prompt expertise is scarce, so simple wrappers can still deliver practical value despite their fragility.
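One common pattern for using an LLM in production despite its non-determinism, sketched below with a hypothetical call_model stub: treat the model as an untrusted component, validate every output against a fixed schema, and fall back deterministically when validation fails.

```python
# Guardrail pattern: validate LLM output before it enters the workflow.
# call_model is a hypothetical stub standing in for a real LLM call.
import json

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM call; returns JSON text in this toy example.
    return '{"category": "billing", "confidence": 0.91}'

def classify_ticket(text: str, retries: int = 2) -> dict:
    for _ in range(retries + 1):
        raw = call_model(f"Classify this support ticket as JSON: {text}")
        try:
            result = json.loads(raw)
            if result.get("category") in {"billing", "bug", "other"}:
                return result  # passed validation, safe to hand downstream
        except json.JSONDecodeError:
            pass  # malformed output: retry rather than propagate
    return {"category": "other", "confidence": 0.0}  # deterministic fallback

print(classify_ticket("I was charged twice this month."))
```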
What Counts as an AI Company?
- Disagreement over whether wrappers “deserve” to call themselves AI companies.
- One view: if the core value is outsourced to a foundation model, presenting as deep‑tech AI is misleading.
- Opposing view: using higher‑level abstractions is normal; just as most firms don’t build databases or browsers, “doing AI” can legitimately mean assembling models, data, and UX into something users pay for.