The Prompt API

Model size, storage, and download behavior

  • The Prompt API requires a large on-device model; Chrome's docs demand “at least 22 GB” of free space, which many commenters see as excessive for a browser feature.
  • Actual model folders are reported at around 3–4 GB; commenters speculate the 22 GB figure is a safety threshold that leaves room for multiple model versions and avoids filling users' disks.
  • Models are lazily downloaded on first use, cached once per browser, and shared across sites.
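The lazy-download flow above can be sketched in code. The global `LanguageModel` object, its availability states (`"unavailable"`, `"downloadable"`, `"downloading"`, `"available"`), and the `downloadprogress` monitor event follow the current Prompt API explainer, but names have shifted between Chrome versions, so treat the browser-facing parts as assumptions; the decision helper is kept pure so it can run outside a browser:

```typescript
// States per the webmachinelearning/prompt-api explainer (an assumption;
// earlier Chrome builds used different names).
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

// Pure helper: what should the page do for a given availability state?
// Separating this from the browser call keeps the logic testable in Node.
function planForAvailability(state: Availability): string {
  switch (state) {
    case "available":    return "create-session";            // model already cached
    case "downloadable": return "prompt-user-then-download";  // first use triggers download
    case "downloading":  return "show-progress";              // shared download in flight
    case "unavailable":  return "fallback-to-cloud";          // device/browser unsupported
  }
}

// Browser-only usage (not runnable outside Chrome), hedged per the explainer:
// const state = await LanguageModel.availability();
// if (planForAvailability(state) !== "fallback-to-cloud") {
//   const session = await LanguageModel.create({
//     monitor(m) {
//       m.addEventListener("downloadprogress", (e) =>
//         console.log(`downloaded ${Math.round(e.loaded * 100)}%`));
//     },
//   });
// }
```

Because the download is cached once per browser and shared across sites, only the first site to call `create()` should ever see the `"downloadable"` path.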

User experience and performance

  • Several commenters describe slow token generation, devices heating up, and a long initial download, especially on “baseline” hardware.
  • Some would rather pay for fast hosted models than run a sluggish local one; others see local as “good enough” for light tasks like search or summarization.
  • There’s concern that low-end models are only useful for trivial or very short interactions.

Privacy, surveillance, and abuse risks

  • Some view on-device inference as privacy-preserving; others distrust Chrome/Google and fear background analysis of user data.
  • Speculation about covert analytics or wiretap-adjacent uses, though others note this API isn’t required for such behavior.
  • Worries about using visitors’ machines for spam or distributed computation; countered by arguments that tiny models and low payoff limit abuse.

Use cases and experiments

  • Reported uses include: local search, summarizing hack-day writeups, AI subject-line generation, text adventure modification, AI-based email triage, and potential ad/cookie blockers.
  • A large subthread explores “de-snarkifying” social media and comment sections: filtering aggression, summarizing long threads, and stripping clickbait.
  • Some welcome this as removing “junk calories”; others fear homogenized “slop” and further detachment from unfiltered reality.
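The de-snarkifying idea above reduces to a thin wrapper around a session's `prompt()` call. This is a sketch, not anything from the API docs: the instruction wording is invented, and the session is reduced to a one-method interface so the wrapper works with either a real Prompt API session or a test stub:

```typescript
// Minimal session shape: a real LanguageModel session exposes prompt(),
// but narrowing to this interface lets tests inject a stub.
interface PromptSession {
  prompt(input: string): Promise<string>;
}

// Invented instruction text, illustrative only.
const DESNARK_INSTRUCTIONS =
  "Rewrite the following comment to remove sarcasm, aggression, and " +
  "clickbait phrasing while preserving its factual claims. Reply with " +
  "the rewritten comment only.";

// Prepend the instructions to each comment and return the model's rewrite.
async function desnark(session: PromptSession, comment: string): Promise<string> {
  return session.prompt(`${DESNARK_INSTRUCTIONS}\n\n${comment}`);
}
```

In Chrome the session might instead be created with the instructions as an initial/system prompt (the explainer's `initialPrompts` option), which avoids resending them for every comment in a long thread.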

Standardization, browser ecosystem, and fragmentation

  • The Prompt API is currently tied to a specific model in each browser (e.g., Gemini Nano in Chrome, different models elsewhere).
  • Developers worry that prompts are highly model-specific and that the API offers no introspection for adapting behavior per browser, making cross-browser testing harder than with APIs like WebGL.
  • Links show mixed reactions from other browser vendors; some detailed, some dismissive.
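Absent real introspection, the most a site can do today is detect which entry point exists and treat the model behind it as an opaque, browser-specific black box. A minimal detection sketch; the global name `LanguageModel` follows the current explainer and `ai` the older Chrome shape, both assumptions that may not match any given browser build:

```typescript
// Feature-detect the Prompt API entry point on a given global scope.
// Passing the scope in (instead of reading globalThis directly) keeps
// the function testable outside a browser.
function detectPromptApi(
  scope: Record<string, unknown>,
): "language-model" | "window-ai" | "none" {
  if ("LanguageModel" in scope) return "language-model"; // current explainer shape
  if ("ai" in scope) return "window-ai";                 // older Chrome builds
  return "none";                                         // no built-in model API
}
```

Note what this cannot tell you: which model sits behind the API, its context size, or its prompt dialect — exactly the gap the WebGL comparison highlights, since WebGL at least exposes renderer and extension queries.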

Local vs cloud models and model quality

  • Comparisons claim hosted models (e.g., Gemma via APIs) are faster and more capable than in-browser Gemini Nano.
  • Some expect browsers/OSes to eventually ship multiple or better models; others find the prospect of AI baked into OSes/browsers dystopian.
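The local-versus-hosted tradeoff suggests a local-first strategy with a cloud fallback. A sketch under stated assumptions: both backends are injected as plain async functions (the names `runLocal` and `runHosted` are ours); a real implementation would wire `runLocal` to a Prompt API session and `runHosted` to a server endpoint:

```typescript
// A text-generation backend: prompt in, completion out.
type Generate = (prompt: string) => Promise<string>;

// Prefer the on-device model when one exists (privacy, no per-call cost),
// fall back to the hosted model when local is absent or fails.
async function generateWithFallback(
  runLocal: Generate | null, // null when no on-device model is available
  runHosted: Generate,
  prompt: string,
): Promise<string> {
  if (runLocal) {
    try {
      return await runLocal(prompt);
    } catch {
      // Local inference failed (e.g., out of memory); fall through to hosted.
    }
  }
  return runHosted(prompt);
}
```

This shape also matches the “good enough for light tasks” position: route short summarization prompts locally and reserve the hosted model for work the small on-device model handles poorly.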