The Prompt API

Model size, storage, and download behavior

  • The Prompt API requires a large on-device model; Chrome's docs demand “at least 22 GB” of free space, which many commenters see as excessive for a browser feature.
  • Actual model folders are reported at around 3–4 GB; commenters speculate the 22 GB figure is a safety threshold that leaves room for multiple model versions and avoids filling users' disks.
  • Models are lazily downloaded on first use, cached once per browser, and shared across sites.
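The lazy-download flow above can be sketched in code. The global `LanguageModel` object, its availability states (`"unavailable"`, `"downloadable"`, `"downloading"`, `"available"`), and the `downloadprogress` monitor event follow the current Prompt API explainer, but names have shifted between Chrome versions, so treat the browser-facing parts as assumptions; the decision helper is kept pure so it can run outside a browser:

```typescript
// States per the webmachinelearning/prompt-api explainer (an assumption;
// earlier Chrome builds used different names).
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

// Pure helper: what should the page do for a given availability state?
// Separating this from the browser call keeps the logic testable in Node.
function planForAvailability(state: Availability): string {
  switch (state) {
    case "available":    return "create-session";            // model already cached
    case "downloadable": return "prompt-user-then-download";  // first use triggers download
    case "downloading":  return "show-progress";              // shared download in flight
    case "unavailable":  return "fallback-to-cloud";          // device/browser unsupported
  }
}

// Browser-only usage (not runnable outside Chrome), hedged per the explainer:
// const state = await LanguageModel.availability();
// if (planForAvailability(state) !== "fallback-to-cloud") {
//   const session = await LanguageModel.create({
//     monitor(m) {
//       m.addEventListener("downloadprogress", (e) =>
//         console.log(`downloaded ${Math.round(e.loaded * 100)}%`));
//     },
//   });
// }
```

Because the download is cached once per browser and shared across sites, only the first site to call `create()` should ever see the `"downloadable"` path.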

User experience and performance

  • Several commenters describe slow token generation, devices heating up, and a long initial download, especially on “baseline” hardware.
  • Some would rather pay for fast hosted models than run a sluggish local one; others see local as “good enough” for light tasks like search or summarization.
  • There’s concern that low-end models are only useful for trivial or very short interactions.

Privacy, surveillance, and abuse risks

  • Some view on-device inference as privacy-preserving; others distrust Chrome/Google and fear background analysis of user data.
  • Speculation about covert analytics or wiretap-adjacent uses, though others note this API isn’t required for such behavior.
  • Worries about using visitors’ machines for spam or distributed computation; countered by arguments that tiny models and low payoff limit abuse.

Use cases and experiments

  • Reported uses include: local search, summarizing hack-day writeups, AI subject-line generation, text adventure modification, AI-based email triage, and potential ad/cookie blockers.
  • A large subthread explores “de-snarkifying” social media and comment sections: filtering aggression, summarizing long threads, and stripping clickbait.
  • Some welcome this as removing “junk calories”; others fear homogenized “slop” and further detachment from unfiltered reality.
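The de-snarkifying idea above reduces to a thin wrapper around a session's `prompt()` call. This is a sketch, not anything from the API docs: the instruction wording is invented, and the session is reduced to a one-method interface so the wrapper works with either a real Prompt API session or a test stub:

```typescript
// Minimal session shape: a real LanguageModel session exposes prompt(),
// but narrowing to this interface lets tests inject a stub.
interface PromptSession {
  prompt(input: string): Promise<string>;
}

// Invented instruction text, illustrative only.
const DESNARK_INSTRUCTIONS =
  "Rewrite the following comment to remove sarcasm, aggression, and " +
  "clickbait phrasing while preserving its factual claims. Reply with " +
  "the rewritten comment only.";

// Prepend the instructions to each comment and return the model's rewrite.
async function desnark(session: PromptSession, comment: string): Promise<string> {
  return session.prompt(`${DESNARK_INSTRUCTIONS}\n\n${comment}`);
}
```

In Chrome the session might instead be created with the instructions as an initial/system prompt (the explainer's `initialPrompts` option), which avoids resending them for every comment in a long thread.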

Standardization, browser ecosystem, and fragmentation

  • The Prompt API is currently tied to a specific model in each browser (e.g., Gemini Nano in Chrome, different models elsewhere).
  • Developers worry that prompts are highly model-specific and that the API offers no introspection for adapting behavior per browser, making cross-browser testing harder than with APIs like WebGL.
  • Links show mixed reactions from other browser vendors; some detailed, some dismissive.
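Absent real introspection, the most a site can do today is detect which entry point exists and treat the model behind it as an opaque, browser-specific black box. A minimal detection sketch; the global name `LanguageModel` follows the current explainer and `ai` the older Chrome shape, both assumptions that may not match any given browser build:

```typescript
// Feature-detect the Prompt API entry point on a given global scope.
// Passing the scope in (instead of reading globalThis directly) keeps
// the function testable outside a browser.
function detectPromptApi(
  scope: Record<string, unknown>,
): "language-model" | "window-ai" | "none" {
  if ("LanguageModel" in scope) return "language-model"; // current explainer shape
  if ("ai" in scope) return "window-ai";                 // older Chrome builds
  return "none";                                         // no built-in model API
}
```

Note what this cannot tell you: which model sits behind the API, its context size, or its prompt dialect — exactly the gap the WebGL comparison highlights, since WebGL at least exposes renderer and extension queries.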

Local vs cloud models and model quality

  • Comparisons claim hosted models (e.g., Gemma via APIs) are faster and more capable than in-browser Gemini Nano.
  • Some expect browsers/OSes to eventually ship multiple or better models; others find the prospect of AI baked into OSes/browsers dystopian.
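The local-versus-hosted tradeoff suggests a local-first strategy with a cloud fallback. A sketch under stated assumptions: both backends are injected as plain async functions (the names `runLocal` and `runHosted` are ours); a real implementation would wire `runLocal` to a Prompt API session and `runHosted` to a server endpoint:

```typescript
// A text-generation backend: prompt in, completion out.
type Generate = (prompt: string) => Promise<string>;

// Prefer the on-device model when one exists (privacy, no per-call cost),
// fall back to the hosted model when local is absent or fails.
async function generateWithFallback(
  runLocal: Generate | null, // null when no on-device model is available
  runHosted: Generate,
  prompt: string,
): Promise<string> {
  if (runLocal) {
    try {
      return await runLocal(prompt);
    } catch {
      // Local inference failed (e.g., out of memory); fall through to hosted.
    }
  }
  return runHosted(prompt);
}
```

This shape also matches the “good enough for light tasks” position: route short summarization prompts locally and reserve the hosted model for work the small on-device model handles poorly.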