Gemma 4 on iPhone
App quality and rendering issues
- Several users report the App Store web page (especially the Dutch version) looks low quality or “fake” on Firefox for Windows and on Android, with pixelated text and clipped elements; others on Safari, Chrome, or macOS see it as intended.
- A CSS rule (mix-blend-mode: plus-lighter) is identified as broken in Firefox on Windows.
- Some feel Apple’s App Store design quality has declined.
Model variants, capabilities, and use cases
- The iOS/Android app runs small Gemma 4 E2B/E4B edge models (quantized 2B/4B), not the full 31B/26B, so quality is below top cloud models but impressive for on-device.
- With “reasoning” enabled, E4B is considered “solid”; E2B is often deemed too weak.
- Reported use cases: coding helpers, home assistants (“turn the lights off”, transit queries), OCR/receipt table extraction, reading/writing practice for kids, travel help (filling landing cards), creative writing, document analysis, and simple real-time audio/video agents on Macs.
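The OCR/receipt-extraction use case above usually amounts to prompting the on-device model for structured output and then parsing it defensively. A minimal sketch, where the JSON schema (item/qty/price), the prompt wording, and the mocked reply are all illustrative assumptions rather than anything from the app itself:

```python
import json
import re

def build_receipt_prompt(ocr_text: str) -> str:
    # Ask the model for machine-readable output; schema is a hypothetical choice.
    return (
        "Extract every line item from this receipt as a JSON array of "
        'objects with keys "item", "qty", and "price". Reply with JSON only.\n\n'
        + ocr_text
    )

def parse_model_reply(reply: str) -> list:
    # Small models often wrap JSON in prose or code fences, so grab the
    # first array-looking span instead of parsing the whole reply.
    match = re.search(r"\[.*\]", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in model output")
    return json.loads(match.group(0))

# Mocked model reply -- no model is actually called in this sketch:
reply = 'Sure! ```json\n[{"item": "Milk", "qty": 1, "price": 1.99}]\n```'
rows = parse_model_reply(reply)
```

The tolerant parsing step matters in practice: edge models are the least reliable at emitting bare JSON, so stripping fences and surrounding chatter avoids most extraction failures.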
- Some note significant hallucinations and reasoning mistakes, especially around physics and historical facts.
Performance and hardware
- Newer iPhones (e.g., 16/17 Pro) see ~30–50 tok/s and good responsiveness; older or low-RAM devices crash or run hot/slow.
- Android performance varies by SoC; recent Qualcomm Snapdragon NPUs fare well, while Samsung Exynos and Google Tensor chips lag.
- Debate over whether power or RAM is the main bottleneck for phones.
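The RAM-vs-power debate has a useful back-of-envelope framing: single-stream decoding is typically memory-bandwidth-bound, since each generated token streams every weight from memory once, so tok/s ≈ bandwidth / model size. A sketch with illustrative numbers (the 4-bit quantization and ~60 GB/s phone bandwidth are assumptions, not measured specs):

```python
def decode_tokens_per_second(params_billion: float,
                             bits_per_weight: float,
                             mem_bandwidth_gb_s: float) -> float:
    # Bandwidth-bound estimate: ignores KV cache traffic, activations,
    # and compute limits, so it is an upper bound, not a benchmark.
    model_gb = params_billion * bits_per_weight / 8
    return mem_bandwidth_gb_s / model_gb

# Hypothetical: a 4B-parameter model at 4-bit on a phone with ~60 GB/s bandwidth.
print(round(decode_tokens_per_second(4, 4, 60)))  # → 30
```

That this crude estimate lands in the same 30–50 tok/s range users report suggests bandwidth (and having enough RAM to hold the weights at all) matters more than raw compute for phone-class decoding.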
Alignment, “uncensoring,” and ethics
- Strong interest in “dealigned” / “abliterated” local models to avoid refusals on sensitive topics (religion, security, porn, trauma, impersonation, biologics).
- Others warn that safety guards prevent misuse and accidental harm, drawing analogies to gun regulation and table-saw safety.
- Some claim decensoring can make models behave “stupidly” or give dangerously one-sided advice; others say modern techniques preserve general capability, and that dangerous domains are poorly trained anyway.
Local vs cloud, privacy, and ecosystem
- Many see on-device models as key for privacy, latency, offline use, education, and app development without server backends.
- Skepticism about Google’s privacy claims: the app is open source but uses Firebase Analytics, and Google’s general privacy policy allows activity collection.
- Debate over whether cloud inference is actually profitable and whether prices must rise; some expect long-term shift toward local for light/medium workloads.
- Alternative local-AI apps (e.g., Enclave, Locally AI, PocketPal) and toolchains (Ollama, MLX, LiteRT-LM, llama.cpp) are discussed, along with concerns about app bloat if each ships its own large model.
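Of the toolchains mentioned, Ollama is the simplest to script against: it exposes a local HTTP endpoint (POST /api/generate on port 11434 by default). A minimal client sketch using only the standard library; the model tag in the comment is an assumption, and the generate() call requires a running `ollama serve` with the model pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> urllib.request.Request:
    # Non-streaming request: Ollama returns one JSON object instead of
    # a stream of newline-delimited chunks when "stream" is false.
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate(model: str, prompt: str) -> str:
    # Only works against a live local server; not exercised in this sketch.
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# e.g. generate("gemma3n:e4b", "Turn the lights off")  # model tag is an assumption
```

Because the server owns the weights, multiple apps can share one local model through this endpoint, which is exactly the bloat concern raised above: per-app bundled models duplicate gigabytes that a shared daemon avoids.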