Gemma 4 on iPhone
App quality and rendering issues
- Several users report the App Store web page (especially the Dutch version) looks low quality or “fake” on Firefox for Windows and on Android, with pixelated text and clipped elements; others on Safari, Chrome, or macOS see it as intended.
- A CSS rule (mix-blend-mode: plus-lighter) is identified as broken in Firefox on Windows.
- Some feel Apple’s App Store design quality has declined.
Model variants, capabilities, and use cases
- The iOS/Android app runs small Gemma 4 E2B/E4B edge models (quantized 2B/4B), not the full 31B/26B, so quality is below top cloud models but impressive for on-device.
- With “reasoning” enabled, E4B is considered “solid”; E2B is often deemed too weak.
- Reported use cases: coding helpers, home assistants (“turn the lights off”, transit queries), OCR/receipt table extraction, reading/writing practice for kids, travel help (filling landing cards), creative writing, document analysis, and simple real-time audio/video agents on Macs.
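The OCR/receipt-extraction use case above usually amounts to prompting the on-device model for structured output and then parsing it defensively. A minimal sketch, where the JSON schema (item/qty/price), the prompt wording, and the mocked reply are all illustrative assumptions rather than anything from the app itself:

```python
import json
import re

def build_receipt_prompt(ocr_text: str) -> str:
    # Ask the model for machine-readable output; schema is a hypothetical choice.
    return (
        "Extract every line item from this receipt as a JSON array of "
        'objects with keys "item", "qty", and "price". Reply with JSON only.\n\n'
        + ocr_text
    )

def parse_model_reply(reply: str) -> list:
    # Small models often wrap JSON in prose or code fences, so grab the
    # first array-looking span instead of parsing the whole reply.
    match = re.search(r"\[.*\]", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in model output")
    return json.loads(match.group(0))

# Mocked model reply -- no model is actually called in this sketch:
reply = 'Sure! ```json\n[{"item": "Milk", "qty": 1, "price": 1.99}]\n```'
rows = parse_model_reply(reply)
```

The tolerant parsing step matters in practice: edge models are the least reliable at emitting bare JSON, so stripping fences and surrounding chatter avoids most extraction failures.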
- Some note significant hallucinations and reasoning mistakes, especially around physics and historical facts.
Performance and hardware
- Newer iPhones (e.g., 16/17 Pro) see ~30–50 tok/s and good responsiveness; older or low-RAM devices crash or run hot/slow.
- Android performance varies by SoC; recent Qualcomm Snapdragon NPUs fare well, while Samsung Exynos and Google Tensor chips lag.
- Debate over whether power or RAM is the main bottleneck for phones.
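The RAM-vs-power debate has a useful back-of-envelope framing: single-stream decoding is typically memory-bandwidth-bound, since each generated token streams every weight from memory once, so tok/s ≈ bandwidth / model size. A sketch with illustrative numbers (the 4-bit quantization and ~60 GB/s phone bandwidth are assumptions, not measured specs):

```python
def decode_tokens_per_second(params_billion: float,
                             bits_per_weight: float,
                             mem_bandwidth_gb_s: float) -> float:
    # Bandwidth-bound estimate: ignores KV cache traffic, activations,
    # and compute limits, so it is an upper bound, not a benchmark.
    model_gb = params_billion * bits_per_weight / 8
    return mem_bandwidth_gb_s / model_gb

# Hypothetical: a 4B-parameter model at 4-bit on a phone with ~60 GB/s bandwidth.
print(round(decode_tokens_per_second(4, 4, 60)))  # → 30
```

That this crude estimate lands in the same 30–50 tok/s range users report suggests bandwidth (and having enough RAM to hold the weights at all) matters more than raw compute for phone-class decoding.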
Alignment, “uncensoring,” and ethics
- Strong interest in “dealigned” / “abliterated” local models to avoid refusals on sensitive topics (religion, security, porn, trauma, impersonation, biologics).
- Others warn that safety guards prevent misuse and accidental harm, drawing analogies to gun regulation and table-saw safety.
- Some claim decensoring can make models behave “stupidly” or give dangerously one-sided advice; others say modern techniques preserve general capability, and that dangerous domains are poorly trained anyway.
Local vs cloud, privacy, and ecosystem
- Many see on-device models as key for privacy, latency, offline use, education, and app development without server backends.
- Skepticism about Google’s privacy claims: the app is open source but uses Firebase Analytics, and Google’s general privacy policy allows activity collection.
- Debate over whether cloud inference is actually profitable and whether prices must rise; some expect long-term shift toward local for light/medium workloads.
- Alternative local-AI apps (e.g., Enclave, Locally AI, PocketPal) and toolchains (Ollama, MLX, LiteRT-LM, llama.cpp) are discussed, along with concerns about app bloat if each ships its own large model.
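Of the toolchains mentioned, Ollama is the simplest to script against: it exposes a local HTTP endpoint (POST /api/generate on port 11434 by default). A minimal client sketch using only the standard library; the model tag in the comment is an assumption, and the generate() call requires a running `ollama serve` with the model pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> urllib.request.Request:
    # Non-streaming request: Ollama returns one JSON object instead of
    # a stream of newline-delimited chunks when "stream" is false.
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def generate(model: str, prompt: str) -> str:
    # Only works against a live local server; not exercised in this sketch.
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# e.g. generate("gemma3n:e4b", "Turn the lights off")  # model tag is an assumption
```

Because the server owns the weights, multiple apps can share one local model through this endpoint, which is exactly the bloat concern raised above: per-app bundled models duplicate gigabytes that a shared daemon avoids.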