Gemma 4 on iPhone

App quality and rendering issues

  • Several users report that the App Store web page (especially the Dutch version) looks low-quality or “fake” in Firefox on Windows and on Android, with pixelated text and clipping; others on Safari, Chrome, or macOS see it rendered as intended.
  • A CSS rule using mix-blend-mode: plus-lighter is identified as the likely culprit, rendering incorrectly in Firefox on Windows (see the detection sketch after this list).
  • Some feel Apple’s App Store design quality has declined.
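
For context, a minimal sketch (not taken from the actual page) of how a site could feature-detect plus-lighter and fall back to a more widely supported blend mode. Note that CSS.supports only reports what a browser claims to support, so it would not catch a platform-specific rendering bug like the one reported in Firefox on Windows.

```ts
// Sketch: feature-detect `plus-lighter` and fall back. The class name and the
// fallback value are illustrative, not details from the actual App Store page.
const supportsPlusLighter = CSS.supports("mix-blend-mode", "plus-lighter");

document.querySelectorAll<HTMLElement>(".hero-glow").forEach((el) => {
  // Use a widely supported blend mode when plus-lighter is unavailable.
  el.style.mixBlendMode = supportsPlusLighter ? "plus-lighter" : "screen";
});
```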

Model variants, capabilities, and use cases

  • The iOS/Android app runs small Gemma 4 E2B/E4B edge models (quantized 2B/4B), not the full 31B/26B, so quality is below top cloud models but impressive for on-device.
  • With “reasoning” enabled, E4B is considered “solid”; E2B is often deemed too weak.
  • Reported use cases: coding helpers, home assistants (“turn the lights off”, transit queries), OCR/receipt table extraction (see the sketch after this list), reading/writing practice for kids, travel help (filling landing cards), creative writing, document analysis, and simple real-time audio/video agents on Macs.
  • Some note significant hallucinations and reasoning mistakes, especially around physics and historical facts.
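
To make the receipt-extraction use case concrete, here is a minimal sketch assuming a llama.cpp llama-server (or any OpenAI-compatible endpoint) is already running locally with a Gemma-class model loaded; the URL, port, model name, and JSON schema are placeholders, not details from the app discussed above.

```ts
// Sketch: ask a locally served model to turn OCR'd receipt text into a JSON table.
async function extractReceiptTable(ocrText: string): Promise<unknown> {
  const res = await fetch("http://localhost:8080/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma-e4b",   // placeholder name; llama-server serves whatever model it loaded
      temperature: 0,       // keep extraction as deterministic as possible
      messages: [
        {
          role: "system",
          content: "Extract line items as JSON: [{item, qty, price}]. Reply with JSON only.",
        },
        { role: "user", content: ocrText },
      ],
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content);
}
```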

Performance and hardware

  • Newer iPhones (e.g., 16/17 Pro) see ~30–50 tok/s and good responsiveness; older or low-RAM devices crash or run hot/slow.
  • Android performance varies by SoC; recent Qualcomm Snapdragon NPUs fare well, while Exynos and Tensor chips lag.
  • Debate over whether power or RAM is the main bottleneck for phones (a rough memory estimate follows this list).
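
A back-of-the-envelope memory estimate illustrates why RAM is plausibly the binding constraint; every number below is an assumption for illustration, not a measurement from the app.

```ts
// Rough arithmetic behind the RAM-bottleneck argument.
const params = 4e9;            // ~4B effective parameters (E4B-class model) -- assumption
const bytesPerParam = 0.5;     // ~4-bit quantized weights
const weightsGB = (params * bytesPerParam) / 1e9;  // ≈ 2 GB for weights alone
const runtimeOverheadGB = 1.0; // KV cache, activations, runtime buffers -- rough guess
console.log(`~${(weightsGB + runtimeOverheadGB).toFixed(1)} GB needed`); // ≈ 3 GB
// A phone with 6 GB of RAM still shares that with the OS and other apps, which is
// consistent with low-RAM devices crashing while newer 8 GB+ Pro phones stay responsive.
```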

Alignment, “uncensoring,” and ethics

  • Strong interest in “dealigned” / “abliterated” local models to avoid refusals on sensitive topics (religion, security, porn, trauma, impersonation, biologics).
  • Others warn that safety guards prevent misuse and accidental harm, drawing analogies to gun regulation and table-saw safety.
  • Some claim decensoring can make models behave “stupidly” or give dangerously one-sided advice; others say modern techniques preserve general capability, and that genuinely dangerous domains are poorly covered in training data anyway.

Local vs cloud, privacy, and ecosystem

  • Many see on-device models as key for privacy, latency, offline use, education, and app development without server backends.
  • Skepticism about Google’s privacy claims: the app is open source but bundles Firebase Analytics, and Google’s general privacy policy allows activity collection.
  • Debate over whether cloud inference is actually profitable and whether prices must rise; some expect a long-term shift toward local inference for light and medium workloads.
  • Alternative local-AI apps (e.g., Enclave, Locally AI, PocketPal) and toolchains (Ollama, MLX, LiteRT-LM, llama.cpp) are discussed, along with concerns about app bloat if each ships its own large model (see the shared-runtime sketch below).
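
On the app-bloat point, a common pattern on desktop is to point every app at one shared local runtime rather than bundling multi-gigabyte weights per app. A minimal sketch assuming an Ollama daemon on its default port; the model tag is a placeholder, not a published name.

```ts
// Sketch: query a single shared local runtime (Ollama on its default port 11434).
const OLLAMA = "http://localhost:11434";

async function askSharedModel(prompt: string): Promise<string> {
  // List which models the shared runtime has already pulled, so the app can
  // reuse one instead of shipping its own copy.
  const tags = await (await fetch(`${OLLAMA}/api/tags`)).json();
  console.log("locally available models:", tags.models?.map((m: any) => m.name));

  const res = await fetch(`${OLLAMA}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gemma-edge:e4b",  // placeholder tag -- substitute whatever the runtime reports
      messages: [{ role: "user", content: prompt }],
      stream: false,            // single JSON response instead of streamed chunks
    }),
  });
  return (await res.json()).message.content;
}
```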