Introducing Gemma 3n
Gemma vs Gemini Nano & Licensing
- Confusion over why Google offers both Gemma 3n and Gemini Nano for on-device use, given that both run offline.
- Clarifications from the thread:
  - Gemini Nano: Android-only, proprietary, accessed via system APIs (AICore / ML Kit); weights are not directly usable or redistributable.
  - Gemma 3n: open-weight, available across platforms in multiple sizes; can be used commercially and run on arbitrary runtimes.
- Some see this split as poorly explained by Google, leaving third parties to decode its product strategy.
Copyright & Model Weights
- Extended debate on whether model weights are copyrightable:
  - US: likely not, under the Copyright Office's current interpretation that purely mechanical outputs without direct human creativity are not protected.
  - UK/Commonwealth/EU-like regimes: "sweat of the brow" doctrines make copyrightability more plausible.
- Even if copyright is uncertain, vendors can still enforce terms via contracts, but contracts don’t automatically bind downstream recipients.
- Tension noted: companies argue training data copyright doesn’t “survive” in weights, yet want copyright-like protection for weights themselves.
“Open Source” vs Open Weights
- Disagreement over calling Gemma "open source":
  - Code and architecture are Apache-2.0, but the weights are under separate terms with prohibited uses.
  - This fails the standard OSI/FSF definitions; "open weights, closed data" is a more accurate description than fully open source.
Architecture, Capabilities & Real-World Performance
- Gemma 3n shares architecture with the next Gemini Nano, optimized for on-device efficiency and multimodality (text, vision, audio/video inputs, text output).
- Users report:
  - E2B/E4B models running on consumer GPUs and phones at ~4–9 tok/s; feasible, but not "instant."
  - 4-bit quantized models at ~4.25 GB can run on devices like the Pi 5 or RK3588 boards, though with significant latency.
- A major subthread challenges Google's "60 fps on Pixel" marketing:
  - The public demo APK appears to be CPU-only and yields ~0.1 fps end-to-end, far from the claim.
  - Google-linked participants say only first-party models can really use the Tensor NPU; third-party NPU support is "not a priority."
  - Some see this as misleading, especially given the associated hackathon/prize messaging.
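The size and throughput figures above can be sanity-checked with back-of-envelope arithmetic. The parameter count and effective bits-per-parameter below are illustrative assumptions, not figures from the thread:

```python
# Rough check on the ~4.25 GB figure for a 4-bit quantized model.
# Assumption (not from the thread): ~8B raw parameters at ~4.25
# effective bits/param once per-block quantization scales are counted.
params = 8e9
bits_per_param = 4.25
size_gb = params * bits_per_param / 8 / 1e9
print(f"~{size_gb:.2f} GB on disk")

# What ~4-9 tok/s means in practice for a 300-token reply:
for tps in (4, 9):
    print(f"{tps} tok/s -> ~{300 / tps:.0f} s to generate 300 tokens")
```

This is why users describe on-device generation as "feasible but not instant": even at the top of the reported range, a paragraph-length answer takes tens of seconds.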
Ecosystem, Ports & Tooling
- GGUF conversions available for llama.cpp; early support in Ollama, LM Studio (including MLX on Apple Silicon), and other runtimes.
- Some glitches reported (e.g., multimodal not yet wired up in certain tools).
Quality, Benchmarks & Behavior
- Mixed evaluations:
  - Some users impressed: 8B-like performance from tiny models, good enough for VPS-hosted alternatives to cloud APIs.
  - Others find Gemma 3n weaker than comparable small models (e.g., LLaMA variants) on MMLU, suggesting leaderboard scores may favor conversational style.
  - Reports of looping/repetition were traced to bad default sampling settings (e.g., temperature 0).
- Informal community "benchmarks" like "SVG pelican on a bicycle" show Gemma 3n doing reasonably well at structured SVG output; such tests are used as a rough proxy for model capability.
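The looping/repetition point above follows from how temperature sampling works: as temperature approaches zero, sampling collapses to deterministic greedy argmax, which makes repetition loops possible. A minimal stdlib sketch (not Gemma-specific code):

```python
import math
import random

def sample_token(logits, temperature):
    """Temperature sampling over raw logits.

    As temperature -> 0 this degenerates to greedy argmax, which is
    deterministic and can lock a model into repeating itself; the fix
    reported in the thread was simply restoring a sane temperature.
    """
    if temperature < 1e-6:  # treat ~0 as greedy
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    r = random.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

logits = [2.0, 1.9, 0.1]
print(sample_token(logits, 0.0))  # always index 0 (greedy)
print(sample_token(logits, 1.0))  # usually 0 or 1, occasionally 2
```

With temperature 0 the same context always yields the same next token, so once the model emits a phrase that re-creates its own context, it loops indefinitely.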
Use Cases for Small Local Models
- Suggested personal and commercial uses:
  - On-device assistants (including Home Assistant integrations).
  - Spam/SMS filtering without uploading messages to the cloud.
  - Local speech-to-text, document/image description, photo tagging and search.
  - Offline coding help and lightweight summarization (e.g., of RSS feeds) on cheap CPUs.
- Consensus: small models are not replacements for top proprietary models in complex coding or reasoning, but are valuable for privacy, offline use, and narrow or fine-tuned tasks.
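The "narrow tasks" point can be made concrete with the SMS-filtering use case: the thread mentions the use case but no specific prompt, so the wording below is a hypothetical sketch of how such a task might be framed for a small local model:

```python
def build_sms_filter_prompt(message: str) -> str:
    """Frame SMS spam filtering as a one-word classification task.

    Hypothetical prompt wording -- narrowly constrained outputs like
    this are where small local models tend to hold up, and the message
    never leaves the device.
    """
    return (
        "Classify the following SMS message. Reply with exactly one "
        "word, SPAM or HAM.\n\n"
        f"Message: {message}\n"
        "Label:"
    )

prompt = build_sms_filter_prompt("You won a prize! Click here.")
print(prompt)
```

The design choice is to trade open-ended generation (where small models struggle) for a constrained label set that is trivial to parse and validate downstream.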
Naming & Product Clarity
- Complaints about confusing naming (Gemma vs Gemini, “3n” instead of clearer labels like “Gemma 3 Lite”).
- Calls for a simple, public Google table mapping product names to function, platform, and licensing.
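The mapping the thread asks Google to publish amounts to a small lookup table; here is the shape of it as a data structure, filled in only with the distinctions summarized above:

```python
# Illustrative only: the kind of product-clarity table the thread
# requests, populated with the facts from this summary.
MODEL_FAMILIES = {
    "Gemini Nano": {
        "platform": "Android only (AICore / ML Kit system APIs)",
        "weights": "proprietary, not redistributable",
    },
    "Gemma 3n": {
        "platform": "cross-platform, arbitrary runtimes",
        "weights": "open weights under separate Gemma terms (not OSI open source)",
    },
}

for name, info in MODEL_FAMILIES.items():
    print(f"{name}: {info['platform']} | {info['weights']}")
```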