Introducing Gemma 3n
Gemma vs Gemini Nano & Licensing
- Confusion over why Google offers both Gemma 3n and Gemini Nano for on-device use, given that both run offline.
- Clarifications from the thread:
  - Gemini Nano: Android-only, proprietary, accessed via system APIs (AICore / ML Kit); weights are not directly usable or redistributable.
  - Gemma 3n: open-weight, available across platforms in multiple sizes; can be used commercially and run on arbitrary runtimes.
- Some see this split as poorly explained by Google, leaving third parties to decode its product strategy.
Copyright & Model Weights
- Extended debate on whether model weights are copyrightable:
  - US: likely not, under the Copyright Office's current interpretation that purely mechanical outputs without direct human creativity are not protected.
  - UK/Commonwealth/EU-like regimes: "sweat of the brow" doctrines make copyrightability more plausible.
- Even if copyright is uncertain, vendors can still enforce terms via contracts, but contracts don’t automatically bind downstream recipients.
- Tension noted: companies argue training data copyright doesn’t “survive” in weights, yet want copyright-like protection for weights themselves.
“Open Source” vs Open Weights
- Disagreement over calling Gemma "open source":
  - Code and architecture are Apache-2.0, but the weights are under separate terms with prohibited uses.
  - This fails the standard OSI/FSF definitions; "open weights, closed data" is a more accurate description than fully open source.
Architecture, Capabilities & Real-World Performance
- Gemma 3n shares architecture with the next Gemini Nano, optimized for on-device efficiency and multimodality (text, vision, audio/video inputs, text output).
- Users report:
  - E2B/E4B models running on consumer GPUs and phones at ~4–9 tok/s; feasible, but not "instant."
  - 4-bit quantized models at ~4.25 GB can run on devices like the Pi 5 or RK3588 boards, though with significant latency.
- A major subthread challenges Google's "60 fps on Pixel" marketing:
  - The public demo APK appears to be CPU-only and yields ~0.1 fps end-to-end, far from the claim.
  - Google-linked participants say only first-party models can really use the Tensor NPU; third-party NPU support is "not a priority."
  - Some see this as misleading, especially given the associated hackathon/prize messaging.
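The size and throughput figures above can be sanity-checked with back-of-envelope arithmetic. The parameter count and effective bits-per-parameter below are illustrative assumptions, not figures from the thread:

```python
# Rough check on the ~4.25 GB figure for a 4-bit quantized model.
# Assumption (not from the thread): ~8B raw parameters at ~4.25
# effective bits/param once per-block quantization scales are counted.
params = 8e9
bits_per_param = 4.25
size_gb = params * bits_per_param / 8 / 1e9
print(f"~{size_gb:.2f} GB on disk")

# What ~4-9 tok/s means in practice for a 300-token reply:
for tps in (4, 9):
    print(f"{tps} tok/s -> ~{300 / tps:.0f} s to generate 300 tokens")
```

This is why users describe on-device generation as "feasible but not instant": even at the top of the reported range, a paragraph-length answer takes tens of seconds.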
Ecosystem, Ports & Tooling
- GGUF conversions available for llama.cpp; early support in Ollama, LM Studio (including MLX on Apple Silicon), and other runtimes.
- Some glitches reported (e.g., multimodal not yet wired up in certain tools).
Quality, Benchmarks & Behavior
- Mixed evaluations:
  - Some users impressed: 8B-like performance from tiny models, good enough for VPS-hosted alternatives to cloud APIs.
  - Others find Gemma 3n weaker than comparable small models (e.g., LLaMA variants) on MMLU, suggesting leaderboard scores may favor conversational style.
  - Reports of looping/repetition were traced to bad default sampling settings (e.g., temperature 0).
- Informal community "benchmarks" like "SVG pelican on a bicycle" show Gemma 3n doing reasonably well at structured SVG output; such tests are used as a rough proxy for model capability.
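The looping/repetition point above follows from how temperature sampling works: as temperature approaches zero, sampling collapses to deterministic greedy argmax, which makes repetition loops possible. A minimal stdlib sketch (not Gemma-specific code):

```python
import math
import random

def sample_token(logits, temperature):
    """Temperature sampling over raw logits.

    As temperature -> 0 this degenerates to greedy argmax, which is
    deterministic and can lock a model into repeating itself; the fix
    reported in the thread was simply restoring a sane temperature.
    """
    if temperature < 1e-6:  # treat ~0 as greedy
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    r = random.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

logits = [2.0, 1.9, 0.1]
print(sample_token(logits, 0.0))  # always index 0 (greedy)
print(sample_token(logits, 1.0))  # usually 0 or 1, occasionally 2
```

With temperature 0 the same context always yields the same next token, so once the model emits a phrase that re-creates its own context, it loops indefinitely.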
Use Cases for Small Local Models
- Suggested personal and commercial uses:
  - On-device assistants (including Home Assistant integrations).
  - Spam/SMS filtering without uploading messages to the cloud.
  - Local speech-to-text, document/image description, photo tagging and search.
  - Offline coding help and lightweight summarization (e.g., of RSS feeds) on cheap CPUs.
- Consensus: small models are not replacements for top proprietary models in complex coding or reasoning, but are valuable for privacy, offline use, and narrow or fine-tuned tasks.
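The "narrow tasks" point can be made concrete with the SMS-filtering use case: the thread mentions the use case but no specific prompt, so the wording below is a hypothetical sketch of how such a task might be framed for a small local model:

```python
def build_sms_filter_prompt(message: str) -> str:
    """Frame SMS spam filtering as a one-word classification task.

    Hypothetical prompt wording -- narrowly constrained outputs like
    this are where small local models tend to hold up, and the message
    never leaves the device.
    """
    return (
        "Classify the following SMS message. Reply with exactly one "
        "word, SPAM or HAM.\n\n"
        f"Message: {message}\n"
        "Label:"
    )

prompt = build_sms_filter_prompt("You won a prize! Click here.")
print(prompt)
```

The design choice is to trade open-ended generation (where small models struggle) for a constrained label set that is trivial to parse and validate downstream.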
Naming & Product Clarity
- Complaints about confusing naming (Gemma vs Gemini, “3n” instead of clearer labels like “Gemma 3 Lite”).
- Calls for a simple, public Google table mapping product names to function, platform, and licensing.
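The mapping the thread asks Google to publish amounts to a small lookup table; here is the shape of it as a data structure, filled in only with the distinctions summarized above:

```python
# Illustrative only: the kind of product-clarity table the thread
# requests, populated with the facts from this summary.
MODEL_FAMILIES = {
    "Gemini Nano": {
        "platform": "Android only (AICore / ML Kit system APIs)",
        "weights": "proprietary, not redistributable",
    },
    "Gemma 3n": {
        "platform": "cross-platform, arbitrary runtimes",
        "weights": "open weights under separate Gemma terms (not OSI open source)",
    },
}

for name, info in MODEL_FAMILIES.items():
    print(f"{name}: {info['platform']} | {info['weights']}")
```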