Introducing Gemma 3n

Gemma vs Gemini Nano & Licensing

  • Confusion over why both Gemma 3n and Gemini Nano exist for on-device use, given that both run offline.
  • Clarifications from the thread:
    • Gemini Nano: Android-only, proprietary, accessed via system APIs (AiCore/MLKit), weights not directly usable or redistributable.
    • Gemma 3n: open-weight, available across platforms with multiple sizes, can be used commercially and run on arbitrary runtimes.
  • Some see this split as poorly explained by Google, leaving third parties to decode its product strategy.

Copyright & Model Weights

  • Extended debate on whether model weights are copyrightable:
    • US: likely not, under current Copyright Office interpretation that purely mechanical outputs without direct human creativity are not protected.
    • UK/Commonwealth/EU-like regimes: “sweat of the brow” doctrines make copyrightability more plausible.
  • Even if copyright is uncertain, vendors can still enforce terms via contracts, but contracts don’t automatically bind downstream recipients.
  • Tension noted: companies argue training data copyright doesn’t “survive” in weights, yet want copyright-like protection for weights themselves.

“Open Source” vs Open Weights

  • Disagreement over calling Gemma “open source”:
    • Code and architecture are Apache-2.0, but weights are under separate terms with prohibited uses.
    • This fails the standard OSI/FSF definitions; “open weights, closed data” is a more accurate description than “open source.”

Architecture, Capabilities & Real-World Performance

  • Gemma 3n shares architecture with the next Gemini Nano, optimized for on-device efficiency and multimodality (text, vision, audio/video inputs, text output).
  • Users report:
    • E2B/E4B models run on consumer GPUs and phones at ~4–9 tok/s; usable, but not “instant” (a throughput sketch follows this list).
    • 4-bit quantized builds are ~4.25 GB and can run on devices like a Pi 5 or RK3588 boards, though with significant latency.
  • A major subthread challenges Google’s “60 fps on Pixel” marketing:
    • The public demo APK appears to be CPU-only and yields ~0.1 fps end-to-end, far from the claimed figure.
    • Google-linked participants say only first-party models can really use the Tensor NPU; third-party NPU support is “not a priority.”
    • This is seen by some as misleading, especially given associated hackathon/prize messaging.
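
  For readers who want to reproduce the tok/s figures above, here is a minimal throughput sketch using llama-cpp-python. The GGUF filename is a placeholder, and the number it prints will vary heavily with hardware and quantization; treat it as a rough check, not a benchmark.

    # Rough tok/s check with llama-cpp-python (pip install llama-cpp-python).
    # The model path is a placeholder; point it at your own Gemma 3n GGUF.
    import time
    from llama_cpp import Llama

    llm = Llama(model_path="gemma-3n-E4B-it-Q4_K_M.gguf", n_ctx=2048, verbose=False)

    start = time.perf_counter()
    out = llm("Explain in one paragraph why the sky appears blue.", max_tokens=128)
    elapsed = time.perf_counter() - start

    n = out["usage"]["completion_tokens"]
    print(f"{n} tokens in {elapsed:.1f} s -> {n / elapsed:.1f} tok/s")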

Ecosystem, Ports & Tooling

  • GGUF conversions are available for llama.cpp; early support in Ollama, LM Studio (including MLX on Apple Silicon), and other runtimes (a usage sketch follows this list).
  • Some glitches have been reported (e.g., multimodal input not yet wired up in certain tools).
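
  As a concrete example of this tooling, a hedged sketch using the ollama Python client; it assumes Ollama is installed locally and that a Gemma 3n tag has been pulled (the tag name “gemma3n:e4b” used here is an assumption and may differ from what the Ollama library publishes).

    # Querying a local Gemma 3n model through Ollama's Python client
    # (pip install ollama). The tag "gemma3n:e4b" is an assumption; run
    # `ollama list` to see the tags actually available on your machine.
    import ollama

    response = ollama.chat(
        model="gemma3n:e4b",
        messages=[{"role": "user", "content": "In two sentences, what is GGUF?"}],
    )
    print(response["message"]["content"])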

Quality, Benchmarks & Behavior

  • Mixed evaluations:
    • Some users are impressed: 8B-class performance from much smaller models, good enough for VPS-hosted alternatives to cloud APIs.
    • Others find Gemma 3n weaker than comparable small models (e.g., LLaMA variants) on MMLU, suggesting leaderboard scores may favor conversational style.
  • Reports of looping/repetition were traced to bad default sampling settings (e.g., temperature 0); a corrected-settings sketch follows this list.
  • Community “benchmarks” like “SVG pelican on a bicycle” show Gemma 3n doing reasonably well at structured SVG output; such tests are used informally as a proxy for overall capability.
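
  To illustrate the sampling fix mentioned above, a sketch with llama-cpp-python using non-greedy settings; the exact values are illustrative defaults, not vendor-recommended settings, and the model path is again a placeholder.

    # Sampling settings that, per community reports, avoid the looping seen
    # with greedy (temperature 0) decoding. Values are illustrative only.
    from llama_cpp import Llama

    llm = Llama(model_path="gemma-3n-E4B-it-Q4_K_M.gguf", n_ctx=2048, verbose=False)

    out = llm(
        "Write a haiku about local inference.",
        max_tokens=64,
        temperature=0.7,     # non-zero temperature breaks deterministic loops
        top_p=0.95,
        repeat_penalty=1.1,  # mild penalty further discourages repetition
    )
    print(out["choices"][0]["text"])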

Use Cases for Small Local Models

  • Suggested personal and commercial uses:
    • On-device assistants (including Home Assistant integrations).
    • Spam/SMS filtering without cloud upload.
    • Local speech-to-text, document/image description, photo tagging and search.
    • Offline coding help and lightweight summarization (e.g., RSS feeds) on cheap CPUs (see the sketch after this list).
  • Consensus: small models are not replacements for top proprietary models in complex coding or reasoning, but are valuable for privacy, offline use, and narrow or fine-tuned tasks.
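
  As a sketch of the RSS-summarization use case above, the snippet below pairs feedparser with a local model served by Ollama. The feed URL and model tag are placeholders, and the prompt is deliberately simple.

    # Lightweight RSS summarization with a small local model via Ollama
    # (pip install feedparser ollama). Feed URL and model tag are placeholders.
    import feedparser
    import ollama

    feed = feedparser.parse("https://example.com/feed.xml")
    for entry in feed.entries[:5]:
        prompt = (
            "Summarize this item in one sentence:\n"
            f"{entry.title}\n{entry.get('summary', '')}"
        )
        reply = ollama.chat(model="gemma3n:e2b",
                            messages=[{"role": "user", "content": prompt}])
        print("-", reply["message"]["content"].strip())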

Naming & Product Clarity

  • Complaints about confusing naming (Gemma vs Gemini, “3n” instead of clearer labels like “Gemma 3 Lite”).
  • Calls for a simple, public Google table mapping product names to function, platform, and licensing.