Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model
Model & Capabilities
- 26M-parameter, INT4-quantized (~14 MB) tool-calling model distilled from Gemini, intended to run on CPUs and tiny devices.
- Focus is on selecting tools and filling arguments, not general conversation or “knowledge.”
- Currently lacks robust in-context learning; authors say that’s “in the works.”
- Architecturally notable for removing MLP/FFN blocks, relying on attention plus external tools/knowledge.
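To make "selecting tools and filling arguments" concrete, here is a minimal sketch of checking a model-emitted tool call against a tool registry. The tool names, the JSON shape, and the `validate_call` helper are assumptions for illustration; the thread does not specify Needle's actual output format.

```python
import json

# Hypothetical tool registry: tool names and parameter types are illustrative only.
TOOLS = {
    "set_timer": {"minutes": int},
    "get_weather": {"city": str},
}

def validate_call(raw: str) -> dict:
    """Parse a model-emitted tool call and check it against the registry."""
    call = json.loads(raw)
    name, args = call["name"], call["arguments"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    schema = TOOLS[name]
    # The argument names must match the schema exactly.
    if set(args) != set(schema):
        raise ValueError(f"expected arguments {sorted(schema)}, got {sorted(args)}")
    # Each argument must have the declared type.
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} should be {expected.__name__}")
    return call

# Stand-in for what a tool-calling model might emit for
# "set a timer for ten minutes" (assumed format, not Needle's real output).
example_output = '{"name": "set_timer", "arguments": {"minutes": 10}}'
print(validate_call(example_output))
```

A check like this sits between the model and the tool runtime, so a malformed or hallucinated call fails loudly instead of being executed.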
Use Cases & Integration Ideas
- OS-level or shared “natural language parser” that all CLI programs can use (natural language in, program flags out).
- Voice assistants and smart-home control (timers, weather, lights), Siri-like behavior, Home Assistant integration.
- Embedded in devices like watches, earphones, glasses, Raspberry Pi–based smart speakers.
- As a thin tool router in larger agent systems (small model chooses tools, big model handles reasoning/summarization).
- Used with MCP (Model Context Protocol) or similar to abstract away direct API integrations (“just give tools, let model figure it out”).
- Local helpers for complex build/test infrastructures or privacy-first desktop/mobile apps.
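The "thin tool router" idea above can be sketched as follows: a small model picks the tool and fills the arguments, the tool runs, and a larger model turns the result into a reply. Both models are replaced by stub functions here, and every name (`route`, `stub_pick_tool`, `set_light`, `stub_summarize`) is hypothetical, not an API from the thread.

```python
from typing import Any, Callable, Dict

ToolCall = Dict[str, Any]

def route(
    request: str,
    pick_tool: Callable[[str], ToolCall],
    tools: Dict[str, Callable[..., Any]],
    summarize: Callable[[str, Any], str],
) -> str:
    """Small model chooses the tool; large model phrases the answer."""
    call = pick_tool(request)              # small model: tool name + arguments
    handler = tools[call["name"]]          # look up the real implementation
    result = handler(**call["arguments"])  # execute the tool
    return summarize(request, result)      # large model: reasoning/summarization

# Stub standing in for the small tool-calling model (assumption).
def stub_pick_tool(request: str) -> ToolCall:
    return {"name": "set_light", "arguments": {"room": "kitchen", "on": True}}

# Hypothetical smart-home tool.
def set_light(room: str, on: bool) -> str:
    return f"{room} light {'on' if on else 'off'}"

# Stub standing in for the larger model (assumption).
def stub_summarize(request: str, result: Any) -> str:
    return f"Done: {result}."

reply = route(
    "turn on the kitchen lights",
    stub_pick_tool,
    {"set_light": set_light},
    stub_summarize,
)
print(reply)
```

Passing the two models in as plain callables keeps the router independent of any particular runtime, so either side can be swapped for a local or hosted model.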
Demos, Deployment & Tooling
- An initial access problem with the tokenizer repo on Hugging Face was fixed.
- Community quickly deployed a Hugging Face Space with a very simple Dockerfile.
- Suggestions for demos: short videos, terminal recordings (e.g., asciinema), WebGPU/WASM/Transformers.js browser demo.
Performance, Limitations & Open Questions
- Some users report strong results for basic tasks (timers, shopping lists), even surpassing Siri for simple flows.
- Others find limitations: confusion with overlapping tools, repeated/duplicated tool calls, weak handling of ambiguous requests, and reliance on tight contexts/prompts.
- Multi-step workflows and stateful chains are only partially demonstrated; long-horizon tool planning behavior remains unclear.
- Commenters asked about ONNX/browser deployment and formal tool-use benchmarks; the thread does not give detailed answers.
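One way a caller might guard against the repeated/duplicated tool calls reported above is to drop exact duplicates before execution. This is a sketch under assumptions: the call format and the `dedupe_calls` helper are hypothetical, and it would also collapse repeats the user actually intended (for example, adding the same item twice).

```python
import json
from typing import Any, Dict, List

def dedupe_calls(calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop exact duplicate tool calls, keeping the first occurrence in order."""
    seen = set()
    unique = []
    for call in calls:
        # Canonical JSON makes the comparison independent of key order.
        key = json.dumps(call, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique

# Hypothetical model output containing a duplicated call.
calls = [
    {"name": "add_item", "arguments": {"list": "shopping", "item": "milk"}},
    {"arguments": {"item": "milk", "list": "shopping"}, "name": "add_item"},
    {"name": "set_timer", "arguments": {"minutes": 5}},
]
print(dedupe_calls(calls))
```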
Distillation, Gemini Choice & Ethics
- Several posts explain distillation as training a small “student” on a big model’s outputs; note it’s lossy but efficient.
- Gemini chosen mainly for cheaper APIs and solid one-shot tool-calling, though multiple commenters say Gemini is weak at tool use compared with alternatives.
- Concerns raised that Gemini’s ToS forbids using its outputs to train competing models; warnings about possible bans or degraded outputs via Google’s anti-distillation defenses.
- Others highlight perceived double standards, given that large labs trained on web data without individual consent.
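The "student trained on a big model's outputs" description above can be sketched as a data-collection step: run prompts through a teacher and record input/target pairs for later student training. The teacher here is a stub function and the JSONL record layout is an assumption; the thread does not describe the authors' actual pipeline, and any real teacher is subject to its provider's terms.

```python
import io
import json
from typing import Any, Callable, Dict, Iterable, TextIO

def build_distillation_set(
    prompts: Iterable[str],
    teacher: Callable[[str], Dict[str, Any]],
    out: TextIO,
) -> int:
    """Write one JSON line per prompt: the input and the teacher's tool call."""
    count = 0
    for prompt in prompts:
        target = teacher(prompt)  # teacher output becomes the training label
        out.write(json.dumps({"input": prompt, "target": target}) + "\n")
        count += 1
    return count

# Stub standing in for the large teacher model (assumption).
def stub_teacher(prompt: str) -> Dict[str, Any]:
    return {"name": "set_timer", "arguments": {"minutes": 5}}

buffer = io.StringIO()
n = build_distillation_set(["set a timer for five minutes"], stub_teacher, buffer)
records = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(n, records[0]["target"]["name"])
```

The student only ever sees the teacher's chosen outputs, not its full behavior, which is one reason commenters describe the process as lossy.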