2026-05-12

Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

Model & Capabilities

26M-parameter, INT4-quantized (~14 MB) tool-calling model distilled from Gemini, intended to run on CPUs and tiny devices.
Focus is on selecting tools and filling arguments, not general conversation or “knowledge.”
Currently lacks robust in-context learning; authors say that’s “in the works.”
Architecturally notable for removing MLP/FFN blocks, relying on attention plus external tools/knowledge.
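
The size figure above can be sanity-checked with simple arithmetic. The sketch below assumes every weight is stored at exactly 4 bits; the gap up to the quoted ~14 MB is presumably quantization scales, higher-precision tensors, or file metadata, which the thread does not break down.

```python
# Back-of-the-envelope weight size for an INT4-quantized model.
PARAMS = 26_000_000      # parameter count quoted in the post
BITS_PER_WEIGHT = 4      # INT4 (assumption: all weights stored at 4 bits)


def weight_bytes(params: int, bits: int) -> int:
    """Raw storage for the weights alone, ignoring scales and metadata."""
    return params * bits // 8


size_mb = weight_bytes(PARAMS, BITS_PER_WEIGHT) / 1_000_000
print(f"{size_mb:.1f} MB")  # prints "13.0 MB"
```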

Use Cases & Integration Ideas

An OS-level or shared “natural language parser” that any CLI program could call to translate natural-language requests into its own flags.
Voice assistants and smart-home control (timers, weather, lights), Siri-like behavior, Home Assistant integration.
Embedded in devices like watches, earphones, glasses, Raspberry Pi–based smart speakers.
As a thin tool router in larger agent systems (small model chooses tools, big model handles reasoning/summarization).
Used with the Model Context Protocol (MCP) or similar to abstract away direct API integrations (“just give tools, let model figure it out”).
Local helpers for complex build/test infrastructures or privacy-first desktop/mobile apps.
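
The "thin tool router" idea above can be sketched roughly as follows. Everything here is hypothetical: `small_model` is a canned stand-in rather than Needle's actual interface, and the tool names and JSON shape are assumptions made for illustration.

```python
import json
from typing import Callable


# Hypothetical tools the router can dispatch to.
def set_timer(minutes: int) -> str:
    return f"timer set for {minutes} min"


def lights(room: str, on: bool) -> str:
    return f"{room} lights {'on' if on else 'off'}"


TOOLS: dict[str, Callable[..., str]] = {"set_timer": set_timer, "lights": lights}


def small_model(utterance: str) -> str:
    """Stand-in for the small tool-calling model (NOT the real API).

    A real model would generate this JSON; here the answers are canned.
    """
    canned = {
        "set a timer for 10 minutes": {"tool": "set_timer", "args": {"minutes": 10}},
        "turn off the kitchen lights": {
            "tool": "lights",
            "args": {"room": "kitchen", "on": False},
        },
    }
    return json.dumps(canned.get(utterance.lower(), {"tool": None, "args": {}}))


def route(utterance: str) -> str:
    """Dispatch to a tool if the small model picked one, else escalate."""
    call = json.loads(small_model(utterance))
    handler = TOOLS.get(call["tool"])
    if handler is None:
        return "escalate to larger model"
    return handler(**call["args"])
```

In this arrangement the small model only selects a tool and fills its arguments; anything it cannot map to a tool is handed to a larger model, which matches the division of labor suggested in the thread.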

Demos, Deployment & Tooling

An initial access problem with the tokenizer repo on Hugging Face was fixed.
Community quickly deployed a Hugging Face Space with a very simple Dockerfile.
Suggestions for demos: short videos, terminal recordings (e.g., asciinema), WebGPU/WASM/Transformers.js browser demo.

Performance, Limitations & Open Questions

Some users report strong results for basic tasks (timers, shopping lists), even surpassing Siri for simple flows.
Others find limitations: confusion with overlapping tools, repeated/duplicated tool calls, weak handling of ambiguous requests, and reliance on tight contexts/prompts.
Multi-step workflows and stateful chains are only partially demonstrated; long-horizon tool planning behavior remains unclear.
Questions raised on ONNX/browser deployment and formal tool-use benchmarks; answers are not fully detailed in the thread.
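
One way a caller might guard against the repeated tool calls reported above is to drop exact duplicates before executing anything. This is a minimal sketch, not something proposed in the thread; the `name`/`arguments` call shape is an assumption.

```python
import json


def dedupe_calls(calls: list[dict]) -> list[dict]:
    """Keep the first occurrence of each (name, arguments) pair, in order."""
    seen = set()
    unique = []
    for call in calls:
        # Serialize arguments with sorted keys so key order does not matter.
        key = (call["name"], json.dumps(call.get("arguments", {}), sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique
```

Note that this also drops repeats the user actually intended (for example, adding the same item twice), so it only suits tools where a repeated identical call is never meaningful.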

Distillation, Gemini Choice & Ethics

Several posts explain distillation as training a small “student” on a big model’s outputs; note it’s lossy but efficient.
Gemini chosen mainly for cheaper APIs and solid one-shot tool-calling, though multiple commenters say Gemini is weak at tool use compared with alternatives.
Concerns raised that Gemini’s ToS forbids using its outputs to distill competing models; commenters warn of possible bans or degraded outputs via Google’s anti-distillation defenses.
Others highlight perceived double standards, given that large labs trained on web data without individual consent.
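
The student/teacher setup described at the top of this section can be sketched as a data-collection step: record the teacher's tool call for each prompt, then train the student on those pairs. The `teacher` function below is a hypothetical stub, not a real Gemini call, and the record format is an assumption.

```python
import json


def teacher(prompt: str) -> dict:
    """Hypothetical stand-in for querying the large teacher model."""
    if "timer" in prompt:
        return {"name": "set_timer", "arguments": {"minutes": 5}}
    return {"name": "none", "arguments": {}}


def build_pairs(prompts: list[str]) -> list[dict]:
    """Turn prompts into (input, target) records using the teacher's outputs."""
    return [
        {"input": p, "target": json.dumps(teacher(p), sort_keys=True)}
        for p in prompts
    ]


def to_jsonl(pairs: list[dict]) -> str:
    """Serialize records one per line, a common format for fine-tuning data."""
    return "\n".join(json.dumps(r) for r in pairs)
```

The student only ever sees what the teacher produced, which is why commenters describe the process as lossy: any teacher mistake becomes a training label.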
