Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon
What RunAnywhere / RCLI Provides
- Company builds MetalRT, a proprietary inference engine for Apple Silicon, plus RCLI, an open-source CLI demo.
- RCLI wires together local speech-to-text (STT), LLM, text-to-speech (TTS), local RAG, and macOS actions into a voice assistant / TUI.
- Emphasis on fully local processing and no telemetry by default.
Performance & Model Choices
- The company's published benchmarks claim MetalRT is modestly faster than competing Apple-Silicon engines (e.g., MLX, uzu) on 0.6B–4B models and much faster for STT/TTS.
- Some commenters call those small models “toy-sized” and ask for benchmarks on 7B–70B+ models; the founders say larger models are on the roadmap.
- Commenters note unified memory makes Apple Silicon attractive for very large models; current MetalRT support is focused on latency-sensitive voice pipelines.
Use Cases & Feature Requests
- Suggested uses: always-on dictation, virtual audio devices for real-time transcription in video calls, and on-device RAG over sensitive documents.
- RCLI supports local RAG with fast hybrid retrieval, plus a text-only mode (no TTS).
- Requests include better quantization formats (e.g., unsloth), richer TTS voices, diarization, Linux support, and SDK access for third-party apps.
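The “hybrid retrieval” mentioned above typically blends a lexical score with a vector-similarity score. A minimal sketch of that idea, assuming precomputed embeddings (all function names are hypothetical illustrations, not RCLI's actual implementation):

```python
import math


def keyword_score(query: str, doc: str) -> float:
    """Lexical component: fraction of query terms that appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)


def cosine(a: list[float], b: list[float]) -> float:
    """Vector component: cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(query: str, query_vec: list[float],
                  corpus: list[tuple[str, list[float]]],
                  alpha: float = 0.5) -> list[str]:
    """Rank documents by a weighted blend of lexical and vector scores.

    corpus is a list of (text, embedding) pairs; alpha weights the
    lexical score against the embedding score.
    """
    scored = []
    for text, vec in corpus:
        score = alpha * keyword_score(query, text) + (1 - alpha) * cosine(query_vec, vec)
        scored.append((score, text))
    return [text for _, text in sorted(scored, reverse=True)]
```

Real engines use BM25 and ANN indexes rather than this linear scan, but the blending step is the same shape.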
Quality & Limitations
- Tool-calling with small models is unreliable: a command may be verbally “acknowledged” without the correct macOS action ever firing.
- Team acknowledges this as a core unsolved problem for sub-4B on-device models and plans verification layers and larger models.
- Default TTS quality is criticized as dated; better models (e.g., Kokoro) are available but not default.
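The verification layer the team plans could take roughly this shape: execute the requested action, then check an observable postcondition before letting the assistant confirm success. A sketch under those assumptions (names and structure are illustrative, not the team's design):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    name: str
    action: Callable[[], None]   # performs the macOS action (e.g., via AppleScript)
    verify: Callable[[], bool]   # checks an observable postcondition afterwards


def run_with_verification(call: ToolCall) -> str:
    """Report success only if the postcondition actually holds.

    This prevents the failure mode where a small model 'acknowledges'
    a command that never took effect.
    """
    try:
        call.action()
    except Exception as exc:
        return f"{call.name} failed: {exc}"
    if call.verify():
        return f"{call.name} succeeded"
    # The action ran without raising, but the world did not change as expected.
    return f"{call.name} reported done, but verification failed"
```

The key design point is that the spoken confirmation is gated on `verify()`, not on the model's own claim.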
Installation & Platform Support
- Some users report segfaults and Homebrew install issues.
- The install script silently installing Homebrew is widely criticized; the maintainers agree to change it.
- MetalRT currently targets M3/M4; M1/M2 fall back to llama.cpp. Mobile support and other edge devices are planned.
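The M3/M4-versus-older dispatch can be sketched as a check on the chip brand string that macOS reports via `sysctl -n machdep.cpu.brand_string`. The selection logic below is an illustration, not RunAnywhere's code:

```python
import re
import subprocess


def detect_chip() -> str:
    """Return the CPU brand string, e.g. 'Apple M3 Pro' on Apple Silicon Macs."""
    out = subprocess.run(
        ["sysctl", "-n", "machdep.cpu.brand_string"],
        capture_output=True, text=True,
    )
    return out.stdout.strip()


def pick_engine(brand: str) -> str:
    """Choose MetalRT on M3/M4; fall back to llama.cpp on M1/M2 or non-Apple CPUs."""
    match = re.search(r"Apple M(\d+)", brand)
    if match and int(match.group(1)) >= 3:
        return "metalrt"
    return "llama.cpp"
```

Keeping `pick_engine` a pure function of the brand string makes the fallback policy trivially testable off-device.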
Licensing & Openness
- RCLI is MIT-licensed; MetalRT and many models are proprietary.
- Some see a closed inference engine as “reinventing the wheel” versus CoreML/MLX; maintainers argue specialization yields higher performance and unified STT/LLM/TTS support.
Security & Trust Concerns
- A web demo leaked third-party API keys; the founders' initial “bait” response is criticized as flippant, then walked back with an apology and a promise to fix the leak.
- Separate prior controversy: company scraped GitHub data and used lookalike domains for cold email campaigns. Several commenters say this permanently damaged their trust.
HN Voting & Moderation Discussion
- Some users suspect vote manipulation due to fast rise and comment ordering.
- HN moderation explains: YC “Launch HN” posts receive special placement, and off-topic comments (e.g., about past behavior rather than the product) are downweighted but not removed.