Tabby: Self-hosted AI coding assistant
Overview of Tabby and Capabilities
- Self-hosted AI coding assistant with code completion and “codebase chat,” positioned as an on‑prem / team platform (SSO, access control, auth).
- Marketed as one of the few fully self-service on-prem options, with adopters saying performance is competitive with hosted tools.
- Built-in RAG/doc integration so it can be taught unfamiliar API frameworks via documentation ingestion.
Hardware, Models, and Performance
- Supports Nvidia (CUDA), AMD (via Vulkan), and Apple Silicon; Macs are “OK for individual use” but not ideal for multi‑user servers.
- Rule of thumb: roughly 1 GB of RAM/VRAM per 1B parameters (which corresponds to ~8-bit weights; heavier quantization needs less). Longer context windows add KV-cache memory on top of the weights.
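The rule of thumb above can be turned into a rough back-of-the-envelope calculator. This is a sketch only: the layer count, hidden dimension, and per-element KV-cache size are illustrative defaults (real models vary, and techniques like grouped-query attention shrink the cache considerably).

```python
def estimate_memory_gb(params_billion: float,
                       bits_per_weight: int = 8,
                       context_len: int = 4096,
                       n_layers: int = 32,
                       hidden_dim: int = 4096,
                       kv_bytes_per_elem: int = 2) -> float:
    """Rough VRAM estimate: model weights plus KV cache.

    Weights: params * bits/8. KV cache: two tensors (K and V) per
    layer, each of shape (context_len, hidden_dim), at
    kv_bytes_per_elem bytes per element (2 for fp16).
    """
    weights_gb = params_billion * 1e9 * (bits_per_weight / 8) / 1e9
    kv_gb = 2 * n_layers * context_len * hidden_dim * kv_bytes_per_elem / 1e9
    return weights_gb + kv_gb

# A 7B model at 8-bit with a 4K context lands a bit over 9 GB;
# at 4-bit the weights halve to ~3.5 GB.
print(round(estimate_memory_gb(7), 2))
print(round(estimate_memory_gb(7, bits_per_weight=4), 2))
```

With these toy defaults the numbers match the "~1 GB per 1B parameters" heuristic at 8-bit, and show why quantization and shorter contexts are the two main levers on a memory-constrained box.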
- Tiny models (1–3B) are “dumb” for conversational coding but fine for tab completions; 7–70B open models can surpass GPT‑4o‑mini for coding if hardware permits.
- Single‑GPU only by default; multi‑GPU use suggested via external backends like vLLM and OpenAI-compatible endpoints.
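Wiring an external multi-GPU backend would look roughly like the fragment below. The table and key names are recalled from Tabby's HTTP model-backend configuration and should be checked against the current docs; the model name and port are placeholders.

```toml
# ~/.tabby/config.toml — sketch, field names unverified
# Backend: a vLLM OpenAI-compatible server spread across 2 GPUs, e.g.
#   vllm serve Qwen/Qwen2.5-Coder-7B-Instruct --tensor-parallel-size 2

[model.chat.http]
kind = "openai/chat"
model_name = "Qwen2.5-Coder-7B-Instruct"   # placeholder
api_endpoint = "http://localhost:8000/v1"  # vLLM's default OpenAI route
api_key = ""
```

The design point from the thread: Tabby itself stays single-GPU, and multi-GPU scaling is delegated to whatever OpenAI-compatible server you stand up behind it.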
Deployment, IDE Support, and Alternatives
- Designed primarily for shared servers but can run on powerful personal machines or in Docker on‑prem.
- Community notes that an Eclipse client exists but is not prominently documented; users request support for VS2022, Sublime Text, Zed, and MSVC.
- Comparisons with other local setups (Ollama + Continue.dev, Twinny) highlight trade‑offs in ease of use, hardware, and licensing.
Telemetry, Licensing, and Business Model
- Community Edition collects non‑toggleable telemetry from the IDE extensions; per the struct shared in the thread, it is limited to hardware and model metadata.
- Confusion over “open source but up to 5 users” pricing; others clarify that open source does not mean cost-free for all uses and point to the license.
Code Quality, Skill Development, and Determinism
- Many worry LLMs generate “junior-level” or inefficient code, and that blind acceptance may stall developer growth.
- Counterpoints:
  - LLMs can accelerate capable devs and serve as a new abstraction layer, similar to moving from assembly to high-level languages.
  - Poor code quality self-corrects through tests, debugging, and maintenance pressures.
- Long subthread on determinism: traditional compilers vs stochastic LLMs, temperature/seed control, and whether nondeterminism is acceptable for production code.
Critiques of Company Practices
- One commenter reports an unpaid, multi‑round, take‑home–heavy interview ending in ghosting, sparking broader criticism of such hiring processes as disrespectful and a red flag.