Tabby: Self-hosted AI coding assistant

Overview of Tabby and Capabilities

  • Self-hosted AI coding assistant offering code completion and “codebase chat,” positioned as an on‑prem platform for teams (SSO, access control, authentication).
  • Marketed as one of the few fully self-service on-prem options; adopters report performance competitive with hosted tools.
  • Built-in RAG and documentation integration, so it can be taught unfamiliar API frameworks by ingesting their docs.

Hardware, Models, and Performance

  • Supports Nvidia (CUDA), AMD (via Vulkan), and Apple Silicon; Macs are considered fine for individual use but not ideal as multi-user servers.
  • Rule of thumb: ~1 GB of VRAM per billion parameters at 8-bit quantization (roughly double for FP16, less with heavier quantization). Context length also drives memory needs.
  • Tiny models (1–3B) are “dumb” for conversational coding but fine for tab completion; 7–70B open models can surpass GPT‑4o‑mini at coding if the hardware permits.
  • Single‑GPU by default; multi‑GPU setups are handled by pointing Tabby at external backends such as vLLM via their OpenAI-compatible endpoints.
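The memory rule of thumb above can be turned into a quick back-of-envelope estimator. A minimal sketch, not Tabby's own sizing logic: the bytes-per-parameter math is standard, but the flat context-overhead term is an illustrative assumption (real KV-cache cost grows with context length and batch size).

```python
def estimate_vram_gb(params_b: float, bits_per_param: int = 8,
                     context_overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate for serving an LLM.

    params_b            -- model size in billions of parameters
    bits_per_param      -- 16 for FP16, 8 or 4 for quantized weights
    context_overhead_gb -- illustrative flat allowance for KV cache and
                           activations; in reality it scales with context
    """
    weights_gb = params_b * bits_per_param / 8  # 1B params at 8-bit ≈ 1 GB
    return weights_gb + context_overhead_gb

# A 7B model: ~8 GB at 8-bit, ~4.5 GB at 4-bit under these assumptions.
print(estimate_vram_gb(7))                     # 8.0
print(estimate_vram_gb(7, bits_per_param=4))   # 4.5
```

This is why the thread treats 1–3B models as laptop-friendly while 70B-class models push users toward multi-GPU servers or external backends.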

Deployment, IDE Support, and Alternatives

  • Designed primarily for shared servers, but can also run on a powerful personal machine or in Docker on‑prem.
  • Community notes that an Eclipse client exists but is not prominently documented; there are requests for VS2022, Sublime Text, Zed, and MSVC support.
  • Comparisons with other local setups (Ollama + Continue.dev, Twinny) highlight trade‑offs in ease of use, hardware requirements, and licensing.
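One way the multi-GPU route comes together in practice: run vLLM (which can shard a model across GPUs) and point Tabby's chat model at its OpenAI-compatible endpoint via `~/.tabby/config.toml`. The keys below follow Tabby's HTTP model-backend convention as best I recall it; treat the exact names and values as assumptions to verify against the current Tabby documentation.

```toml
# ~/.tabby/config.toml — hypothetical sketch, not verified against
# the current Tabby release.
[model.chat.http]
kind = "openai/chat"                      # speak the OpenAI chat protocol
model_name = "Qwen2.5-Coder-7B-Instruct"  # placeholder model name
api_endpoint = "http://localhost:8000/v1" # vLLM's default serving port
api_key = ""                              # vLLM typically needs no key locally
```

With this split, Tabby handles indexing, IDE integration, and auth, while the heavyweight inference runs wherever the GPUs are.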

Telemetry, Licensing, and Business Model

  • The Community Edition’s IDE extensions collect telemetry that cannot be disabled, though it is limited to hardware and model metadata per the shared struct.
  • Confusion over the “open source, but up to 5 users” pricing; others clarify that open source does not mean free of charge for every use and point to the license terms.

Code Quality, Skill Development, and Determinism

  • Many worry LLMs generate “junior-level” or inefficient code, and that blind acceptance may stall developer growth.
  • Counterpoints:
    • LLMs can accelerate capable devs and serve as a new abstraction layer, similar to moving from assembly to high-level languages.
    • Poor code quality self-corrects through tests, debugging, and maintenance pressures.
  • Long subthread on determinism: traditional compilers versus stochastic LLMs, temperature and seed control, and whether nondeterminism is acceptable in production code.
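The determinism debate hinges on a sampling detail: with a fixed seed, sampling is reproducible run-to-run, and as temperature approaches zero it collapses to argmax and becomes effectively deterministic. A minimal illustration in plain Python (no LLM involved; the logits are made up):

```python
import math
import random

def sample_token(logits, temperature, seed):
    """Sample one index from logits via temperature-scaled softmax.

    Near-zero temperature sharpens the distribution toward argmax;
    a fixed seed makes even high-temperature sampling reproducible.
    """
    rng = random.Random(seed)
    scaled = [l / max(temperature, 1e-9) for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]
# Near-zero temperature: always the argmax (index 0), whatever the seed.
assert all(sample_token(logits, 1e-6, seed=s) == 0 for s in range(100))
# High temperature: outcome varies by seed, but a fixed seed reproduces it.
assert sample_token(logits, 5.0, seed=42) == sample_token(logits, 5.0, seed=42)
```

Real inference servers add further nondeterminism (batching, floating-point reduction order on GPUs), which is part of why the thread does not treat temperature/seed control as a complete answer.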

Critiques of Company Practices

  • One commenter reports an unpaid, multi‑round, take‑home–heavy interview ending in ghosting, sparking broader criticism of such hiring processes as disrespectful and a red flag.