2024-07-16

Exo: Run your own AI cluster at home with everyday devices

Motivations for running models locally

Privacy and censorship resistance are recurring reasons: users want to run models on sensitive data (journals, private audio, “spicy” images) without sending it to large providers.
Customization is easier locally (changing system prompts, using uncensored models, LoRAs, domain-specific setups).
Offline and reliable access is valued, especially where connectivity is unreliable or providers could change policy or shut down.

Arguments for cloud-hosted models

Many note a large quality gap: small local models (e.g., 7–8B) are seen as far behind GPT‑4/Claude-level systems for complex or high-stakes work.
For productivity, $20–100/month in API usage is argued to be cheaper than buying and operating powerful local hardware, especially once you factor in setup and maintenance.
Hosted solutions offer integrations (web search, tools like Wolfram Alpha) that local models typically lack.

Cost and hardware trade-offs

One side: spare hardware + free open models = $0 experimentation; good for students and hobbyists. Cloud is “not free” and can get expensive for heavy use.
Other side: upfront cost of capable GPUs, electricity, and time is high; for “just messing around,” cheap APIs and free tiers (TogetherAI, Groq, OpenRouter) are seen as better.
Some argue that for sustained >8h/day workloads, owned or colo hardware can beat big-cloud pricing; others counter that cloud still benefits from economies of scale.
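
The owned-versus-rented argument above is a break-even calculation. A minimal sketch follows, assuming a flat hourly cloud price and counting only purchase price and electricity on the owned side. Every number in the usage lines is hypothetical rather than taken from the thread, and setup time, maintenance, and resale value are ignored.

```python
def breakeven_months(hardware_cost, power_watts, hours_per_day,
                     electricity_per_kwh, cloud_per_hour, days_per_month=30):
    """Months until owned hardware is cheaper than renting, or None if it never is.

    All inputs are illustrative assumptions supplied by the caller.
    """
    monthly_hours = hours_per_day * days_per_month
    # Electricity cost of running the owned machine for those hours.
    monthly_power_cost = power_watts / 1000 * monthly_hours * electricity_per_kwh
    # Cost of renting comparable capacity for the same hours.
    monthly_cloud_cost = cloud_per_hour * monthly_hours
    monthly_saving = monthly_cloud_cost - monthly_power_cost
    if monthly_saving <= 0:
        return None  # renting is cheaper at any horizon under these inputs
    return hardware_cost / monthly_saving


# Hypothetical inputs: $2000 machine, 400 W, $0.15/kWh, $1.00/hour cloud rate.
heavy = breakeven_months(2000, 400, 8, 0.15, 1.00)   # 8 hours/day
light = breakeven_months(2000, 400, 1, 0.15, 1.00)   # 1 hour/day
print(f"8 h/day: {heavy:.1f} months, 1 h/day: {light:.1f} months")
```

Under these made-up inputs the heavy-use case pays off in under a year while the light-use case takes several years, which is the shape of both sides of the argument: utilization, not sticker price, decides the outcome.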

How Exo works and technical feasibility

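Exo uses pipeline parallelism: each device holds a different contiguous range of layers, and only the activations at the boundary between ranges (the per-token hidden-state vectors, loosely called embeddings in the thread) are sent between devices.
Reported activation sizes: ~8–10 KB per token for 8B models, ~32 KB per token for 70B; expected to stay on the order of 10–100 KB per token even for much larger models.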
On a local network, bandwidth is seen as fine; latency is the main bottleneck, especially over the internet, limiting SETI@home-style global clustering.
Some users report no speedup when using two MacBooks versus one, suggesting the current implementation or scheduling needs work.
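
The points above can be sketched in a few lines. This is an illustration under stated assumptions, not Exo's actual scheduler or API: `Device`, `partition_layers`, `activation_bytes_per_token`, and `per_token_seconds` are hypothetical names, the memory-proportional split is one plausible policy, and the timing inputs are made up. The hidden sizes 4096 and 8192 are the published Llama 3 8B and 70B values; the thread does not say which numeric precision its reported sizes assume.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    memory_gb: float


def partition_layers(num_layers, devices):
    """Assign contiguous layer ranges [start, end) in proportion to device memory."""
    total = sum(d.memory_gb for d in devices)
    shards, start, cumulative = [], 0, 0.0
    for i, d in enumerate(devices):
        cumulative += d.memory_gb
        # The last device takes whatever remains so every layer is assigned.
        last = i == len(devices) - 1
        end = num_layers if last else round(num_layers * cumulative / total)
        shards.append((d.name, start, end))
        start = end
    return shards


def activation_bytes_per_token(hidden_size, bytes_per_element):
    """Size of the one hidden-state vector sent across a stage boundary."""
    return hidden_size * bytes_per_element


def per_token_seconds(stage_compute_s, hop_latency_s, activation_bytes,
                      bandwidth_bytes_per_s):
    """Time for one generated token when the stages run one after another."""
    n = len(stage_compute_s)
    transfer = activation_bytes / bandwidth_bytes_per_s
    forward_hops = (n - 1) * (hop_latency_s + transfer)
    # The sampled token must return to the first stage before the next step;
    # it is tiny, so only latency is counted for that hop.
    return_hop = hop_latency_s if n > 1 else 0.0
    return sum(stage_compute_s) + forward_hops + return_hop


print(partition_layers(32, [Device("a", 16), Device("b", 8), Device("c", 8)]))
print(activation_bytes_per_token(4096, 2))  # 8192 bytes: 8B-class model, 16-bit
print(activation_bytes_per_token(8192, 4))  # 32768 bytes: 70B-class model, 32-bit

# Hypothetical timings: 40 ms of compute per token, split evenly across two devices.
act = activation_bytes_per_token(4096, 2)
single = per_token_seconds([0.040], 0.001, act, 125e6)
lan = per_token_seconds([0.020, 0.020], 0.001, act, 125e6)   # ~1 ms, 1 Gbit/s
wan = per_token_seconds([0.020, 0.020], 0.050, act, 125e6)   # ~50 ms per hop
print(f"single={single:.4f}s lan={lan:.4f}s wan={wan:.4f}s")
```

In this model the transfer term is negligible next to hop latency, which matches the claim that bandwidth is fine and latency is the bottleneck. It also shows why two devices need not be faster than one for a single request: the stages run sequentially, so splitting the layers adds memory capacity and network hops without reducing total compute time.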

Maturity, platform support, and concerns

Broader themes

Debate over whether “swarm compute” of idle devices is desirable versus preserving device longevity, power, and thermals.
Some view local/self-hosted AI as philosophically similar to open source and as a check on concentrated corporate control.
