MCP in LM Studio

Hardware for Local LLMs (Mac Studio vs GPU Rigs)

  • Much of the thread centers on a 512GB RAM Mac Studio (~$12k) as a “one-box” local LLM machine.
  • Pro-Apple side: unified memory lets you load huge models (e.g. DeepSeek R1 671B at Q4, large Qwen models) that don’t fit in a single RTX card’s VRAM; power draw is far lower than a multi-GPU rig’s; it avoids the noise, space, and complexity of server builds.
  • Pro-GPU side: RTX 6000 / multi-GPU setups have far higher memory bandwidth and much faster prompt processing; better tokens/s/$ for models that fit in VRAM; concern that 512GB of RAM behind comparatively low bandwidth will feel sluggish for agentic/MCP-heavy prompts (see the rough estimate after this list).
  • Some discuss CPU+DDR5 approaches (EPYC/Xeon with fast NVMe) for running MoE models at hobbyist speeds.
  • Rumors that future Macs may drop unified memory in favor of split CPU/GPU memory are seen as potentially ending Apple’s “accidental winner” status for giant local models.
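
For intuition on why the bandwidth argument matters, here is a rough back-of-envelope estimate (all figures are illustrative assumptions, not benchmarks from the thread): generating one token requires streaming roughly the active model weights through memory once, so memory bandwidth divided by the active-weight footprint gives a hard ceiling on tokens/s.

```python
# Back-of-envelope ceiling on token generation speed.
# All figures are illustrative assumptions, not measured benchmarks.

def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billions: float,
                          bytes_per_param: float = 0.5) -> float:
    """Upper bound: each generated token streams the active weights
    through memory once, so speed <= bandwidth / active-weight bytes."""
    weight_gb = active_params_billions * bytes_per_param  # Q4 ~ 0.5 bytes/param
    return bandwidth_gb_s / weight_gb

# Unified-memory Mac (assumed ~800 GB/s) running a large MoE model
# with ~37B active parameters at Q4 quantization:
print(max_tokens_per_second(800, 37))    # ~43 tok/s theoretical ceiling

# The same model on GPU-class bandwidth (assumed ~1700 GB/s), if it fit in VRAM:
print(max_tokens_per_second(1700, 37))   # ~92 tok/s ceiling
```

Prompt processing, by contrast, is compute-bound rather than bandwidth-bound, which is why GPU rigs pull further ahead on the long, agentic/MCP-heavy prompts mentioned above.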

Why Local vs Cloud Models?

  • Many acknowledge cloud models (Claude, Gemini, o3) are higher quality and often faster.
  • Reasons to go local:
    • Offline use (airplanes, unreliable ISPs, Great Firewall scenarios).
    • Cost control for bulk tasks (classification, experimentation, retries) vs per-token billing.
    • Data privacy / “sovereignty” and not worrying about metering while hacking.

LM Studio: Strengths and Weaknesses

  • Strong praise for LM Studio’s “first run” experience: easy install, automatic model suggestions, good hardware compatibility hints, and a built-in OpenAI-compatible server (see the sketch after this list).
  • Considered more approachable than Ollama + Open WebUI for non-terminal users; LM Studio can also be used as a backend for Open WebUI and other OpenAI clients.
  • MLX support on Apple Silicon is highlighted as efficient.
  • Criticisms:
    • The Electron UI is heavy (noticeable CPU usage and ~500MB of VRAM while idle), and its design is too colorful/busy for some.
    • No pure “engine-only” deployment; a headless mode exists but is still tied to the app/CLI.
    • Closed source and a license that forbids work-related use are seen as major drawbacks.
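
Because the built-in server speaks the OpenAI API, existing clients only need a base-URL change to target it. A minimal sketch using the openai Python package; the port (1234) is LM Studio’s default, and the model identifier is a placeholder for whatever model is loaded locally.

```python
# Minimal sketch: call LM Studio's local OpenAI-compatible server.
# Assumes the server is running on its default port and a model is loaded.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # any non-empty string; the local server ignores it
)

resp = client.chat.completions.create(
    model="qwen3-8b",  # placeholder; use the identifier LM Studio shows for your model
    messages=[{"role": "user", "content": "Summarize MCP in one sentence."}],
)
print(resp.choices[0].message.content)
```

This is also how LM Studio slots in as a backend for Open WebUI and other OpenAI-compatible frontends.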

MCP Support and Confusion

  • General excitement that LM Studio supports MCP, making it easy to experiment with local tools.
  • Real-world issues:
    • Initial MCP UX in LM Studio is confusing (hidden sidebars, model search icon, non-obvious flow).
    • Many users mistakenly try Gemma3 for tools; others point out Gemma3 wasn’t trained for tool calling and recommend Qwen3 instead (see the tool-call sketch after this list).
  • Conceptual skepticism:
    • Some see MCP as “tools as a service” / a rebranded tools API, currently more hype than clear problem-fit.
    • Confusion over “MCP Host” vs “client” terminology; spec and transport descriptions criticized as imprecise, possibly LLM-written and poorly reviewed.
  • Examples of emerging MCP ecosystems: Apple Containers + coderunner, anytype MCP server, recurse.chat.
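
Stripped of the MCP transport, what the model ultimately sees is a list of tool definitions in the chat request, which is both why the “rebranded tools API” framing comes up and why the model itself must be trained for tool calling. A hedged sketch against the local OpenAI-compatible endpoint; the get_weather tool and the model name are hypothetical examples, not taken from the thread.

```python
# Sketch of a plain tool-calling request to the local server.
# The tool definition and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3-8b",  # a tool-calling-capable model; Gemma3 would answer in prose instead
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
)

# A tool-trained model responds with structured tool calls rather than free text.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

An MCP server’s job, roughly, is to publish tool definitions like this one and execute the resulting calls on the model’s behalf, with the host application wiring the two together.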

Other Tools and Comparisons

  • Open WebUI, Ollama, koboldcpp, AnythingLLM, Msty, Exo, recurse.chat are all mentioned as alternatives or complements with different tradeoffs (UI quality, ease of setup, roleplay features, workflow editors, mobile focus, clustering GPUs across hosts).
  • Some users are happy with current tools and hesitant to invest time in trying multiple stacks.