Apple M3 Ultra
Unified Memory and Soldering Debate
- Thread opens with “soldered?” and quickly shifts to why: very wide, high‑bandwidth buses demand extremely short traces and tightly coupled packaging.
- People note the RAM is on‑package, not on‑die, but still not user‑replaceable; anything socketed (CAMM2, DIMMs) would likely break signal integrity at 512–1024‑bit widths.
- Some argue skilled techs can desolder and upgrade, but others counter that this is qualitatively different from “plop in a DIMM,” and not realistic as a general upgrade path.
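The bus-width argument above is easy to sanity-check with arithmetic: peak bandwidth is bus width in bytes times transfer rate. A minimal sketch, where the 1024‑bit/6400 MT/s figures for M3 Ultra and the dual‑channel DDR5 comparison are assumptions based on commonly cited specs, not official numbers:

```python
# Back-of-envelope check of why bus width matters for unified memory:
# peak bandwidth = bus width (bytes) x transfer rate.
# The specific widths/rates below are assumptions, not official specs.

def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Peak memory bandwidth in decimal GB/s."""
    return bus_width_bits / 8 * transfer_rate_mts * 1e6 / 1e9

# M3 Ultra-class: 1024-bit LPDDR5 at 6400 MT/s (assumed)
print(peak_bandwidth_gbs(1024, 6400))  # -> 819.2

# Socketed DIMM-class: 128-bit dual-channel DDR5-6400 (assumed)
print(peak_bandwidth_gbs(128, 6400))   # -> 102.4
```

An 8x bandwidth gap at equal transfer rates is why a socketed option can't simply replace the on‑package layout: the extra width only works with very short, tightly matched traces.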
AI, LLMs, and Memory vs Bandwidth vs Compute
- Huge enthusiasm for 512GB unified memory: it enables fitting very large models (e.g., DeepSeek‑R1 671B Q4) that are impossible on consumer GPUs, and gives GPU/NPU direct access to all of it.
- Multiple long subthreads compare M3 Ultra (≈819 GB/s) to EPYC (12‑channel DDR5 ≈576 GB/s), 4090 (~1 TB/s), H100/H200/B200 (3–8 TB/s).
- Consensus:
  - Capacity: M3 Ultra is unique and relatively cheap for 512GB of GPU‑addressable memory.
  - Bandwidth: good versus consumer GPUs, far below datacenter parts.
  - Compute: orders of magnitude fewer TOPS than Nvidia's AI GPUs; likely compute‑bound on large models.
- Debate on whether CPU‑only EPYC boxes with 512–768GB are better: more bandwidth in dual‑socket configs but often compute‑limited and slower tokens/s in practice.
- Several back‑of‑envelope estimates put DeepSeek‑R1 Q4 on M3 Ultra somewhere around 20–40 tok/s; EPYC builds cited around 3–6 tok/s.
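The back‑of‑envelope estimates above follow from a simple bandwidth‑bound model: during decode, each token must stream the active weights from memory once, so tok/s is capped at bandwidth divided by active‑weight bytes per token. A sketch, where the ~37B active parameters (MoE) and ~0.5 bytes/param at Q4 are assumptions for illustration:

```python
# Bandwidth-bound decode ceiling: tok/s <= bandwidth / bytes streamed
# per token. Parameter counts and quantization size are assumptions,
# not measurements; real throughput lands below this ceiling.

def tokens_per_sec_ceiling(bandwidth_gbs: float, active_params_b: float,
                           bytes_per_param: float = 0.5) -> float:
    """Upper bound on decode tok/s for a bandwidth-bound MoE model."""
    gb_per_token = active_params_b * bytes_per_param
    return bandwidth_gbs / gb_per_token

# DeepSeek-R1: 671B total, ~37B active per token (MoE), Q4 ~ 0.5 B/param
print(tokens_per_sec_ceiling(819, 37))  # M3 Ultra ceiling, ~44 tok/s
print(tokens_per_sec_ceiling(576, 37))  # 12-ch EPYC ceiling, ~31 tok/s
```

The EPYC ceiling is close to the Mac's, which is why the much lower EPYC numbers cited in the thread point to a compute (not bandwidth) limit on CPU‑only builds.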
Pricing, Value, and Target Niche
- 512GB config at roughly $9.5–14k sparks argument:
  - Pro‑AI and media users: a "bargain" versus multi‑GPU or H100‑class servers, and cheap compared with cloud costs at sustained usage.
  - Skeptics: for the money you can build Threadripper/EPYC + multi‑GPU rigs with far more FLOPs (but far less unified VRAM).
- Many agree this is extremely niche: people needing huge local models, macOS‑only workflows, or privacy‑sensitive inference.
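The "bargain versus cloud" side of the pricing argument reduces to a break‑even calculation. A minimal sketch, where the $3/hr cloud rate is a purely hypothetical figure (real rates vary widely by provider and card):

```python
# Break-even sketch for buy-vs-rent. The cloud hourly rate below is a
# hypothetical assumption for illustration, not a quoted price.

def breakeven_hours(hardware_cost_usd: float, cloud_rate_usd_per_hr: float) -> float:
    """Hours of cloud usage at which buying the hardware pays off."""
    return hardware_cost_usd / cloud_rate_usd_per_hr

# $9.5k Mac Studio vs a hypothetical $3/hr large-memory cloud instance
print(round(breakeven_hours(9500, 3.0)))  # -> 3167 hours (~4.4 months 24/7)
```

For continuous private inference the break‑even arrives within months, which is the "trivial vs cloud" claim; for occasional use, renting stays cheaper, which is the skeptics' point.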
M3 Ultra Timing and Product Matrix
- Confusion that M3 Ultra ships after M4 Max and only in the Mac Studio. Some speculation:
  - M4 Max reportedly lacks the UltraFusion interconnect, so there is no M4 Ultra this generation.
  - Apple may be prioritizing chips for datacenter use, or yield/TSMC capacity constraints may be at play.
- Mac Pro with M2 Ultra now looks particularly odd; some expect a later M4‑based Pro or even quiet discontinuation.
OS, Tooling, and Ecosystem Limits
- Repeated concern that macOS, the lack of native Linux containers, and weak PyTorch/JAX tooling limit the chip's appeal as "serious AI" hardware.
- Asahi Linux is promising but incomplete for M3/M4 and not something enterprises can rely on.
- CUDA lock‑in remains a central reason most AI shops will stay with Nvidia, despite Apple’s perf/W and unified memory advantages.