Apple's On-Device and Server Foundation Models
On‑device vs Cloud Models and Adapters
- Apple offers a ~3B-parameter on‑device model plus larger server models, with GPT‑4o as a third tier in some flows.
- Many expect Apple to default to on‑device for cost/privacy and fall back to cloud when quality isn’t sufficient, but exact routing logic is unclear.
- Several commenters doubt 3–7B models can do high‑quality open‑ended text generation; they see serious generation handled mostly in the cloud.
- Apple’s “adapters” are essentially LoRA modules: small, task‑specific weight sets plugged into a frozen base model to specialize for summarization, replies, etc., while keeping app footprints small.
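The adapter idea in the last bullet can be illustrated with a generic LoRA layer. This is a minimal sketch, not Apple's implementation: the class name `LoRALinear`, the 64x64 layer size, the rank of 4, and the `alpha` scaling are all assumptions chosen for illustration, and the thread does not say which layers or ranks Apple actually uses. The frozen weight `W` stays untouched while two small matrices `A` and `B` carry the task-specific change.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a low-rank update B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen base weight, shared by every adapter
        # A starts small and random; B starts at zero, so a fresh
        # adapter leaves the base model's output unchanged.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def adapter_params(self):
        """Number of values that would ship with the adapter (A and B only)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


D = 64  # hypothetical layer width
rng = random.Random(1)
W = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(D)]
x = [rng.gauss(0.0, 1.0) for _ in range(D)]

layer = LoRALinear(W, rank=4)
base_out = matvec(W, x)
adapter_out = layer.forward(x)  # identical to base_out until B is trained
```

In this toy setup the adapter holds 512 values against 4096 in the frozen layer, which is the footprint argument in miniature: switching tasks means swapping `A` and `B`, not the base weights.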
Training Data, Scraping, and Copyright
- There is strong debate over Apple’s use of “licensed and publicly available” web data collected by its Applebot crawler.
- Critics call this “stolen” data and argue scraping for LLMs differs from search: it replaces visits to source sites and undermines creator incentives and copyleft strategies.
- Others argue scraping has long been accepted, robots.txt exists, and LLM training is another form of large‑scale indexing and transformation, potentially fair use.
- The Applebot‑Extended user agent lets sites opt out of having their content used for model training, but commenters note it became available only after Apple had already crawled; some see the opt‑out as too late and poorly signaled.
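For reference, the opt-out discussed above is expressed as an ordinary robots.txt rule. The sketch below uses Python's standard `urllib.robotparser` to show how such a rule reads; the `example.com` URL and the helper `is_allowed` are hypothetical, and the check reflects how Python's parser interprets the file, not a guarantee of how Apple processes it. As far as Apple's documentation describes it, Applebot-Extended is a usage signal rather than a separate crawler, so the rule governs training use of pages that Applebot fetches.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow normal crawling, opt out of training use.
ROBOTS_TXT = [
    "User-agent: Applebot-Extended",
    "Disallow: /",
    "",
    "User-agent: *",
    "Allow: /",
]


def is_allowed(user_agent, url, robots_lines=ROBOTS_TXT):
    """Return True if the given robots.txt lines permit user_agent to use url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


page = "https://example.com/articles/1"
search_ok = is_allowed("Applebot", page)             # falls through to the * rule
training_ok = is_allowed("Applebot-Extended", page)  # hits the Disallow rule
```

A rule like this only affects future use, which is the core of the "too late" complaint: it cannot retract data gathered before the site owner added it.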
Privacy, Security, and Cloud Design
- Apple claims user data is not used to train foundation models and highlights Private Cloud Compute with hardened OS images, attestation, and no long‑term logging.
- Some see this as a meaningful improvement over typical cloud AI; others dismiss it as “security theater” until independent audits and code releases arrive.
- There is concern about any OS‑level funnel to OpenAI, though Apple says third‑party calls are per‑request, explicit, and limited to the prompt.
Model Quality, Benchmarks, and Safety
- Apple’s server model reportedly beats GPT‑3.5 but trails GPT‑4 in human preference tests; the small on‑device model is compared to Mistral‑7B and others.
- Omission of Llama 3 8B is noted; theories include licensing restrictions and fear of unfavorable comparisons.
- Apple shows strong scores on a “harmfulness” metric; some welcome caution, others worry about over‑censorship and culturally biased safety filters.
Battery, Performance, and Hardware
- Existing local LLM apps drain iPhone batteries quickly; commenters expect large gains from using the Neural Engine now that Core ML supports autoregressive LLMs.
- Apple claims a time‑to‑first‑token (TTFT) latency of ~0.6 ms per prompt token and ~30 tok/s generation on iPhone 15 Pro with 3–4 bit “palettized” weights; some are impressed, others want independent tests.
- Commenters discuss whether on‑device AI will push Apple to raise base RAM above 8 GB; many criticize Apple’s RAM/SSD upsell pricing.
Developer and Ecosystem Implications
- Developers like the idea of a single base model plus many tiny adapters per task, improving latency, memory use, and app size.
- Some hope third‑party apps will get APIs to ship their own adapters atop Apple’s models, but Apple has made no explicit promise yet.
- Some see Apple’s vertically integrated, privacy‑framed AI as a strong alternative to browser‑based GPT‑4o, even if the models aren’t SOTA.
User Control and Attitudes to AI
- Multiple commenters want the ability to disable both cloud and local AI; Apple appears to allow at least disabling outbound requests.
- Views on generative AI remain split: some report real productivity gains (e.g., coding assistants); others see hallucinations, gimmicks, and long‑term risks to creative work and web quality.