2024-06-10

Apple's On-Device and Server Foundation Models

On‑device vs Cloud Models and Adapters

Apple offers a ~3B-parameter on‑device model plus larger server models, with GPT‑4o as a third tier in some flows.
Many expect Apple to default to on‑device for cost/privacy and fall back to cloud when quality isn’t sufficient, but exact routing logic is unclear.
Several commenters doubt 3–7B models can do high‑quality open‑ended text generation; they see serious generation handled mostly in the cloud.
Apple’s “adapters” are essentially LoRA modules: small, task‑specific weight sets plugged into a frozen base model to specialize for summarization, replies, etc., while keeping app footprints small.
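To make the adapter idea concrete, here is a minimal sketch of the standard LoRA formulation, in which a frozen weight matrix W is augmented by a low-rank product B·A. This is the generic technique the commenters refer to, not Apple's actual implementation; the matrix sizes, rank, and scaling factor `alpha` are hypothetical values chosen for illustration.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha):
    """Return W x + (alpha / r) * B (A x).

    W is the frozen base weight; only A and B are task-specific.
    """
    r = len(a)  # adapter rank = number of rows in A
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / r
    return [y + scale * d for y, d in zip(base, delta)]


def lora_param_count(d_out, d_in, r):
    """Parameters in one adapter: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)


# Toy sizes, assumed purely for illustration.
d_out, d_in, rank = 4, 3, 2
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(rank)]   # trainable
B = [[0.0] * rank for _ in range(d_out)]                               # zero-initialised

x = [1.0, 2.0, 3.0]
# With B = 0 the adapter is a no-op, so the output matches the base model.
assert lora_forward(W, A, B, x, alpha=2.0) == matvec(W, x)

# Hypothetical 4096 x 4096 layer with a rank-16 adapter.
full = 4096 * 4096
adapter = lora_param_count(4096, 4096, 16)
print(f"adapter is {adapter / full:.2%} of the full layer")
```

Because each task only needs its own A and B, several adapters can share one copy of W, which is why the per-task footprint stays small.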

Training Data, Scraping, and Copyright

There is strong debate over Apple’s use of “licensed and publicly available” web data collected by Applebot.
Critics call this “stolen” data and argue scraping for LLMs differs from search: it replaces visits to source sites and undermines both creator incentives and copyleft strategies.
Others argue scraping has long been accepted, robots.txt exists, and LLM training is another form of large‑scale indexing and transformation, potentially fair use.
Applebot‑Extended lets sites opt out of having their content used for model training, but the opt‑out became available only after Apple had already crawled; some see it as too late and poorly signaled.
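As a rough illustration of the opt-out, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt that disallows the `Applebot-Extended` user agent while leaving ordinary crawling open. The robots.txt contents and the example URL are assumptions for demonstration; the parser only shows how the rule matches and says nothing about how Apple processes it on its side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of training use, allow everything else.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
page = "https://example.com/article"  # placeholder URL

# The training-related agent is blocked; the regular crawler is not.
training_allowed = parser.can_fetch("Applebot-Extended", page)
search_crawl_allowed = parser.can_fetch("Applebot", page)
print(training_allowed, search_crawl_allowed)
```

A rule like this only affects future use, which is the point critics raise: it does nothing about content gathered before the site owner knew the user agent existed.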

Privacy, Security, and Cloud Design

Apple claims user data is not used to train foundation models and highlights Private Cloud Compute with hardened OS images, attestation, and no long‑term logging.
Some see this as a meaningful improvement over typical cloud AI; others dismiss it as “security theater” until independent audits and code releases arrive.
There is concern about any OS‑level funnel to OpenAI, though Apple says third‑party calls are per‑request, explicit, and limited to the prompt.

Model Quality, Benchmarks, and Safety

Apple’s server model reportedly beats GPT‑3.5 but trails GPT‑4 in human preference tests; the small on‑device model is compared to Mistral‑7B and others.
The omission of Llama 3 8B from the comparisons is noted; theories include licensing restrictions and fear of unfavorable comparisons.
Apple shows strong scores on a “harmfulness” metric; some welcome caution, others worry about over‑censorship and culturally biased safety filters.

Battery, Performance, and Hardware

Existing local LLM apps drain iPhone batteries quickly; commenters expect large gains from using the Neural Engine now that Core ML supports autoregressive LLMs.
Apple claims a time‑to‑first‑token (TTFT) latency of ~0.6 ms per prompt token and ~30 tok/s generation on iPhone 15 Pro with 3–4 bit “palettized” weights; some are impressed, others want independent tests.
There is discussion about whether on‑device AI will push Apple to raise base RAM above 8 GB; many criticize Apple’s RAM/SSD upsell pricing.
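The figures quoted above can be turned into back-of-the-envelope estimates. The sketch below takes Apple's claimed numbers at face value (0.6 ms per prompt token, 30 tok/s, roughly 3B parameters at 3 to 4 bits per weight); the prompt and output lengths are hypothetical, and the memory estimate covers weights only, ignoring the KV cache, activations, and runtime overhead.

```python
MS_PER_PROMPT_TOKEN = 0.6  # claimed time-to-first-token cost per prompt token
TOKENS_PER_SECOND = 30.0   # claimed generation rate
PARAMS = 3e9               # "~3B" parameters, rounded


def time_to_first_token_s(prompt_tokens):
    """Seconds before the first output token, assuming linear scaling."""
    return prompt_tokens * MS_PER_PROMPT_TOKEN / 1000.0


def generation_time_s(output_tokens):
    """Seconds to emit the given number of tokens at the claimed rate."""
    return output_tokens / TOKENS_PER_SECOND


def weight_footprint_gb(params, bits_per_weight):
    """Weights-only size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


# Hypothetical request: 1,000-token prompt, 150-token reply.
prompt_tokens, output_tokens = 1000, 150
ttft = time_to_first_token_s(prompt_tokens)
gen = generation_time_s(output_tokens)
print(f"TTFT: {ttft:.2f} s, generation: {gen:.2f} s, total: {ttft + gen:.2f} s")

for bits in (3, 4):
    print(f"{bits}-bit weights: {weight_footprint_gb(PARAMS, bits):.3f} GB")
```

Under these assumptions a 1,000-token prompt implies about 0.6 s before the first token, and the weights alone occupy roughly 1.1 to 1.5 GB, which helps explain why commenters connect on-device models to the 8 GB base RAM question.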

Developer and Ecosystem Implications

Developers like the idea of a single base model plus many tiny adapters per task, improving latency, memory use, and app size.
Some hope third‑party apps will get APIs to ship their own adapters atop Apple’s models; Apple has made no explicit promise yet.
Some see Apple’s vertically integrated, privacy‑framed AI as a strong alternative to browser‑based GPT‑4o, even if the models aren’t SOTA.

User Control and Attitudes to AI

Multiple commenters want the ability to disable both cloud and local AI; Apple appears to allow at least disabling outbound requests.
Views on generative AI remain split: some report real productivity gains (e.g., coding assistants); others see hallucinations, gimmicks, and long‑term risks to creative work and web quality.
