2026-02-01

My iPhone 16 Pro Max produces garbage output when running MLX LLMs

Calculator apps and math tools on phones

Many commenters pivot to lamenting built‑in calculator apps as “underbaked.”
Many prefer emulators of classic graphing calculators (HP 48/Prime, TI‑83/84/89) or advanced apps (NumWorks, PCalc, free42/plus42, MathStudio).
Desired features include visible history, scrollback, editing previous expressions, variables, and REPL‑like workflows; in effect, a small interpreted math language rather than a 4‑function replica.
Some avoid newer apps that appear abandoned or rarely updated.

Floating‑point determinism and NaNs

Several posts stress that low‑level numeric results are often not bit‑for‑bit reproducible across hardware, compilers, or even builds of the same app.
Clarification: floating‑point addition is commutative but not associative; reordering operations can change results.
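
The commutative-but-not-associative point can be checked in a few lines. This is a minimal sketch using ordinary Python floats (IEEE‑754 doubles); the values 0.1, 0.2, and 0.3 are chosen only for illustration and do not come from the thread.

```python
import math

a, b, c = 0.1, 0.2, 0.3

# Commutative: swapping the operands of one addition does not change the result.
commutes = (a + b) == (b + a)

# Not associative: regrouping changes which intermediate sum gets rounded.
left = (a + b) + c
right = a + (b + c)

print(commutes)       # True
print(left == right)  # False
print(left, right)

# The two results still agree to within a small tolerance, which is why
# cross-platform comparisons usually use a tolerance instead of ==.
print(math.isclose(left, right, rel_tol=1e-12))  # True
```

A parallel reduction on a GPU is free to regroup a sum in this way, which is one reason two correct devices can legitimately differ in the last few bits.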
Long subthread on NaN propagation and IEEE‑754:
- The standard mandates a quiet NaN output but only recommends propagating payloads from inputs; implementations may return canonical NaNs.
- Relying on exact NaN bit patterns across platforms is considered fragile.
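
To make the NaN points above concrete, here is a small sketch that builds a quiet NaN with a payload and inspects its bits. The helper names (`float_bits`, `float_from_bits`, `nan_aware_equal`) and the payload value `0x123` are illustrative assumptions, not anything cited in the thread.

```python
import math
import struct

def float_bits(x: float) -> int:
    """Return the 64-bit IEEE-754 pattern of a Python float as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def float_from_bits(bits: int) -> float:
    """Build a Python float from a 64-bit IEEE-754 pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def nan_aware_equal(a: float, b: float) -> bool:
    """Treat any two NaNs as equal; otherwise compare by value."""
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b

# Quiet NaN (exponent all ones, top mantissa bit set) with an
# illustrative payload of 0x123 in the low mantissa bits.
payload_nan = float_from_bits(0x7FF8_0000_0000_0123)
default_nan = float("nan")

result = payload_nan + 1.0

# The result is a NaN everywhere, but whether the payload survives the
# addition depends on the platform, so no specific bit pattern is assumed.
print(math.isnan(result))
print(hex(float_bits(result)))

# Robust comparison: ask "is it a NaN?" rather than "is it this NaN?".
print(nan_aware_equal(result, default_nan))
```

Comparing with `math.isnan` instead of bit patterns sidesteps the question of whether a given platform propagates payloads or returns a canonical NaN.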
C++ papers and tool docs are cited to reinforce that differing results are acceptable and expected in practice.

Diagnosing the iPhone / MLX bug

Key anomaly: identical ML model, weights, prompt, and OS yield drastically wrong tensors on one iPhone 16 Pro Max, while other Apple devices match each other.
Later update: the same code works correctly on an iPhone 17 Pro Max, suggesting that particular 16 Pro Max was defective or mis‑handled by the stack.
Others note MLX typically targets GPU/Metal, not necessarily the Neural Engine, so early speculation about the ANE may have been off.
A linked MLX pull request identifies a bug where an iPhone 16 Pro SKU was misdetected as supporting a specific Neural Accelerator path, causing silently wrong results; this is framed as a software/stack issue, not defective silicon.
Some wish for a minimal repro or tests on multiple 16 Pro Max devices to quantify how widespread the issue is.
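
One way to narrow down this kind of device-specific divergence is to dump intermediate values on a known-good device and on the suspect one, then find the first step where they disagree beyond a tolerance. The sketch below is not the blog author's code and does not use any MLX API; the function name `first_divergent_step`, the tolerances, and the sample numbers are all hypothetical.

```python
import math
from typing import Optional, Sequence

def first_divergent_step(
    reference: Sequence[Sequence[float]],
    candidate: Sequence[Sequence[float]],
    rel_tol: float = 1e-3,
    abs_tol: float = 1e-5,
) -> Optional[int]:
    """Return the index of the first step whose values differ beyond the
    tolerance, or None if every compared step agrees.

    Each element of `reference` and `candidate` is the flattened output
    of one step (for example, one layer) on one device.
    """
    for step, (ref_values, cand_values) in enumerate(zip(reference, candidate)):
        if len(ref_values) != len(cand_values):
            return step
        for r, c in zip(ref_values, cand_values):
            # Treat NaN on both sides as agreement; isclose would reject it.
            if math.isnan(r) and math.isnan(c):
                continue
            if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
                return step
    return None

# Made-up per-step outputs: small rounding noise at step 0, garbage at step 2.
reference = [[0.10, 0.20], [1.50, -0.75], [3.0, 4.0]]
candidate = [[0.10, 0.2000001], [1.50, -0.75], [250.0, -9.0e6]]

print(first_divergent_step(reference, candidate))  # 2
```

The tolerance matters: bit-exact comparison would flag the harmless rounding noise at step 0, while a tolerance-based check points at the step where the output actually becomes wrong.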

Apple software quality and keyboard complaints

Several commenters report recent, severe degradation in iOS keyboard autocorrect and predictive text, on multiple devices, suspecting broader ML regressions.
Reinstalling iOS is described as painful due to re‑auth, wallet, and app logins; older encrypted iTunes backups once allowed near‑perfect device cloning, which some miss.

Impact of the blog post and debugging culture

The MLX bug fix landed a day after the blog post’s date; some infer that the post motivated the fix, while others argue that the timing is likely coincidental or that the fix simply went through the normal issue/PR workflow.
Multiple comments praise the author’s methodical debugging (hand‑written repros, isolating the failing tensor steps) and contrast it with typical “AI rage” or conspiracy narratives.
Calls for sharing the minimal failing code are framed as beneficial to both Apple and the community; lack of hardware‑based CI is criticized.

Side discussions on LLMs and “moon plus sun”

Some dismiss using LLMs as calculators, noting neural nets are inherently weak at extrapolative arithmetic and rely on patterns in training data.
Others emphasize that the truly worrying part is inconsistent outputs from the same deterministic model, not that an LLM is bad at math.
A playful tangent explores “What’s moon plus sun?” with answers ranging from “bright” to “eclipse,” exomoons, tarot interpretations, and even language jokes, all used to illustrate the ambiguity of natural‑language “math” versus strict arithmetic.
