Inside the M4 Apple Neural Engine, Part 1: Reverse Engineering

Usefulness of the Neural Engine (ANE)

  • Many comments emphasize that ANE is already heavily used for on-device ML: image and text recognition, Photos and video apps, ARKit, FaceID, spam detection, audio transcription, on-device Siri, captions, and image manipulation.
  • Some users say they never use these features (or Siri), so ANE feels like wasted silicon for their workflows.
  • For typical Python/NumPy/sklearn users, ANE generally does not accelerate workloads automatically; NPUs are vendor-specific and rarely wired into open-source stacks.
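To illustrate that last point: a stock NumPy matrix multiply on an Apple Silicon Mac dispatches to whatever CPU BLAS library NumPy was built against (OpenBLAS or Accelerate); nothing in the call below has any path to the Neural Engine.

```python
import numpy as np

# A typical NumPy workload: a dense matrix multiply.
# On Apple Silicon this runs on the CPU via NumPy's BLAS backend
# (OpenBLAS, or Accelerate); there is no code path that dispatches
# it to the ANE.
a = np.random.rand(512, 512).astype(np.float32)
b = np.random.rand(512, 512).astype(np.float32)
c = a @ b  # CPU BLAS call, not an ANE operation

print(c.shape)  # (512, 512)
```

Routing such work to the ANE would require converting the model to Core ML first; plain array code never gets there automatically.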

CoreML, MLX, and Core AI

  • ANE access for custom models is via CoreML, not MLX; MLX currently targets CPU/GPU.
  • There’s confusion about scheduling: some say OS tasks have priority on ANE, others claim third-party workloads do.
  • A rumored “Core AI” framework may replace or supersede CoreML to better integrate third‑party LLMs and align with newer “AI” branding.

Reverse Engineering and AI Collaboration

  • The article’s ANE analysis was done with an LLM “collaborator,” which sparked debate.
  • Enthusiasts see this as a strong example of present-day “augmented engineering” and future reverse‑engineering workflows.
  • Skeptics distrust “vibe-coded” AI analysis, worry about hallucinations, and question how thoroughly facts were verified.
  • Others counter that humans also produce convincing but wrong work; AI just changes the failure modes.

Performance, Benchmarks, and Marketing Claims

  • Part 2’s benchmarks report ~6.6 TFLOPS/W and show that the ANE draws near-zero power at idle.
  • Discussion notes Apple’s “38 TOPS INT8” figure relies on a convention (INT8 counted as 2× FP16), even though the hardware doesn’t actually run INT8 at twice the FP16 rate.
  • Some see this as typical marketing inflation; others blame disconnects between engineering and marketing.
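The convention in question is simple arithmetic. Taking the FP16 baseline implied by Apple's own figure (19 TFLOPS, i.e. 38/2 — inferred from the convention, not measured here), the headline number is just:

```python
# Assumed FP16 throughput implied by the convention: 38 / 2 = 19 TFLOPS.
# (This baseline is inferred from the marketing figure, not measured.)
fp16_tflops = 19.0

# Marketing convention: count each INT8 op as two FP16 ops,
# even though the hardware does not run INT8 at twice the FP16 rate.
int8_tops_marketing = fp16_tflops * 2

print(int8_tops_marketing)  # 38.0
```

So the "38 TOPS" figure is an accounting choice layered on top of the FP16 rate, not a separately measured INT8 throughput.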

Training on ANE and New Experiments

  • Commenters are curious whether ANE can be used for training; in principle inference hardware can, but efficiency is uncertain.
  • One contributor describes partially offloading NanoGPT training to the ANE (the classifier and softmax layers), reporting large speedups along with memory-leak fixes.
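For context on what the offloaded softmax step computes, here is a generic, numerically stable reference in plain Python (not the contributor's actual ANE kernel):

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before
    exponentiating so exp() never overflows for large inputs."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)  # three probabilities summing to 1 (up to float rounding)
```

Softmax is a natural offload candidate because it is a fixed elementwise-plus-reduction pattern, exactly the kind of operation inference accelerators are built around.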

Apple’s Closed Design, Obfuscation, and Tooling

  • Apple’s closed ANE stack limits open-source use and MLX integration; some find this unsurprising given its role as a power-efficient inference engine for Apple’s own features.
  • There’s debate over how aggressively Apple obfuscates system code, with mentions of techniques like control-flow flattening and shared-cache packaging.
  • Several lament the decline in Apple’s developer documentation quality compared to earlier eras.
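Control-flow flattening, one of the obfuscation techniques mentioned, rewrites structured branches and loops into a single dispatch loop over a state variable, which makes decompiled output far harder to follow. A toy Python illustration of the transform (not Apple's actual code):

```python
def clamp_sum(xs, limit):
    """Straightforward version: sum values, stopping at a limit."""
    total = 0
    for x in xs:
        total += x
        if total >= limit:
            return limit
    return total

def clamp_sum_flattened(xs, limit):
    """Same logic after control-flow flattening: each basic block
    becomes a numbered state, and one loop dispatches between them."""
    state, total, i = 0, 0, 0
    while True:
        if state == 0:                      # loop header
            state = 1 if i < len(xs) else 3
        elif state == 1:                    # loop body: accumulate
            total += xs[i]
            i += 1
            state = 2 if total >= limit else 0
        elif state == 2:                    # early exit: clamp
            return limit
        elif state == 3:                    # fall-through exit
            return total

print(clamp_sum([3, 4, 5], 10), clamp_sum_flattened([3, 4, 5], 10))  # prints "10 10"
```

The two functions are behaviorally identical, but the flattened form erases the loop and branch structure a decompiler would otherwise recover.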