Inside the M4 Apple Neural Engine, Part 1: Reverse Engineering

Usefulness of the Neural Engine (ANE)

  • Many comments emphasize that ANE is already heavily used for on-device ML: image and text recognition, Photos and video apps, ARKit, FaceID, spam detection, audio transcription, on-device Siri, captions, and image manipulation.
  • Some users say they never use these features (or Siri), so ANE feels like wasted silicon for their workflows.
  • For typical Python/NumPy/sklearn users, ANE generally does not accelerate workloads automatically; NPUs are vendor-specific and rarely wired into open-source stacks.
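To illustrate that last point: a stock NumPy matrix multiply on an Apple Silicon Mac dispatches to whatever CPU BLAS library NumPy was built against (OpenBLAS or Accelerate); nothing in the call below has any path to the Neural Engine.

```python
import numpy as np

# A typical NumPy workload: a dense matrix multiply.
# On Apple Silicon this runs on the CPU via NumPy's BLAS backend
# (OpenBLAS, or Accelerate); there is no code path that dispatches
# it to the ANE.
a = np.random.rand(512, 512).astype(np.float32)
b = np.random.rand(512, 512).astype(np.float32)
c = a @ b  # CPU BLAS call, not an ANE operation

print(c.shape)  # (512, 512)
```

Routing such work to the ANE would require converting the model to Core ML first; plain array code never gets there automatically.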

CoreML, MLX, and Core AI

  • ANE access for custom models is via CoreML, not MLX; MLX currently targets CPU/GPU.
  • There’s confusion about scheduling: some say OS tasks have priority on ANE, others claim third-party workloads do.
  • A rumored “Core AI” framework may replace or supersede CoreML to better integrate third‑party LLMs and align with newer “AI” branding.

Reverse Engineering and AI Collaboration

  • The article’s ANE analysis was done with an LLM “collaborator,” which sparked debate.
  • Enthusiasts see this as a strong example of present-day “augmented engineering” and future reverse‑engineering workflows.
  • Skeptics distrust “vibe-coded” AI analysis, worry about hallucinations, and question how thoroughly facts were verified.
  • Others counter that humans also produce convincing but wrong work; AI just changes the failure modes.

Performance, Benchmarks, and Marketing Claims

  • Part 2’s benchmarks report ~6.6 TFLOPS/W and show that the ANE draws near-zero power at idle.
  • Discussion notes Apple’s “38 TOPS INT8” figure relies on a convention (INT8 counted as 2× FP16), even though the hardware doesn’t actually run INT8 at twice the FP16 rate.
  • Some see this as typical marketing inflation; others blame disconnects between engineering and marketing.
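The convention in question is simple arithmetic. Taking the FP16 baseline implied by Apple's own figure (19 TFLOPS, i.e. 38/2 — inferred from the convention, not measured here), the headline number is just:

```python
# Assumed FP16 throughput implied by the convention: 38 / 2 = 19 TFLOPS.
# (This baseline is inferred from the marketing figure, not measured.)
fp16_tflops = 19.0

# Marketing convention: count each INT8 op as two FP16 ops,
# even though the hardware does not run INT8 at twice the FP16 rate.
int8_tops_marketing = fp16_tflops * 2

print(int8_tops_marketing)  # 38.0
```

So the "38 TOPS" figure is an accounting choice layered on top of the FP16 rate, not a separately measured INT8 throughput.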

Training on ANE and New Experiments

  • Commenters are curious whether ANE can be used for training; in principle inference hardware can, but efficiency is uncertain.
  • One contributor describes partially offloading NanoGPT training to the ANE (the classifier and softmax layers), reporting large speedups along with memory-leak fixes.
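For context on what the offloaded softmax step computes, here is a generic, numerically stable reference in plain Python (not the contributor's actual ANE kernel):

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before
    exponentiating so exp() never overflows for large inputs."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)  # three probabilities summing to 1 (up to float rounding)
```

Softmax is a natural offload candidate because it is a fixed elementwise-plus-reduction pattern, exactly the kind of operation inference accelerators are built around.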

Apple’s Closed Design, Obfuscation, and Tooling

  • Apple’s closed ANE stack limits open-source use and MLX integration; some find this unsurprising given its role as a power-efficient inference engine for Apple’s own features.
  • There’s debate over how aggressively Apple obfuscates system code, with mentions of techniques like control-flow flattening and shared-cache packaging.
  • Several lament the decline in Apple’s developer documentation quality compared to earlier eras.
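Control-flow flattening, one of the obfuscation techniques mentioned, rewrites structured branches and loops into a single dispatch loop over a state variable, which makes decompiled output far harder to follow. A toy Python illustration of the transform (not Apple's actual code):

```python
def clamp_sum(xs, limit):
    """Straightforward version: sum values, stopping at a limit."""
    total = 0
    for x in xs:
        total += x
        if total >= limit:
            return limit
    return total

def clamp_sum_flattened(xs, limit):
    """Same logic after control-flow flattening: each basic block
    becomes a numbered state, and one loop dispatches between them."""
    state, total, i = 0, 0, 0
    while True:
        if state == 0:                      # loop header
            state = 1 if i < len(xs) else 3
        elif state == 1:                    # loop body: accumulate
            total += xs[i]
            i += 1
            state = 2 if total >= limit else 0
        elif state == 2:                    # early exit: clamp
            return limit
        elif state == 3:                    # fall-through exit
            return total

print(clamp_sum([3, 4, 5], 10), clamp_sum_flattened([3, 4, 5], 10))  # prints "10 10"
```

The two functions are behaviorally identical, but the flattened form erases the loop and branch structure a decompiler would otherwise recover.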