Extracting AI models from mobile apps
Role of resize_to_320.tflite and basic ML details
- Commenters note a `.tflite` file that only does image resizing via standard TensorFlow ops, not an "AI model" for resizing.
- Its size (~7.7 KB) implies it contains almost no learned weights.
- Clarifies that TensorFlow is a general compute framework; many vision models require fixed low‑resolution inputs.
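The "fixed low-resolution input" point is the whole reason a resize stage exists at all. A minimal sketch of that preprocessing step, using a toy nearest-neighbour resize in NumPy (this is an illustration of the concept, not the actual ops inside the app's `.tflite` graph):

```python
import numpy as np

def resize_nearest(img: np.ndarray, out_h: int = 320, out_w: int = 320) -> np.ndarray:
    """Nearest-neighbour resize of an HxWxC image to a fixed model input size."""
    in_h, in_w = img.shape[:2]
    # Map each output pixel back to its nearest source pixel.
    rows = (np.arange(out_h) * in_h / out_h).astype(int)
    cols = (np.arange(out_w) * in_w / out_w).astype(int)
    return img[rows[:, None], cols]

# A camera frame at an arbitrary resolution...
frame = np.zeros((480, 640, 3), dtype=np.uint8)
# ...becomes the fixed-size tensor the detection model expects.
model_input = resize_nearest(frame)
print(model_input.shape)  # (320, 320, 3)
```

Shipping this step as its own tiny `.tflite` graph is consistent with the ~7.7 KB size: pure resize ops, no learned parameters.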
Status of AI models as intellectual property
- Strong debate over whether model weights are copyrightable or just “facts”/coefficients produced mechanically.
- Some argue:
  - Copyright generally requires human authorship; automated weights may not qualify.
  - Weights may be better treated as trade secrets, or protected via contracts and licenses.
  - Training-set curation and model implementations are clearly copyrightable; architectures may be patentable.
- Others counter:
  - Models are licensed (e.g., LLaMA, Stable Diffusion, banknote‑net), implying they're treated as IP.
  - Compilations and compiled code are copyrighted even if produced by automated tools, suggesting an analogy for weights.
- Consensus: legal status of model weights is unclear and largely untested in court.
DMCA, circumvention, and legality of extraction
- DMCA §1201: circumventing effective access controls can be illegal even without redistribution, but only for works protected by copyright.
- Discussion of broad interpretations (any copy‑prevention scheme) vs case law limiting DMCA to actual copyrighted works.
- Extracting models via reverse‑engineering tools may produce illegal “circumvention tools” in some cases; legality is unsettled and jurisdiction‑dependent.
Training on copyrighted data vs claiming IP on models
- Many criticize big AI firms for training on unlicensed copyrighted data while asserting strong IP over resulting models (“rules for thee, not for me”).
- Disagreement over whether training is fair use:
  - Pro side: highly transformative, analogous to learning; weights are statistics over many works.
  - Con side: models can regurgitate training data, undermine creators' livelihoods, and scale content production massively.
- Some say if models are protected, training on copyrighted data should not simultaneously be fair use; others separate those questions.
Model “laundering” and distillation
- Techniques like model distillation and training on synthetic/model‑generated data are common; could be used to avoid direct copying of proprietary weights.
- Legal treatment of such derivative models is unclear.
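A minimal, deliberately toy illustration of the distillation idea discussed above: a "student" model trained only on a "teacher's" outputs over synthetic inputs ends up reproducing the teacher's behavior without its weights ever being copied directly. Both models here are simple linear functions (an assumption for clarity, not how production distillation is done):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Teacher": a model we can query but whose weights we never copy.
teacher_w = np.array([2.0, -1.0, 0.5])
def teacher(x):
    return x @ teacher_w

# Build a synthetic dataset purely by querying the teacher.
X = rng.normal(size=(1000, 3))
y = teacher(X)  # "soft labels" from the teacher

# "Student": a fresh model trained by gradient descent on teacher outputs alone.
student_w = np.zeros(3)
lr = 0.1
for _ in range(200):
    grad = 2 * X.T @ (X @ student_w - y) / len(X)
    student_w -= lr * grad

print(np.allclose(student_w, teacher_w, atol=1e-3))  # student mimics teacher
```

The legal question in the thread is exactly whether `student_w` here is a derivative work of `teacher_w`, given that no bytes were copied.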
On‑device models, extraction risk, and DRM
- General principle: anything shipped to a user device can be extracted with enough effort; mobile apps are not a secure place for “secret sauce”.
- Frida is highlighted as a powerful dynamic instrumentation tool; approach extends to recovering tokenizers and pre/post‑processing by observing framework calls.
- Ideas for protection:
  - Encrypt models for specific inference runtimes (e.g., CoreML with public/private keys).
  - Use GPU/TEE/DRM‑style secure hardware so decrypted data never leaves the device's protected area.
- Counterpoint: given physical access, skilled attackers can still use hardware attacks (fault injection, power analysis, etc.); any device that must run matrix multiplies on decrypted data is ultimately attackable.
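Before any dynamic instrumentation with Frida, extraction often starts with a purely static pass: an Android APK is just a zip archive, and TFLite flatbuffers carry the `TFL3` file identifier at byte offset 4, so embedded models can usually be located by content even when renamed. A small sketch of that idea (the archive contents and paths below are made up for the demo):

```python
import io
import zipfile

TFLITE_MAGIC = b"TFL3"  # FlatBuffer file identifier at byte offset 4 of a .tflite file

def find_tflite_models(apk_bytes: bytes) -> list:
    """Return paths inside an APK (a zip archive) whose contents look like TFLite models."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for name in apk.namelist():
            head = apk.read(name)[:8]
            if head[4:8] == TFLITE_MAGIC:
                hits.append(name)
    return hits

# Build a fake "APK" containing a renamed model, to show detection is content-based.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("assets/data.bin", b"\x1c\x00\x00\x00TFL3" + b"\x00" * 64)  # disguised model
    z.writestr("assets/readme.txt", b"nothing to see here")
print(find_tflite_models(buf.getvalue()))  # ['assets/data.bin']
```

This is the benign end of the spectrum; when the model is encrypted or assembled at runtime, that is where hooking framework calls with Frida takes over, and the hardware-attack counterpoint above explains why even that escalation eventually succeeds.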
Cloud vs on‑device inference
- Hosting models remotely (e.g., via Firebase) avoids shipping them but introduces:
  - Ongoing compute costs, latency, and bandwidth use.
  - Loss of offline functionality.
- Hybrid schemes (partial cloud, partial device) are discussed as possible but technically complex.
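A back-of-envelope calculation makes the bandwidth cost of the cloud option concrete. Every number below is an illustrative assumption, not a figure from the discussion:

```python
# Hypothetical cloud-inference workload; all numbers are assumptions.
users = 10_000
inferences_per_user_per_day = 30
upload_kb_per_inference = 200          # one downscaled camera frame

daily_gb = users * inferences_per_user_per_day * upload_kb_per_inference / 1_000_000
monthly_gb = daily_gb * 30
print(f"{daily_gb:.0f} GB/day, {monthly_gb:.0f} GB/month")  # 60 GB/day, 1800 GB/month
```

Even at modest scale the transfer volume is non-trivial, which is why hybrid schemes keep coming up despite their complexity.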
Use of open models in the example
- The extracted banknote recognition model used as the demo is publicly available, trained on open data, and MIT/CDLA‑licensed; commenters see this as a safe and illustrative target.
- Some speculate this choice avoids demonstrating the technique on truly proprietary models.
Community reception and educational value
- Many appreciate the article as an accessible intro to Frida and mobile reverse engineering, especially for newer ML engineers or security‑curious readers.
- Others downplay novelty but agree it effectively illustrates that “what runs on your device can be recovered.”