2025-07-06

I extracted the safety filters from Apple Intelligence models

What these filters are and where they sit

The extracted configs are regex-style block and replace lists applied around the Apple Intelligence models.
Commenters say they’re an extra, cheap first layer before a heavier “safety model”/classifier runs, on both input and output.
Different files map to specific features: proactive notification summaries, Writing Tools, camera “visual intelligence,” messages/mail replies, code intelligence, etc.
Some lists are “retain”/substitution lists (replacing a term with “test complete”), others are hard denies that disable the feature (“Writing tools unavailable”).
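
The layering commenters describe can be sketched roughly as below. This is a minimal illustration, not Apple's implementation: the `FilterConfig` structure, the `apply_filter` helper, and both example patterns are hypothetical, and the real config schema is not reproduced in the thread. Only the "Writing tools unavailable" message and the "test complete" substitution come from the discussion.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FilterConfig:
    # Hypothetical shape: one list of hard-deny patterns and one
    # mapping of pattern -> replacement text.
    deny: list[str] = field(default_factory=list)
    replace: dict[str, str] = field(default_factory=dict)


def apply_filter(text: str, config: FilterConfig) -> tuple[bool, str]:
    """Return (allowed, text). Any deny match disables the feature;
    otherwise replace rules rewrite the text before it moves on."""
    for pattern in config.deny:
        if re.search(pattern, text, flags=re.IGNORECASE):
            return False, "Writing tools unavailable"
    for pattern, substitute in config.replace.items():
        text = re.sub(pattern, substitute, text, flags=re.IGNORECASE)
    return True, text


# Placeholder patterns, chosen only for illustration.
config = FilterConfig(
    deny=[r"\bforbidden phrase\b"],
    replace={r"\bplaceholder term\b": "test complete"},
)

print(apply_filter("a Forbidden Phrase here", config))
print(apply_filter("a placeholder term here", config))
```

In a pipeline like the one commenters describe, a check of this kind would presumably run once on the prompt and once on the model output, with the heavier classifier only seeing text that passes.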

Test phrases and QA scaffolding

Odd phrases like “granular mango serpent” and “xylophone copious opportunity defined elephant” (XCODE acronym) appear widely.
Consensus: these are artificial, low‑collision QA tokens used to test that filters are loaded and working, analogous to antivirus test strings.
Confirmed behavior: using the phrase in Apple Intelligence triggers blocked‑content errors, supporting the “QA hook” theory.
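
A QA hook of this kind could be exercised with a check like the following. It is a sketch under assumptions: `is_blocked` and `filter_is_live` are hypothetical names, and the deny list here holds only the canary phrase quoted in the thread, not any real config contents.

```python
import re

# Canary phrase quoted in the thread; unlikely to occur in real text.
CANARY = "granular mango serpent"

# Hypothetical loaded deny list containing just the canary.
deny_patterns = [re.compile(re.escape(CANARY), re.IGNORECASE)]


def is_blocked(text: str) -> bool:
    """True if any deny pattern matches the text."""
    return any(p.search(text) for p in deny_patterns)


def filter_is_live() -> bool:
    """The filter counts as loaded and working if the canary is
    blocked while an ordinary request is not."""
    return (
        is_blocked(f"Please summarize: {CANARY}")
        and not is_blocked("Please summarize: lunch at noon")
    )


print(filter_is_live())
```

Because the phrase is low-collision, a test like this can run against a production build without risking false alarms on real user text.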

Regex safety: utility and limitations

Some see regex filters as “silly” and trivially bypassed (e.g., leetspeak, euphemisms); others defend them as fast, effective for 99% of ordinary users, and good CYA.
Classic problems identified: false positives like Scunthorpe‑style matches, blocking benign phrases (“pass on,” “take it off me”), and missing coded language (“unalive”).
Several argue that LLMs easily normalize typos and substitutions, so naive regexes neither robustly block nor meaningfully degrade harmful use.
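
The failure modes above are easy to reproduce. The sketch below uses an illustrative blocked term and a hypothetical `normalize` helper with a small leetspeak table; none of it is taken from the extracted configs.

```python
import re

BLOCKED = "kill"  # illustrative term only, not from the extracted lists

# Naive substring match versus a word-boundary match.
naive = re.compile(BLOCKED, re.IGNORECASE)
bounded = re.compile(rf"\b{re.escape(BLOCKED)}\b", re.IGNORECASE)

# Hypothetical leetspeak table; real substitutions are open-ended.
LEET = str.maketrans(
    {"1": "i", "3": "e", "4": "a", "0": "o", "5": "s", "@": "a", "$": "s"}
)


def normalize(text: str) -> str:
    """Lowercase and undo a few common character substitutions."""
    return text.lower().translate(LEET)


# Scunthorpe-style false positive: the substring sits inside "skill".
print(bool(naive.search("a useful skill")))      # True
print(bool(bounded.search("a useful skill")))    # False

# Leetspeak bypass, caught only after normalization.
print(bool(bounded.search("k1ll the process")))              # False
print(bool(bounded.search(normalize("k1ll the process"))))   # True

# Coded language still slips through either way.
print(bool(bounded.search(normalize("unalive"))))            # False
```

Word boundaries fix the substring problem but not the benign-phrase one ("pass on"), and a normalization table only covers substitutions someone thought to list, which is the gap commenters expect the heavier classifier to fill.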

Politics, brands, and topic avoidance

Lists explicitly block many current politicians’ names, some political topics (e.g., Palestine in certain contexts), competitor AI brand names (ChatGPT, Gemini, others), and some welfare/poverty terms in French.
Interpretations range from neutral “avoid generating abusive or defamatory replies about named individuals” to concern about opaque political and socioeconomic framing.
Canonical capitalization of Apple product names is enforced (iPhone, etc.); some see this as trivial trademark defense, others as branding overreach into user expression.
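
Capitalization enforcement of this sort is a simple replace rule. A minimal sketch, assuming a hypothetical `CANONICAL` list and `fix_casing` helper (the actual product list in the configs is not quoted in the thread beyond "iPhone, etc."):

```python
import re

# Hypothetical list of canonical spellings.
CANONICAL = ["iPhone", "iPad", "macOS"]

# One case-insensitive pattern covering every listed name.
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, CANONICAL)) + r")\b",
    re.IGNORECASE,
)
lookup = {name.lower(): name for name in CANONICAL}


def fix_casing(text: str) -> str:
    """Rewrite any casing of a listed name to its canonical form."""
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


print(fix_casing("my IPHONE and ipad run macos"))
# my iPhone and iPad run macOS
```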

Regional and “safety vs censorship” debate

CN‑specific configs emphasize sexual deviance, religion, and some political/religious terms; other locales vary by language and local politics.
A large subthread debates whether this is ordinary corporate risk management and legal compliance, or a step toward corporate/state speech control akin to national firewalls.
Some point to open‑weights/offline models as an escape valve; others note most users will be stuck with whatever guardrails platform vendors impose.
