2026-06-16

Feds freaked over Fable 5 after 'fix this code', not jailbreak, say researchers

What the “fix this code” behavior actually was

Fable was designed to hand off “cybersecurity-related” prompts to a weaker model, but still happily improved code when asked to “fix this code.”
Reviewers say it fixed real and planted vulnerabilities and wrote tests; diffs or tests could reveal exploits.
Many argue this is just normal defensive workflow (find–fix–test) and not a clever jailbreak; others note that bypassing guardrails by rephrasing is, in practice, a jailbreak.

Defense vs offense in cybersecurity

Multiple comments stress that the same skills and tools find bugs for both defense and attack; trying to separate “defensive” and “offensive” capabilities is seen as conceptually unsound.
Some worry export controls effectively stop defenders from using the best tools while motivated attackers can still get them (via other models, stolen identities, or stolen weights).
Others argue that even imperfect guardrails that raise costs (more tokens, more friction, human review) still meaningfully slow large‑scale offensive use.

Anthropic’s strategy and AI guardrails

Many say Anthropic’s months of “this is like nukes” messaging made a clampdown politically inevitable once any flaw was found.
There’s broad skepticism that “bulletproof” safety filters are possible; LLMs can’t reliably infer intent, are easily lied to, and classification tends to be brittle or overbroad.
Some suggest more sophisticated classifiers, world models, or credential-based access to sensitive code, but others call these unworkable or harmful to legitimate open‑source work.

Political and regulatory interpretations

Large contingent sees the ban as political retaliation or a shakedown ahead of Anthropic’s IPO, not genuine safety policy.
Others frame it as early steps toward centralized, possibly authoritarian control over “frontier” AI, with parallels drawn to export controls, crypto backdoor fights, and national‑security logic.
A few commenters doubt the official narrative altogether and suggest the episode may be partly staged or opportunistic PR.

Implications for users, markets, and open models

Developers are frustrated: the same system that helps create bugs (earlier models) may now be barred from fixing them at scale.
Businesses worry about relying on models that can be shut off overnight for unrelated political reasons.
Several predict increased interest in open‑weight models and non‑US providers, and even a possible “intelligence cap” for consumer AI while stronger systems become restricted like weapons.

Related topics