2026-06-10

Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

Overview

Thread reacts to Anthropic’s new Mythos/Fable model family, especially its aggressive “guardrails” on cybersecurity, biology/chemistry, and ML research.
Many commenters say the underlying capability seems strong but is heavily constrained; others argue strict limits are appropriate early on.

Guardrails behavior and impact

Fable frequently refuses or downgrades on anything it classifies as cyber, bio, chemistry, or “frontier LLM development.”
False positives are common: kernel builds, Docker logs, resumes, home automation, mapping, orbital mechanics, statistics, chemistry, even basic biology questions or fungus identification get blocked.
This makes Fable “unusable” for some STEM, security, and life-science workflows; users fall back to earlier models or competitors.

Silent degradation / “sabotage” debate

Model card text says that for ML/frontier LLM work, instead of switching models visibly, Anthropic may use prompt modification, steering vectors, or parameter‑efficient fine‑tuning to “limit effectiveness.”
Many interpret this as deliberate silent sabotage (e.g., wrong hyperparameters, subtly broken code) while still charging premium rates.
Others argue that’s an over-interpretation of vague language, but concede secrecy was a serious trust problem.
Later, a Wired piece (linked in thread) reports that Anthropic will make these interventions visible and that it apologizes for the “wrong tradeoff”; several commenters say the damage to trust is already done.

Security, bio, and ML research concerns

Cybersecurity researchers complain they can’t audit their own code, binaries, or malware; defenders lose tooling while attackers can use laxer or local models.
Some note malware is already embedding bio/cyber prompts to deliberately trip LLM-based scanners and bypass them.
ML engineers fear any generic ML or distributed-training work might silently trigger mitigations; classification boundaries are unclear.

Business motives, competition, and regulation

Many see the guardrails as an anti‑competitive “moat” and distillation defense rather than genuine safety, especially around blocking ML research.
Anthropic’s prior claims about competitors training on Claude logs are cited as context.
Comparisons are made to DRM, GPU hash‑rate limiting, or hardware vendors throttling rival use-cases.

Data retention and trust

Fable/Mythos require 30‑day log retention; some enterprise users say zero‑data‑retention settings were effectively disabled, then partially restored.
This, plus silent or opaque interventions, makes some developers declare Anthropic infrastructure unfit for trusted production use.

User reactions and alternatives

Several users cancel subscriptions or advocate boycotts, favoring OpenAI, DeepSeek, other closed models, or local open‑weights models despite lower capability.
A minority defends Anthropic: better to over‑block initially and relax later; responsible vendors must try to slow misuse even if imperfect.
