Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable

Overview

  • Thread reacts to Anthropic’s new Mythos/Fable model family, especially its aggressive “guardrails” on cybersecurity, biology/chemistry, and ML research.
  • Many commenters say the underlying capability seems strong but is heavily constrained; others argue strict limits are appropriate early on.

Guardrails behavior and impact

  • Fable frequently refuses or downgrades requests it classifies as cyber, bio, chemistry, or “frontier LLM development.”
  • False positives are common: kernel builds, Docker logs, resumes, home automation, mapping, orbital mechanics, statistics, chemistry, even basic biology questions or fungus identification get blocked.
  • This makes Fable “unusable” for some STEM, security, and life-science workflows; users fall back to earlier models or competitors.

Silent degradation / “sabotage” debate

  • Model card text says that for ML/frontier LLM work, instead of switching models visibly, Anthropic may use prompt modification, steering vectors, or parameter‑efficient fine‑tuning to “limit effectiveness.”
  • Many interpret this as deliberate silent sabotage (e.g., wrong hyperparameters, subtly broken code) while Anthropic still charges premium rates.
  • Others argue that’s an over-interpretation of vague language, but concede secrecy was a serious trust problem.
  • Later, a Wired piece (linked in thread) reports that Anthropic apologized for the “wrong tradeoff” and will make these interventions visible; several commenters say the damage to trust is already done.

Security, bio, and ML research concerns

  • Cybersecurity researchers complain they can’t audit their own code, binaries, or malware; defenders lose tooling while attackers can use laxer or local models.
  • Some note that malware is already embedding bio/cyber prompts to deliberately trip the guardrails of LLM-based scanners and so slip past them.
  • ML engineers fear any generic ML or distributed-training work might silently trigger mitigations; classification boundaries are unclear.

Business motives, competition, and regulation

  • Many see the guardrails as an anti‑competitive “moat” and a distillation defense rather than genuine safety, especially the blocking of ML research.
  • Anthropic’s prior claims about competitors training on Claude logs are cited as context.
  • Comparisons are made to DRM, GPU hash‑rate limiting, or hardware vendors throttling rival use-cases.

Data retention and trust

  • Fable/Mythos require 30‑day log retention; some enterprise users say zero‑data‑retention settings were effectively disabled, then partially restored.
  • This, plus silent or opaque interventions, makes some developers declare Anthropic infrastructure unfit for trusted production use.

User reactions and alternatives

  • Several users cancel subscriptions or advocate boycotts, favoring OpenAI, DeepSeek, other closed models, or local open‑weights models despite lower capability.
  • A minority defends Anthropic: better to over‑block initially and relax later; responsible vendors must try to slow misuse even if their methods are imperfect.