Project Glasswing: An Initial Update

Overall reaction to Mythos / Glasswing

  • Many see Mythos as a genuine “step change” in AI‑assisted vulnerability discovery, citing:
    • High reported true‑positive rates (~90%) versus traditional tools.
    • Partner anecdotes (Firefox, Cloudflare, banks, etc.) and evaluations from the UK and other third parties showing strong offensive capability and end‑to‑end exploit generation.
  • Others argue this is mostly marketing:
    • Smaller or open‑weight models, with similar harnesses, reportedly reproduced Anthropic’s showcased findings.
    • Some security practitioners report that Mythos is “not obviously better” than other modern AI‑powered tools on their own codebases.

Model capability vs. harness and methodology

  • Repeated theme: results depend heavily on the harness, prompts, and compute budget, not just the base model.
  • Several commenters point out that earlier runs with Opus 4.6 used weaker setups than the Mythos runs, so headline “10x more bugs” claims may conflate model improvements with methodology improvements.
  • Commenters report good results from orchestrators (e.g., a strong cyber model directing many cheap sub‑agents) combined with static analysis and fuzzing, suggesting Mythos‑like performance may be achievable with enough engineering and tokens.
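
The orchestrator pattern described above can be sketched roughly as follows. This is a minimal illustration, not anyone's actual harness: `Candidate`, `cheap_subagent`, `strong_model_review`, and `run_pipeline` are hypothetical names, and both model calls are replaced by trivial stand‑in functions so the fan‑out/fan‑in structure is visible.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path: str
    line: int
    note: str  # hint from static analysis or fuzzing (hypothetical format)


def cheap_subagent(c: Candidate) -> bool:
    """Hypothetical stand-in for a cheap model call: True means 'escalate'."""
    return "unchecked" in c.note


def strong_model_review(c: Candidate) -> str:
    """Hypothetical stand-in for the expensive orchestrating model."""
    return f"{c.path}:{c.line} needs human triage ({c.note})"


def run_pipeline(candidates, max_workers=8):
    # Fan out: cheap sub-agents screen every candidate in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        verdicts = list(pool.map(cheap_subagent, candidates))
    # Fan in: only escalated candidates consume strong-model tokens.
    return [strong_model_review(c) for c, keep in zip(candidates, verdicts) if keep]


if __name__ == "__main__":
    found = run_pipeline([
        Candidate("parser.c", 120, "unchecked length before memcpy"),
        Candidate("util.c", 7, "unused variable"),
    ])
    for report in found:
        print(report)
```

The design point is that the expensive model only sees what the cheap screening pass escalates, which is one way the harness, rather than the base model alone, can drive both cost and results.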

Numbers, validation, and confusion

  • Discussion scrutinizes Anthropic’s figures:
    • 10k+ reported vulnerabilities vs. ~1.7k manually assessed vs. hundreds of published advisories; some find the relationship between these figures opaque.
    • Confusion over “vulnerabilities” vs. CVEs vs. bugs, and over severity re‑ratings by Anthropic.
  • Some fear double‑counting or rediscovery of already‑fixed issues; others note responsible disclosure timelines mean many details are intentionally withheld for now.
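
To make the figures under discussion easier to compare, here is a small arithmetic sketch. It uses only the approximate numbers quoted in the thread (10k+ reported, ~1.7k manually assessed, ~90% true‑positive rate). Applying the 90% rate to the manually assessed subset is an assumption made here for illustration, and the "hundreds" of advisories is left out because no exact count is given. The function name `funnel` is hypothetical.

```python
def funnel(reported: int, assessed: int, true_positive_rate: float):
    """Return (share assessed, estimated confirmed among assessed, not yet assessed)."""
    share_assessed = assessed / reported
    # Assumption: the quoted true-positive rate applies to the assessed subset.
    est_confirmed = round(assessed * true_positive_rate)
    unassessed = reported - assessed
    return share_assessed, est_confirmed, unassessed


if __name__ == "__main__":
    share, confirmed, unassessed = funnel(10_000, 1_700, 0.90)
    print(f"{share:.0%} assessed, ~{confirmed} estimated confirmed, "
          f"{unassessed} not yet assessed")
```

Under these assumptions, roughly one in six reported items has been manually assessed, which is part of why commenters find the headline count hard to interpret.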

Cost, access, and incentives

  • Mythos runs are described as extremely compute‑intensive and expensive per real vulnerability, with human triage and patching now the bottleneck.
  • Glasswing limits access to select “systemically important” partners and (later) governments; this is seen as both:
    • A safety measure (reduce widespread offensive use before patches).
    • A business strategy (IPO positioning and compute rationing) and a way to delay model distillation by competitors.

Security landscape and future of software

  • Consensus: AI‑assisted tools (Mythos, Codex Security, others) already find large numbers of serious issues; both attack and defense will be supercharged.
  • Concern that:
    • Well‑funded orgs will harden fast, while smaller and open‑source projects may be left exposed.
    • Vendors may profit from models that both introduce bugs (via codegen) and sell scanners to fix them.
  • Broader speculation about a future where most code is AI‑written, humans focus on review/architecture, and regulatory pressure may force automated scanning into release pipelines.