Project Glasswing: An Initial Update
Overall reaction to Mythos / Glasswing
- Many see Mythos as a genuine “step change” in AI‑assisted vulnerability discovery, citing:
  - High reported true‑positive rates (~90%) versus traditional tools.
  - Partner anecdotes (Firefox, Cloudflare, banks, etc.) and UK/third‑party evaluations showing strong offensive capability and end‑to‑end exploit generation.
- Others argue this is mostly marketing:
  - Smaller or open‑weight models, with similar harnesses, reportedly reproduced Anthropic’s showcased findings.
  - Some security practitioners report Mythos as “not obviously better” than other modern AI‑powered tools in their own codebases.
Model capability vs. harness and methodology
- Repeated theme: results depend heavily on the harness, prompts, and compute budget, not just the base model.
- Several commenters point out that the earlier runs with Opus 4.6 used weaker setups than the Mythos runs, so the headline “10x more bugs” claim may conflate model and methodology.
- People report good results with orchestrators (e.g., a strong cyber model directing many cheap sub‑agents) plus static analysis/fuzzing, suggesting Mythos‑like performance may be achievable with enough engineering and tokens.
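The orchestrator pattern described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not any commenter's actual harness: `Finding`, `orchestrate`, `toy_prioritize`, and `toy_sub_agent` are hypothetical names, and the "models" are plain Python callables standing in for a strong planner and cheap sub‑agents. No real model or scanner API is called.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    note: str


def orchestrate(files, prioritize, sub_agent, static_hits, budget, max_workers=4):
    """Fan work out to cheap sub-agents, then rank what they return.

    files:       dict mapping path -> source text
    prioritize:  stand-in for the strong planner; returns paths in priority order
    sub_agent:   stand-in for a cheap worker; (path, source) -> list of Finding
    static_hits: set of (path, line) pairs already flagged by static analysis
    budget:      maximum number of files to hand to sub-agents
    """
    # Step 1: the planner decides which files are worth spending tokens on.
    targets = prioritize(files)[:budget]

    # Step 2: run the cheap workers in parallel, one file each.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(lambda p: sub_agent(p, files[p]), targets))
    findings = [f for batch in batches for f in batch]

    # Step 3: findings corroborated by static analysis sort first
    # (False < True), then by path and line for a stable order.
    return sorted(
        findings,
        key=lambda f: ((f.path, f.line) not in static_hits, f.path, f.line),
    )


# Toy stand-ins so the sketch runs without any model access.
def toy_prioritize(files):
    return sorted(files, key=lambda p: len(files[p]), reverse=True)


def toy_sub_agent(path, source):
    return [
        Finding(path, i, "possible unbounded copy")
        for i, text in enumerate(source.splitlines(), 1)
        if "strcpy(" in text
    ]


demo_files = {"a.c": "int main(){\nstrcpy(a,b);\n}", "b.c": "strcpy(x,y);"}
demo_result = orchestrate(
    demo_files, toy_prioritize, toy_sub_agent, static_hits={("b.c", 1)}, budget=2
)
```

The design choice worth noting is that `budget` and the worker function are parameters rather than constants: changing either changes how many findings come back, which is the commenters' point about results depending on harness and compute as well as on the base model.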
Numbers, validation, and confusion
- Discussion scrutinizes Anthropic’s figures:
  - 10k+ vulnerabilities vs. ~1.7k manually assessed vs. hundreds of published advisories; some find the math opaque.
  - Confusion over “vulnerabilities” vs. CVEs vs. bugs, and over severity re‑ratings by Anthropic.
- Some fear double‑counting or rediscovery of already‑fixed issues; others note responsible disclosure timelines mean many details are intentionally withheld for now.
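To make the double‑counting concern concrete, here is a small sketch of how raw reports could be separated into duplicates, rediscoveries of already‑fixed issues, and distinct new findings. The fingerprint fields and the names `Report`, `fingerprint`, and `count_distinct` are assumptions chosen for illustration; nothing in the thread says how Anthropic actually deduplicates its counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    project: str
    file: str
    function: str
    bug_class: str


def fingerprint(report):
    # Assumed identity of a bug: same project, file, function, and bug class.
    # Project names are lowercased so "LibX" and "libx" collapse together.
    return (report.project.lower(), report.file, report.function, report.bug_class)


def count_distinct(reports, already_fixed):
    """Split raw reports into duplicate / already-fixed / distinct-new counts.

    already_fixed is a set of fingerprints known to be patched upstream.
    """
    seen = set()
    counts = {"raw": 0, "duplicate": 0, "already_fixed": 0, "distinct_new": 0}
    for report in reports:
        counts["raw"] += 1
        key = fingerprint(report)
        if key in seen:
            counts["duplicate"] += 1
            continue
        seen.add(key)
        if key in already_fixed:
            counts["already_fixed"] += 1
        else:
            counts["distinct_new"] += 1
    return counts


# Illustrative inputs only; these are not real findings.
demo_reports = [
    Report("libx", "a.c", "parse", "overflow"),
    Report("LibX", "a.c", "parse", "overflow"),      # same bug, reported twice
    Report("libx", "b.c", "read", "use-after-free"),  # already patched upstream
    Report("liby", "c.c", "init", "overflow"),
]
demo_fixed = {("libx", "b.c", "read", "use-after-free")}
demo_counts = count_distinct(demo_reports, demo_fixed)
```

A breakdown like this shows why a headline total and a count of published advisories can differ widely without either number being wrong: they sit at different stages of the same funnel.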
Cost, access, and incentives
- Mythos runs are described as extremely compute‑intensive and expensive per real vulnerability, with human triage and patching now the bottleneck.
- Glasswing limits access to select “systemically important” partners and (later) governments; this is seen as both:
  - A safety measure (reduce widespread offensive use before patches).
  - A business/IPO and compute‑rationing strategy, and a way to delay model distillation by competitors.
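The “expensive per real vulnerability” point earlier in this section comes down to simple arithmetic, sketched below. The function name `cost_per_confirmed` and all of its parameters are hypothetical, and the thread gives no actual cost figures, so any numbers passed in are placeholders.

```python
def cost_per_confirmed(compute_cost, candidates, true_positive_rate,
                       triage_hours_per_candidate, hourly_rate):
    """Total spend divided by the number of confirmed vulnerabilities.

    Every candidate costs triage time, but only the true positives
    count in the denominator.
    """
    if candidates <= 0:
        raise ValueError("candidates must be positive")
    if not 0 < true_positive_rate <= 1:
        raise ValueError("true_positive_rate must be in (0, 1]")
    confirmed = candidates * true_positive_rate
    triage_cost = candidates * triage_hours_per_candidate * hourly_rate
    return (compute_cost + triage_cost) / confirmed
```

With the inputs held fixed, a lower true‑positive rate raises the cost per confirmed finding, because the same triage effort is spread over fewer real bugs. That is one reason the reported ~90% rate matters to the cost discussion, and why human triage shows up as the bottleneck once compute is paid for.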
Security landscape and future of software
- Consensus: AI‑assisted tools (Mythos, Codex Security, others) already find large numbers of serious issues; both attacks and defenses will be supercharged.
- Concern that:
  - Well‑funded orgs will harden fast, while smaller and open‑source projects may be left exposed.
  - Vendors may profit from models that both introduce bugs (via codegen) and sell scanners to fix them.
- Broader speculation about a future where most code is AI‑written, humans focus on review/architecture, and regulatory pressure may force automated scanning into release pipelines.