Project Glasswing: what Mythos showed us
Perceived capabilities of Mythos in security work
- Several comments accept that Mythos is a qualitative upgrade for long, “agentic” security tasks, especially chaining small issues into real exploits.
- Others relay claims that the main change may be availability (always-on compute) rather than a radically different base model.
- There is confusion over whether Mythos is a cybersecurity‑specific model or a general‑purpose improvement; statements from different sources conflict and are called “unclear.”
Demand for concrete evidence and metrics
- Multiple commenters criticize the Cloudflare post for lacking hard data: no counts of vulnerabilities found, severities, false positive rates, or time to triage.
- They contrast this with more detailed writeups elsewhere (e.g., from a curl maintainer, Mozilla, and another vendor evaluation).
- People explicitly ask: how many real issues did it find, how severe, and how many were already known?
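The numbers commenters ask for are cheap to compute once each model report has been triaged by a human. The sketch below is illustrative only: the `Report` fields and the `summarize` helper are hypothetical names, not anything from the Cloudflare post, and "false positive rate" is taken here to mean the share of reports that a human did not confirm.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class Report:
    """One model-generated report after human triage (hypothetical schema)."""
    severity: str          # e.g. "low", "medium", "high"
    confirmed: bool        # True if a human confirmed a real issue
    already_known: bool    # True if the issue was known before the scan
    triage_minutes: float  # human time spent reaching a verdict


Summary = Dict[str, Union[int, float, Dict[str, int], Optional[float]]]


def summarize(reports: List[Report]) -> Summary:
    """Aggregate the metrics the commenters asked for."""
    total = len(reports)
    confirmed = [r for r in reports if r.confirmed]
    return {
        "reported": total,
        "confirmed": len(confirmed),
        # Assumption: share of all reports that were NOT confirmed.
        "false_positive_rate": (total - len(confirmed)) / total if total else 0.0,
        "confirmed_by_severity": dict(Counter(r.severity for r in confirmed)),
        "already_known": sum(1 for r in confirmed if r.already_known),
        "median_triage_minutes": (
            median(r.triage_minutes for r in reports) if reports else None
        ),
    }
```

Publishing a table like this alongside a writeup would answer all three questions directly: how many real issues, how severe, and how many already known.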
Harness design and workflow
- The blog’s main technical value is seen in its discussion of custom harnesses: narrow scopes, staged agents, and adversarial review.
- Commenters agree that “scan this repo for bugs” works poorly; targeted prompts tied to specific functions, trust boundaries, and docs work much better.
- Some argue this is obvious and not new; others think the “cluster of actors over structured context” pattern is more broadly useful beyond security.
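The pattern described in these comments (narrow scopes, staged agents, adversarial review) can be sketched in a few lines. This is a minimal illustration, not Cloudflare's harness: `Scope`, `Finding`, `build_prompt`, `run_harness`, and both stub agents are hypothetical names, and the stubs stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Scope:
    """A narrow unit of work: one function, its trust boundary, its docs."""
    function: str
    trust_boundary: str
    docs: str


@dataclass(frozen=True)
class Finding:
    scope: Scope
    claim: str


# Hypothetical agent signatures; a real harness would wrap model calls here.
Finder = Callable[[str], List[str]]
Reviewer = Callable[[Finding], bool]


def build_prompt(scope: Scope) -> str:
    """Build a targeted prompt instead of 'scan this repo for bugs'."""
    return (
        f"Review only the function `{scope.function}`.\n"
        f"Trust boundary: {scope.trust_boundary}\n"
        f"Relevant docs: {scope.docs}\n"
        "Report concrete, checkable bug claims, one per line."
    )


def run_harness(scopes: List[Scope], finder: Finder, reviewer: Reviewer) -> List[Finding]:
    """Stage 1: the finder proposes claims per scope. Stage 2: an adversarial
    reviewer tries to reject each claim. Only surviving claims are returned."""
    confirmed: List[Finding] = []
    for scope in scopes:
        for claim in finder(build_prompt(scope)):
            finding = Finding(scope, claim)
            if reviewer(finding):
                confirmed.append(finding)
    return confirmed


def stub_finder(prompt: str) -> List[str]:
    """Placeholder finder: only reports on scopes that handle untrusted input."""
    if "untrusted" in prompt:
        return ["length field is not validated", "maybe a race condition"]
    return []


def stub_reviewer(finding: Finding) -> bool:
    """Placeholder reviewer: rejects speculative claims."""
    return "maybe" not in finding.claim
```

Keeping the finder and reviewer as separate callables is the point of the design: the reviewer sees only the claim and its scope, so it can be prompted to disprove rather than to agree.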
Skepticism, marketing, and access politics
- Many see the post as a lightly disguised advertisement for Anthropic and question why Cloudflare got deep access while open‑source projects only get mediated access or reports.
- There’s ongoing distrust of closed, unreleased “frontier” models and of narratives about ultra‑powerful systems that can’t be shared.
- Some predict that those calling Mythos a stunt will offer no mea culpa even if it proves effective.
Blog quality and AI authorship
- Several believe the Cloudflare post was heavily LLM‑assisted or LLM‑written, pointing to its tone and phrasing.
- Concerns: AI‑polished text can obscure which claims the authors actually stand behind, and widespread LLM‑style prose may homogenize writing and pollute future training data.
- Others counter that organizations still choose to publish the text and are responsible for its substance.
Guardrails, alignment, and dual-use
- Commenters are surprised that a security‑focused, gated model still inconsistently refuses legitimate research tasks (“emergent guardrails”).
- Some report needing to prove legitimate code access before Mythos will proceed.
- Many think long‑term guardrails against exploit generation are futile if near‑frontier open models become common.
Impact on software security practice
- Commenters expect that Mythos‑class tools could dramatically lower the cost of finding and chaining exploits, especially in large, messy C/C++ codebases and enterprise code.
- At the same time, memory‑unsafe projects appear to generate more false positives, increasing human triage load.
- Auto‑patching by models is seen as risky; comments mention patches that fix one bug while silently breaking dependencies, especially in large multi‑repo systems.