Project Glasswing: what Mythos showed us

Perceived capabilities of Mythos in security work

  • Several comments accept that Mythos is a qualitative upgrade for long, “agentic” security tasks, especially chaining small issues into real exploits.
  • Others point to claims that the main change is availability and always‑on compute rather than a radically different base model.
  • There is confusion over whether Mythos is a cybersecurity‑specific model or a general‑purpose improvement; statements from different sources conflict, and commenters call the situation “unclear.”

Demand for concrete evidence and metrics

  • Multiple commenters criticize the Cloudflare post for lacking hard data: no counts of vulnerabilities found, severities, false positive rates, or time to triage.
  • They contrast this with more detailed writeups elsewhere (e.g., from a curl maintainer, Mozilla, and another vendor evaluation).
  • People explicitly ask: how many real issues did it find, how severe, and how many were already known?
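
The numbers commenters ask for could be summarized roughly as follows. This is a minimal sketch, not anything taken from the Cloudflare post: the Finding record, its field names, and the summarize helper are all hypothetical, and "false positive rate" is taken here to mean the share of reported findings that human triage did not confirm.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Finding:
    """One model-reported issue after human triage (hypothetical schema)."""
    severity: str          # e.g. "low", "medium", "high", "critical"
    confirmed: bool        # True if a human triager confirmed a real issue
    already_known: bool    # True if the issue had been reported before
    triage_minutes: float  # human time spent reaching a decision


def summarize(findings):
    """Aggregate the metrics the commenters asked for."""
    total = len(findings)
    confirmed = [f for f in findings if f.confirmed]
    unconfirmed = total - len(confirmed)
    return {
        "total_reported": total,
        "confirmed": len(confirmed),
        # Assumption: share of reports that triage rejected.
        "false_positive_rate": unconfirmed / total if total else 0.0,
        # Severity breakdown counts confirmed issues only.
        "confirmed_by_severity": dict(Counter(f.severity for f in confirmed)),
        "already_known": sum(1 for f in confirmed if f.already_known),
        "median_triage_minutes": (
            median(f.triage_minutes for f in findings) if findings else 0.0
        ),
    }
```

A writeup that published even this small table per codebase would answer the three questions above: how many real issues, how severe, and how many already known.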

Harness design and workflow

  • The blog’s main technical value is seen in its discussion of custom harnesses: narrow scopes, staged agents, and adversarial review.
  • Commenters agree that “scan this repo for bugs” works poorly; targeted prompts tied to specific functions, trust boundaries, and docs work much better.
  • Some argue this is obvious and not new; others think the “cluster of actors over structured context” pattern is more broadly useful beyond security.
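
The harness pattern described above (narrow scopes, staged agents, adversarial review) might look something like the sketch below. It is an illustration under stated assumptions, not Cloudflare's actual harness: Scope, build_prompt, run_harness, and the "NO FINDING" / "CONFIRMED" reply conventions are all invented for the example, and the finder and reviewer are plain callables standing in for whatever model API is used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface: takes a prompt, returns the model's text reply.
ModelFn = Callable[[str], str]


@dataclass
class Scope:
    """A narrow review target instead of 'scan this repo for bugs'."""
    function_name: str   # the single function under review
    trust_boundary: str  # where untrusted input reaches that function
    docs: str            # documentation excerpt describing intended behavior


def build_prompt(scope: Scope) -> str:
    """Tie the request to one function, one trust boundary, and its docs."""
    return (
        f"Review only the function `{scope.function_name}`.\n"
        f"Untrusted input enters via: {scope.trust_boundary}.\n"
        f"Documented behavior: {scope.docs}\n"
        "Report one concrete vulnerability, or reply 'NO FINDING'."
    )


def run_harness(
    scopes: List[Scope], finder: ModelFn, reviewer: ModelFn
) -> List[Tuple[str, str]]:
    """Stage 1: the finder proposes. Stage 2: the reviewer tries to refute."""
    accepted = []
    for scope in scopes:
        candidate = finder(build_prompt(scope))
        if candidate.strip().upper().startswith("NO FINDING"):
            continue
        verdict = reviewer(
            "Try to refute this finding. Reply 'CONFIRMED' only if you "
            f"cannot refute it.\n\n{candidate}"
        )
        # Only findings that survive adversarial review reach a human.
        if verdict.strip().upper().startswith("CONFIRMED"):
            accepted.append((scope.function_name, candidate))
    return accepted
```

Passing the finder and reviewer in as separate callables keeps the two stages independent, so the reviewer can be a different model or a differently prompted instance of the same one.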

Skepticism, marketing, and access politics

  • Many see the post as a lightly disguised advertisement for Anthropic and question why Cloudflare got deep access while open‑source projects only get mediated access or reports.
  • There’s ongoing distrust of closed, unreleased “frontier” models and of narratives about ultra‑powerful systems that can’t be shared.
  • Some predict that those who called Mythos a stunt will offer no mea culpa even if it proves effective.

Blog quality and AI authorship

  • Several believe the Cloudflare post was heavily LLM‑assisted or entirely LLM‑written, pointing to its tone and phrasing.
  • Concerns: AI‑polished text can obscure which claims the authors actually stand behind, and widespread LLM‑style prose may homogenize writing and pollute future training data.
  • Others counter that organizations still choose to publish the text and are responsible for its substance.

Guardrails, alignment, and dual-use

  • Commenters are surprised that a security‑focused, gated model still inconsistently refuses legitimate research tasks (“emergent guardrails”).
  • Some report having to prove they have legitimate access to the code before Mythos will proceed.
  • Many think long‑term guardrails against exploit generation are futile if near‑frontier open models become common.

Impact on software security practice

  • Expectations: Mythos‑class tools could dramatically lower the cost of finding and chaining exploits, especially in large, messy C/C++ codebases and enterprise code.
  • At the same time, projects written in memory‑unsafe languages appear to generate more false positives, increasing the human triage load.
  • Auto‑patching by models is seen as risky; comments mention patches that fix one bug while silently breaking dependencies, especially in large multi‑repo systems.