2026-05-18

Project Glasswing: what Mythos showed us

Perceived capabilities of Mythos in security work

Several comments accept that Mythos is a qualitative upgrade for long, “agentic” security tasks, especially chaining small issues into real exploits.
Others point to claims that the main change may be availability and always-on compute rather than a radically different base model.
There is confusion over whether Mythos is a cybersecurity-specific model or a general-purpose upgrade; commenters describe the statements from different sources as conflicting and “unclear.”

Demand for concrete evidence and metrics

Multiple commenters criticize the Cloudflare post for lacking hard data: no counts of vulnerabilities found, severities, false positive rates, or time to triage.
They contrast this with more detailed writeups elsewhere (e.g., from a curl maintainer, from Mozilla, and in another vendor's evaluation).
People explicitly ask: how many real issues did it find, how severe, and how many were already known?
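The numbers commenters ask for are cheap to compute once findings are recorded in a structured form. Below is a minimal sketch of such a summary. The `Finding` record, its field names, and the `summarize` helper are hypothetical illustrations, not anything described in the Cloudflare post, and "false positive rate" is taken here to mean the share of reports that human triage rejected.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Finding:
    """One model-reported issue after human triage (hypothetical schema)."""
    severity: str          # e.g. "low", "medium", "high", "critical"
    confirmed: bool        # True if a human triager confirmed a real issue
    already_known: bool    # True if the issue had been reported before
    triage_minutes: float  # human time spent reaching a verdict


def summarize(findings: List[Finding]) -> dict:
    """Answer: how many real issues, how severe, how many already known."""
    total = len(findings)
    confirmed = [f for f in findings if f.confirmed]
    return {
        "reported": total,
        "confirmed": len(confirmed),
        # Share of reports that triage rejected (assumed definition).
        "false_positive_rate": (total - len(confirmed)) / total if total else 0.0,
        "confirmed_by_severity": dict(Counter(f.severity for f in confirmed)),
        "confirmed_already_known": sum(f.already_known for f in confirmed),
        "median_triage_minutes": (
            median(f.triage_minutes for f in findings) if findings else None
        ),
    }
```

Publishing even this small table alongside a writeup would address most of the complaints in this part of the thread.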

Harness design and workflow

The blog’s main technical value is seen in its discussion of custom harnesses: narrow scopes, staged agents, and adversarial review.
Commenters agree that “scan this repo for bugs” works poorly; targeted prompts tied to specific functions, trust boundaries, and docs work much better.
Some argue this is obvious and not new; others think the “cluster of actors over structured context” pattern is more broadly useful beyond security.
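The pattern commenters describe (narrow scopes, staged agents, adversarial review) can be sketched in a few lines. This is an illustrative outline only: `Scope`, `build_prompt`, `run_harness`, and the `Finder` and `Reviewer` callables are hypothetical names standing in for whatever model calls a real harness would make, and nothing here reflects Cloudflare's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scope:
    """A narrow review target instead of 'scan this repo for bugs'."""
    function: str        # the specific function under review
    trust_boundary: str  # where untrusted data enters
    docs: str            # documentation the reviewer should rely on


@dataclass
class Candidate:
    scope: Scope
    claim: str


# Hypothetical stand-ins for model calls.
Finder = Callable[[str], List[str]]    # prompt -> claimed issues
Reviewer = Callable[[str, str], bool]  # (prompt, claim) -> claim survives?


def build_prompt(scope: Scope) -> str:
    """Tie the prompt to one function, one trust boundary, and its docs."""
    return (
        f"Review only the function `{scope.function}`.\n"
        f"Trust boundary: {scope.trust_boundary}\n"
        f"Relevant documentation: {scope.docs}\n"
        "Report concrete issues with the input that triggers each one."
    )


def run_harness(scopes: List[Scope], finder: Finder, reviewer: Reviewer) -> List[Candidate]:
    """Stage 1: find per scope. Stage 2: an adversarial reviewer filters claims."""
    surviving = []
    for scope in scopes:
        prompt = build_prompt(scope)
        for claim in finder(prompt):
            # Only claims the reviewer fails to refute reach human triage.
            if reviewer(prompt, claim):
                surviving.append(Candidate(scope, claim))
    return surviving
```

Keeping the finder and reviewer as separate callables means the reviewer can be a differently prompted agent whose only job is to knock claims down, which is the adversarial step the thread highlights.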

Skepticism, marketing, and access politics

Many see the post as a lightly disguised advertisement for Anthropic and question why Cloudflare got deep access while open‑source projects only get mediated access or reports.
There’s ongoing distrust of closed, unreleased “frontier” models and of narratives about ultra‑powerful systems that can’t be shared.
Some predict that those calling Mythos a stunt will offer no mea culpa even if it proves effective.

Blog quality and AI authorship

Several believe the Cloudflare post was heavily LLM-assisted or entirely LLM-written, pointing to its tone and phrasing.
The concerns raised are that AI-polished text can obscure which claims the authors truly own, and that widespread LLM-style prose may homogenize writing and pollute future training data.
Others counter that organizations still choose to publish the text and are responsible for its substance.

Guardrails, alignment, and dual-use

Commenters are surprised that a security‑focused, gated model still inconsistently refuses legitimate research tasks (“emergent guardrails”).
Some report needing to prove legitimate code access before Mythos will proceed.
Many think long‑term guardrails against exploit generation are futile if near‑frontier open models become common.

Impact on software security practice

Commenters expect that Mythos-class tools could dramatically lower the cost of finding and chaining exploits, especially in large, messy C/C++ codebases and enterprise code.
At the same time, memory‑unsafe projects appear to generate more false positives, increasing human triage load.
Auto‑patching by models is seen as risky; comments mention patches that fix one bug while silently breaking dependencies, especially in large multi‑repo systems.
