Opus 4.6 uncovers 500 zero-day flaws in open-source code
Verification and evidence
- Many commenters ask whether the “500 zero-day” figure has been independently verified or published as CVEs with CVSS scores.
- Anthropic’s public writeup is seen as marketing-heavy and example-light; people want a detailed paper with methodology, false-positive rates, and responses from affected projects.
- Prior Anthropic claims (e.g., “Chinese APT” use of Claude) are cited as reasons to be extra skeptical of their security PR.
Value and limits of LLM-based vulnerability discovery
- Some security practitioners in the thread say they’re not surprised: LLM-assisted vulnerability research has been progressing for years, and the work suits LLMs because it is pattern-heavy and closed-loop (a candidate bug can be checked immediately against the code).
- Others argue it’s unclear if these are genuinely “hard to find” bugs versus low-hanging fruit in very old, complex codebases.
- A technical example from Ghostscript is noted: the model missed the issue under general code analysis but found it by reasoning over the commit history and spotting an incomplete (partial) fix.
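The commit-history approach described above can be sketched roughly as follows. This is a minimal illustration, not Anthropic's actual pipeline: it assumes commit data already extracted from `git log --name-only`, and the keyword lists (`FIX_HINTS`, `PARTIAL_HINTS`) are hypothetical heuristics. The idea is that files patched repeatedly by fix commits, and commits whose messages hint at incomplete repairs, are candidates for a closer look.

```python
from collections import Counter

# Hypothetical keyword heuristics, for illustration only.
FIX_HINTS = ("fix", "cve", "overflow", "sanitize")
PARTIAL_HINTS = ("partial", "follow-up", "followup", "incomplete", "regression")

def flag_revisit_candidates(commits):
    """commits: list of (message, [touched files]) pairs,
    e.g. parsed from `git log --name-only`.

    Returns (files patched by more than one fix commit,
             messages of commits that look like partial fixes)."""
    fix_counts = Counter()
    partial_fix_messages = []
    for message, files in commits:
        msg = message.lower()
        if any(h in msg for h in FIX_HINTS):
            fix_counts.update(files)
            if any(h in msg for h in PARTIAL_HINTS):
                partial_fix_messages.append(message)
    # A file fixed more than once is likelier to carry an unfinished repair.
    hot_files = [f for f, n in fix_counts.items() if n > 1]
    return hot_files, partial_fix_messages

history = [
    ("Fix buffer overflow in pdf_parse", ["pdf.c"]),
    ("Partial fix for CVE-2023-0000", ["pdf.c"]),
    ("Update documentation", ["README"]),
]
hot, partial = flag_revisit_candidates(history)
# hot → ["pdf.c"]; partial → ["Partial fix for CVE-2023-0000"]
```

An LLM doing the same thing in natural language can go further than keyword matching, of course; the point is only that repeated or hedged fixes in the history are a strong signal of where to audit.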
OpenClaw tangent
- There’s disagreement over whether finding ~100 bugs in OpenClaw is “enormous economic value.”
- Several people hadn’t heard of OpenClaw and doubt its “massive adoption”; others argue that even unpopular or frivolous software is worth hardening if widely installed.
Terminology and bug-bounty “slop”
- Multiple commenters complain that “zero-day” is used loosely; here it really just means “previously unknown vulnerability in shipping software.”
- Maintainers’ experience with AI-generated bug reports (e.g., in curl) is raised: lots of low-quality “slop” from automated tools and LLMs swamping real findings.
- Others counter that serious teams using AI analyzers have already submitted valid, high-impact bugs; the problem is amateurs, not the technique.
Trust, authority, and conflicts of interest
- A long subthread debates whether to trust claims from well-known security experts who say these results are plausible.
- Some see this as an argument from authority or note potential conflicts of interest (researchers employed by an LLM vendor); others argue expertise and track record are relevant evidence.
Broader implications and criticisms
- Concerns include: LLMs introducing new vulnerabilities into code they touch, dependence on vendor API uptime/availability if models are wired into continuous security workflows, and the risk that attackers will use the same tools.
- Some lament that Anthropic is touting defensive uses while restricting access for defenders, effectively keeping the offensive advantage in-house.
- A few note this may slow “rewrite in safer language” efforts if LLMs can keep legacy C codebases limping along more safely.