Claude Code Found a Linux Vulnerability Hidden for 23 Years
Effectiveness of LLMs for Vulnerability Discovery
- Many commenters see this as a genuine step change: modern coding models can now surface real, non-trivial bugs in large, complex codebases (Linux kernel, browsers, Ghostscript, etc.).
- Several people report replicating the approach on production systems and getting real critical bugs, alongside duplicates, known/accepted risks, and non-exploitable issues.
- A key point: the big advance isn’t just “finding a bug” but chaining steps—locating a suspect pattern, reasoning about reachability, and even producing PoCs or tests.
Comparison to Static Analysis and Fuzzing
- Some argue traditional static analyzers could have found this kernel bug; others note those tools often drown teams in false positives or require deep expertise to run effectively.
- Static analysis/fuzzers tend to output raw crashers or hypothetical issues; LLM agents can layer on explanation, triage, exploitability reasoning, and test generation.
- There’s debate over whether LLM pipelines are a “superset” of fuzzing + static analysis or just another noisy scanner.
False Positives, Triage, and Workflows
- One camp claims AI-generated reports are mostly noise and that sorting them would take months.
- Others counter with recent data from kernel and other projects that AI-found bugs are now “mostly correct,” though volume forced maintainers to add reviewers.
- A common pattern: multi-stage pipelines where:
- First pass finds candidate bugs.
- Second pass (often another LLM) tries to reproduce, validate, and write tests/PoCs.
- Only validated findings reach humans.
Costs, Tokens, and Enterprise Concerns
- Individuals report modest costs (tens to hundreds of dollars) for deep audits, but exhaustive scanning of huge systems with top-tier models could run into six figures.
- Enterprise execs track AI spend closely and worry about scaling costs; they must also navigate consumer vs. commercial terms of service.
- Others argue the only meaningful metric is ROI vs human labor and the cost of missed vulnerabilities.
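The cost range cited above follows from simple back-of-envelope arithmetic. All figures below are hypothetical assumptions chosen only to illustrate how per-token pricing scales from "deep audit of one project" to "exhaustive scan of a kernel-scale codebase":

```python
# Back-of-envelope cost model (all figures hypothetical, for illustration only).
PRICE_PER_MTOK = 15.0      # assumed $/million tokens for a top-tier model
TOKENS_PER_FILE = 50_000   # assumed tokens consumed auditing one source file

def audit_cost(num_files: int) -> float:
    """Estimated API spend for auditing num_files source files."""
    return num_files * TOKENS_PER_FILE * PRICE_PER_MTOK / 1_000_000

# A mid-sized project vs. exhaustive kernel-scale scanning:
print(f"${audit_cost(500):,.0f}")      # $375
print(f"${audit_cost(150_000):,.0f}")  # $112,500
```

Under these assumptions a 500-file audit lands in the hundreds of dollars while 150,000 files reaches six figures, which is why commenters frame the question as ROI against human labor and the cost of a missed vulnerability rather than absolute spend.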
Open vs Closed Source and Security Landscape
- For popular OSS (Linux, etc.), LLMs likely saw much of the code during training, which may boost effectiveness.
- There’s disagreement over how well LLMs will work on decompiled/closed-source binaries, but several anecdotes suggest they can already reason surprisingly well over assembly/hex dumps.
- Some foresee an “avalanche” of 0-days in proprietary software; others stress that attackers and defenders both gain power.
Community Attitudes and Hype
- Thread shows a sharp split:
- Enthusiasts describe recent LLMs as “insanely good,” especially for code review and bug-hunting.
- Skeptics emphasize hallucinations, mediocre AI-generated code, and earlier experiences with spammy AI bug reports.
- Several note a cultural lag: many developers haven’t tried modern tools seriously, while maintainers have moved from banning AI slop to finding AI-assisted reports genuinely useful.