Claude Code Found a Linux Vulnerability Hidden for 23 Years

Effectiveness of LLMs for Vulnerability Discovery

  • Many commenters see this as a genuine step change: modern coding models can now surface real, non-trivial bugs in large, complex codebases (the Linux kernel, browsers, Ghostscript, etc.).
  • Several people report replicating the approach on production systems and getting real critical bugs, alongside duplicates, known/accepted risks, and non-exploitable issues.
  • A key point: the big advance isn’t just “finding a bug” but chaining steps—locating a suspect pattern, reasoning about reachability, and even producing proof-of-concept (PoC) exploits or tests.

Comparison to Static Analysis and Fuzzing

  • Some argue traditional static analyzers could have found this kernel bug; others note those tools often drown teams in false positives or require deep expertise to run effectively.
  • Static analyzers and fuzzers tend to output raw crashers or hypothetical issues; LLM agents can layer on explanation, triage, exploitability reasoning, and test generation.
  • There’s debate over whether LLM pipelines are a “superset” of fuzzing + static analysis or just another noisy scanner.

False Positives, Triage, and Workflows

  • One camp claims AI-generated reports are mostly noise and that triaging them would take months.
  • Others counter with recent data from kernel and other projects that AI-found bugs are now “mostly correct,” though volume forced maintainers to add reviewers.
  • A common pattern: multi-stage pipelines where:
    • First pass finds candidate bugs.
    • Second pass (often another LLM) tries to reproduce, validate, and write tests/PoCs.
    • Only validated findings reach humans.
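The staged workflow above can be sketched as a small Python pipeline. Everything here is a hypothetical stand-in: the pattern check and the “reproduction” step are stubs where a real pipeline would prompt a model and execute a generated PoC in a sandbox.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    location: str
    claim: str
    reproduced: bool = False
    poc: Optional[str] = None

def stage1_scan(files: dict[str, str]) -> list[Candidate]:
    # Stage 1 (stub): flag suspect patterns. A real first pass would
    # prompt an LLM over each file instead of this keyword heuristic.
    return [
        Candidate(location=name, claim="possible unchecked length")
        for name, text in files.items()
        if "memcpy" in text and "len <" not in text
    ]

def stage2_validate(cand: Candidate, files: dict[str, str]) -> Candidate:
    # Stage 2 (stub): a second pass tries to reproduce the finding and
    # attach a PoC/test; here we just pretend a crash was triggered.
    if "memcpy(dst, src, n)" in files[cand.location]:
        cand.reproduced = True
        cand.poc = f"crashing input for {cand.location}"
    return cand

def triage(files: dict[str, str]) -> list[Candidate]:
    validated = [stage2_validate(c, files) for c in stage1_scan(files)]
    # Final gate: only validated findings reach a human reviewer.
    return [c for c in validated if c.reproduced]

files = {
    "net.c":  "memcpy(dst, src, n); /* no bounds check */",
    "fs.c":   "if (len < MAX) memcpy(dst, src, len);",
    "safe.c": "strncpy(dst, src, sizeof dst);",
}
reports = triage(files)  # only net.c survives both passes
```

The point of the structure is the filter between stages: unvalidated candidates never consume human attention.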

Costs, Tokens, and Enterprise Concerns

  • Individuals report modest costs (tens to hundreds of dollars) for deep audits, but exhaustive scanning of huge systems with top-tier models could run into six figures.
  • Enterprise execs track AI spend closely and worry about scaling costs; they must also navigate the differences between consumer and commercial terms of service.
  • Others argue the only meaningful metric is ROI vs human labor and the cost of missed vulnerabilities.
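The gap between “tens of dollars” and “six figures” falls out of simple token arithmetic. The rates, tokens-per-line ratio, and pass counts below are illustrative assumptions, not any vendor’s actual pricing:

```python
def audit_cost(loc: int, tokens_per_line: float, passes: int,
               usd_per_million_tokens: float) -> float:
    """Estimate input-token cost of scanning `loc` lines `passes` times.

    All parameters are assumptions chosen for illustration only.
    """
    total_tokens = loc * tokens_per_line * passes
    return total_tokens / 1_000_000 * usd_per_million_tokens

# A focused audit: ~100k-line subsystem, 3 passes, $15/M tokens.
small = audit_cost(100_000, 10, 3, 15.0)       # -> 45.0 USD
# An exhaustive sweep: ~30M-line codebase, 10 passes, a pricier model.
large = audit_cost(30_000_000, 10, 10, 50.0)   # -> 150000.0 USD
```

The model scales linearly in every parameter, which is why multi-pass scans of kernel-sized codebases dominate the spend.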

Open vs Closed Source and Security Landscape

  • For popular OSS (Linux, etc.), LLMs likely saw much of the code during training, which may boost effectiveness.
  • There’s disagreement over how well LLMs will work on decompiled/closed-source binaries, but several anecdotes suggest they can already reason surprisingly well over assembly/hex dumps.
  • Some foresee an “avalanche” of 0-days in proprietary software; others stress that attackers and defenders both gain power.

Community Attitudes and Hype

  • The thread shows a sharp split:
    • Enthusiasts describe LLMs as “insanely good” recently, especially for code review and bug-hunting.
    • Skeptics emphasize hallucinations, mediocre AI-generated code, and earlier experiences with spammy AI bug reports.
  • Several note a cultural lag: many developers haven’t tried modern tools seriously, while maintainers have moved from banning “AI slop” to finding AI-assisted reports genuinely useful.