The coming industrialisation of exploit generation with LLMs

Coexistence of Great Exploits and Garbage Bug Reports

  • Many argue both phenomena are real: LLMs can produce high‑quality exploits and nonsense reports.
  • Key distinction:
    • Naive use = “paste code into ChatGPT, ask for vulns” → hallucinated bugs, fake PoCs.
    • Structured “agent harness” use with execution + verification → working exploits.
  • Exploit quality is high when there’s:
    • A well-defined task and environment.
    • An automatic verifier (e.g., “did we spawn a shell / write this file?”).
  • Maintainers’ pain comes from people submitting unverified LLM findings, whereas researchers run thousands of attempts and only surface verified successes.

“Industrialisation” and Human Role

  • Some see a contradiction between “LLMs industrialise exploit generation with no human in the loop” and the clear need for experts to:
    • Design targets, environments, and verifiers.
    • Interpret results and build harnesses.
  • Defenders say the article overstates autonomy; expertise is embedded in harness design, even if not in each exploit attempt.
  • Others stress that once set up, you can scale to many agents in parallel, with humans only at setup and review stages.

Offense vs Defense Symmetry

  • One camp: tools are symmetric. Defenders can run “LLM red teams” in CI, like advanced fuzzing, and large orgs already do this.
  • Opposing view: asymmetry is fundamental:
    • Attackers need any exploitable bug; defenders must find and fix all relevant ones.
    • LLMs scale both sides (1→100 hackers), so relative advantage doesn’t improve for defenders.
    • Defenders also face business constraints (uptime, change control).

Technical Takeaways from the QuickJS Experiment

  • GPT‑5.2 reportedly chained multiple glibc exit handlers to write a file despite ASLR, NX, RELRO, CFI, shadow stack, and seccomp restrictions.
  • Some see this as evidence that hardened C binaries are still very exploitable by LLMs once a memory bug exists.
  • Others note the sandbox goal was limited (file write, not sandbox escape), and the mitigations were bypassed using known techniques, not novel breaks.
  • Debate shifts to language and deployment choices (C vs Rust/Go, reducing libc surface, static binaries, unikernels, formal verification).

Broader Security and Process Implications

  • Expectation that LLMs will:
    • Greatly lower the bar for “script kiddies on steroids.”
    • Also pressure vendors to properly implement mitigations and adopt more formal/spec-based verification.
  • Several commenters recommend:
    • Treating random downloads and extensions as increasingly dangerous.
    • Using LLMs defensively to analyze suspicious repos and code, while remaining wary of their own failure modes (prompt injection, bad fix suggestions).