The unexpected effectiveness of one-shot decompilation with Claude

LLMs as Decompilation and RE Tools

  • Many commenters report strong results using Claude and other LLMs (especially with Ghidra/IDA) to:
    • Clean up decompiled C, infer function purposes, and identify assembly tricks.
    • Comment JIT output or highly optimized/minified code, and compare compiler outputs.
  • Gemini is noted as also good at assembly and bytecode-level tasks; Codex is seen as more tuned for mainstream dev work.

Workflows, Heuristics, and Tooling

  • The post’s “headless loop + heuristics + compiler match” approach is praised as a concrete, useful pattern.
  • Key techniques:
    • Work function-by-function when possible; whole-file input is sometimes needed when registers are reused unpredictably.
    • Use a “give up after N attempts” heuristic to cap wasted tokens.
    • Exploit large context windows to analyze wide code regions and trace flows.
  • Some want more structured, step‑by‑step tutorials and grammar‑constrained output that guarantees valid C; others say a simple “compile + feed errors back” loop is enough.
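
As a rough illustration of the loop pattern discussed in this section, the sketch below combines the “compile + feed errors back” step with the “give up after N attempts” cap. It is not the post's actual implementation: the names decompile_function, propose, check, and Attempt are hypothetical, and the model call and the compile-and-compare step are passed in as plain callables so no particular LLM API or compiler is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    ok: bool        # True when the candidate compiled and matched the target
    feedback: str   # compiler errors or a mismatch summary to send back


def decompile_function(
    asm: str,
    propose: Callable[[str, str], str],   # hypothetical: (asm, feedback) -> C source
    check: Callable[[str], Attempt],      # hypothetical: compile and compare
    max_attempts: int = 5,
) -> Optional[str]:
    """Retry until a candidate matches, or give up after max_attempts."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = propose(asm, feedback)
        result = check(candidate)
        if result.ok:
            return candidate
        # Feed the errors or mismatch back into the next proposal.
        feedback = result.feedback
    # Cap wasted tokens: report failure instead of looping forever.
    return None
```

In a real setup, propose would wrap a model call and check would invoke the reconstructed toolchain; returning None lets the caller queue the function for a later pass or for manual work.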

Limits, Complexity, and Non‑Expert Use

  • Commenters warn that one‑shot reverse engineering for non‑experts is still weak; you must give the model tight constraints, goals, and validation.
  • LLMs often misestimate task difficulty and duration, both overshooting and undershooting.
  • There’s debate over what “one‑shot” means (single prompt vs single example vs non-interactive loop).

Documentation and Developer Workflow

  • Many see LLMs as excellent for generating “how it works” docs, translating and synthesizing sparse or foreign‑language documentation.
  • Commenters are skeptical of auto‑invented rationales (“why it’s this way”) and want human review of them.
  • Some argue LLMs reduce the need for human docs; others frame docs as an “error-correcting code” to detect mismatches between intent and implementation.

Legal, Licensing, and Privacy Concerns

  • A long subthread distinguishes “open source” from “source available” and discusses how decompilations are derivative works that carry their own, but constrained, licensing.
  • Clean‑room reverse engineering is contrasted with distributing decompiled code.
  • Several raise concerns about uploading copyrighted binaries to cloud LLMs: potential evidence trails, DMCA/fair‑use ambiguity, and jurisdictional risks.

Decompilation, Obfuscation, and the Future of Software

  • Some speculate that near‑trivial decompilation could make most binaries effectively “source available,” provoking shifts to cloud‑only or hardware‑locked distribution.
  • Others expect counter‑moves: LLM‑assisted obfuscation or exotic schemes (e.g., homomorphic VMs) to make analysis harder.
  • There’s disagreement on timelines: some think “everything decompilable” is far off; others see it as inevitable and beneficial for preservation.

Game Preservation and Retro Computing

  • Multiple examples of LLM‑assisted ports and analysis: classic BIOSes, Prince of Persia on Apple II, and older PC/console games.
  • Matching original binaries requires reconstructing old toolchains and flags; flakiness and inter‑function dependencies often prevent 100% exact matches, but “99%+ matching, 100% functional” is common.
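
To make a figure like “99%+ matching” concrete, here is a minimal sketch of one way a match percentage could be computed. The function name match_percent is hypothetical, and the position-by-position byte comparison is a deliberate simplification: a single inserted byte shifts everything after it, so an instruction-aware diff would be more robust in practice.

```python
def match_percent(original: bytes, rebuilt: bytes) -> float:
    """Percentage of byte positions that agree between two binaries.

    Positions present in only one input count as mismatches, so a
    rebuilt binary of the wrong length can never score 100%.
    """
    longest = max(len(original), len(rebuilt))
    if longest == 0:
        return 100.0  # two empty inputs are trivially identical
    same = sum(a == b for a, b in zip(original, rebuilt))
    return 100.0 * same / longest


# Three of four bytes agree.
print(match_percent(bytes([1, 2, 3, 4]), bytes([1, 2, 3, 5])))  # 75.0
```

A score of exactly 100.0 here corresponds to a byte-identical rebuild; anything lower may still be fully functional, which is the distinction the thread draws.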