2025-11-28

The unexpected effectiveness of one-shot decompilation with Claude

LLMs as Decompilation and RE Tools

Many commenters report strong results using Claude and other LLMs (especially with Ghidra/IDA) to:
- Clean up decompiled C, infer function purposes, and identify assembly tricks.
- Comment JIT output or highly optimized/minified code, and compare compiler outputs.
Gemini is noted as also good at assembly and bytecode-level tasks; Codex is seen as more tuned for mainstream dev work.

Workflows, Heuristics, and Tooling

The post’s “headless loop + heuristics + compiler match” approach is praised as a concrete, useful pattern.
Key techniques:
- Work function-by-function when possible; whole-file input is sometimes needed when registers are reused unpredictably.
- Use a “give up after N attempts” heuristic to cap wasted tokens.
- Exploit large context windows to analyze wide code regions and trace flows.
Some want more structured, step‑by‑step tutorials and tighter grammars for valid C, but others say simple “compile + feed errors back” loops are enough.
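
A minimal sketch of the “compile + feed errors back” loop with a “give up after N attempts” cap, as described above. This is not the post's actual implementation: `match_function`, `propose`, and `compile_c` are hypothetical names, and the model call and compiler invocation are passed in as plain callables so the loop itself stays independent of any particular LLM API or toolchain.

```python
from typing import Callable, Optional, Tuple


def match_function(
    target: bytes,
    propose: Callable[[str], str],
    compile_c: Callable[[str], Tuple[Optional[bytes], str]],
    max_attempts: int = 5,
) -> Optional[str]:
    """Return C source whose compiled bytes equal `target`, or None after giving up.

    propose(feedback)  -> candidate C source (hypothetical wrapper around an LLM call)
    compile_c(source)  -> (compiled bytes or None on failure, compiler diagnostics)
    """
    feedback = ""  # the first attempt has no feedback yet
    for attempt in range(1, max_attempts + 1):
        source = propose(feedback)
        compiled, errors = compile_c(source)
        if compiled is None:
            # Compilation failed: hand the diagnostics back to the model.
            feedback = f"attempt {attempt}: compile failed:\n{errors}"
        elif compiled == target:
            # Byte-for-byte match with the original function.
            return source
        else:
            # Compiled, but the output does not match the target yet.
            feedback = (
                f"attempt {attempt}: compiled, but output differs from target "
                f"({len(compiled)} vs {len(target)} bytes)"
            )
    # Cap reached: stop spending tokens on this function.
    return None
```

In practice `compile_c` would wrap the project's original compiler and flags, and the feedback string would likely carry a disassembly diff rather than just byte counts; both are left out here to keep the sketch self-contained.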

Limits, Complexity, and Non‑Expert Use

Commenters warn that one‑shot reverse engineering for non‑experts is still weak; you must give the model tight constraints, goals, and validation.
LLMs often misestimate task difficulty and duration, both over‑ and under‑shooting.
There’s debate over what “one‑shot” means (single prompt vs single example vs non-interactive loop).

Documentation and Developer Workflow

Many see LLMs as excellent for generating “how it works” docs and for translating and synthesizing sparse or foreign‑language documentation.
There is skepticism about auto‑invented rationales (“why it’s this way”); human review is desired.
Some argue LLMs reduce the need for human docs; others frame docs as an “error-correcting code” to detect mismatches between intent and implementation.

Legal, Licensing, and Privacy Concerns

A substantial thread covers the distinction between “open source” and “source available,” and how decompilations are derivative works with their own, but constrained, licensing.
Clean‑room reverse engineering is contrasted with distributing decompiled code.
Several raise concerns about uploading copyrighted binaries to cloud LLMs: potential evidence trails, DMCA/fair‑use ambiguity, and jurisdictional risks.

Decompilation, Obfuscation, and the Future of Software

Some speculate that near‑trivial decompilation could make most binaries effectively “source available,” provoking shifts to cloud‑only or hardware‑locked distribution.
Others expect counter‑moves: LLM‑assisted obfuscation or exotic schemes (e.g., homomorphic VMs) to make analysis harder.
There’s disagreement on timelines: some think “everything decompilable” is far off; others see it as inevitable and beneficial for preservation.

Game Preservation and Retro Computing

Multiple examples of LLM‑assisted ports and analysis: classic BIOSes, Prince of Persia on Apple II, and older PC/console games.
Matching original binaries requires reconstructing old toolchains and flags; flakiness and inter‑function dependencies often prevent 100% exact matches, but “99%+ matching, 100% functional” is common.
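
A rough sketch of how a figure like “99%+ matching” could be computed for a single function, assuming a naive position-by-position byte comparison. `match_percent` is a hypothetical helper, not the metric any particular decompilation project uses; real matching tools typically compare at the instruction level and account for relocations, which this does not.

```python
def match_percent(original: bytes, rebuilt: bytes) -> float:
    """Percentage of byte positions that agree between two compiled blobs.

    Positions that exist in only one blob (a length difference) count as
    mismatches. Two empty blobs are treated as a full match.
    """
    longest = max(len(original), len(rebuilt))
    if longest == 0:
        return 100.0
    # zip stops at the shorter blob; the extra tail is implicitly mismatched
    # because we divide by the longer length.
    same = sum(a == b for a, b in zip(original, rebuilt))
    return 100.0 * same / longest
```

One limitation of this positional comparison is that a single inserted or missing byte shifts everything after it and drags the score down sharply, which is one reason an instruction-level diff is usually more informative.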
