OpenAI is good at unminifying code
Capabilities and Use Cases
- Many commenters report that LLMs are strong at “text transformations”: unminifying JS, renaming identifiers, reformatting, refactoring, and translating code between languages and frameworks.
- People report successfully using LLMs to:
  - Reverse-engineer minified JS and Shopify scripts.
  - Clean up and comment messy code, or explain legacy logic, including why particular decisions were made.
  - Convert code across ecosystems (e.g., Python↔JS, AWS SDKs, CloudFormation↔Terraform↔CDK).
  - Extract structured data (CSV/JSON) from text and parse database schemas.
- Some use models alongside decompilers (e.g., Ghidra, Binary Ninja) to help reverse-engineer binaries or assembly, with mixed but promising results.
Minification vs. Decompilation / Obfuscation
- Multiple commenters stress that unminifying JS (same language, mostly renaming and reformatting) is far easier than decompiling binaries or undoing true obfuscation.
- LLMs still struggle with heavily obfuscated or “state-of-the-art” JS and complex compiled binaries.
- There is debate over how hard the inverse problem really is: some see reversing minification as relatively easy, while others note that the information minification discards (meaningful names, comments) is nontrivial to reconstruct.
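To make the distinction concrete, here is a minimal sketch that uses Python's standard `ast` module as a stand-in for a JavaScript parser (the language swap is an assumption of this example, not something from the discussion). Round-tripping minified code through an AST restores the layout deterministically, but the one-letter names survive unchanged. Those names are the part that has to be guessed. `MINIFIED` and `reformat` are hypothetical names chosen for illustration.

```python
import ast

# Hand-"minified" function: one-letter names, only the whitespace Python requires.
MINIFIED = "def f(a,b):\n r=0\n for c in a:\n  if c>b:r+=c\n return r"


def reformat(source: str) -> str:
    """Deterministically re-layout code by parsing it and printing the AST back out."""
    return ast.unparse(ast.parse(source))  # ast.unparse needs Python 3.9+


pretty = reformat(MINIFIED)
print(pretty)
```

The output is properly indented and spaced, yet it still reads `def f(a, b):`. A beautifier can recover formatting but not intent, which is why the renaming step is where a model (or a human) adds value.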
Tooling and Techniques
- Commenters mention several tools that combine AST-based processing with LLMs:
  - Workflows in which a traditional parser guarantees the program's semantics are preserved while the LLM only suggests better names or comments.
  - Some tools offer a local-model mode, which is slower and less accurate, and an API-based mode, which is faster but incurs per-token costs.
- Suggested patterns:
  - Have the LLM propose new variable names scope by scope, then apply the renames deterministically with AST tooling.
  - Validate LLM transformations with unit tests, mutation testing, or AST equivalence checks.
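The rename-then-validate pattern can be sketched as follows. This is a minimal illustration under several assumptions: Python's `ast` module stands in for a JS parser, and `SUGGESTIONS` is a hypothetical, hard-coded substitute for model output. `ScopedRenamer`, `local_names`, and `canonical` are illustrative helpers, not APIs of any tool from the discussion. The parser decides which identifiers are local to each function and applies only those renames. Two checks then follow: a structural comparison modulo consistent renaming (a simple AST equivalence check) and a behavioral comparison on sample inputs (a stand-in for real unit tests).

```python
import ast
import copy
import keyword

MINIFIED = "def f(a,b):\n r=0\n for c in a:\n  if c>b:r+=c\n return r"

# Hypothetical model output: suggested names per function (hard-coded here).
SUGGESTIONS = {"f": {"a": "values", "b": "threshold", "r": "total", "c": "value"}}


def local_names(func: ast.FunctionDef) -> set[str]:
    """Positional parameters plus names assigned in the function.
    Sketch only: ignores nested scopes, global/nonlocal, and other arg kinds."""
    names = {a.arg for a in func.args.args}
    for node in ast.walk(func):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names


class ScopedRenamer(ast.NodeTransformer):
    """Apply suggested names, but only to identifiers the parser says are local."""

    def __init__(self, suggestions: dict[str, dict[str, str]]):
        self.suggestions = suggestions
        self.mapping: dict[str, str] = {}

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        bound = local_names(node)
        proposed = self.suggestions.get(node.name, {})
        mapping = {old: new for old, new in proposed.items() if old in bound}
        used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)} | bound
        new_names = set(mapping.values())
        # Reject non-identifiers, duplicate targets, and clashes with kept names.
        if (any(not n.isidentifier() or keyword.iskeyword(n) for n in new_names)
                or len(new_names) != len(mapping)
                or new_names & (used - set(mapping))):
            raise ValueError(f"unsafe rename in {node.name}: {mapping}")
        outer, self.mapping = self.mapping, mapping
        for arg in node.args.args:
            arg.arg = mapping.get(arg.arg, arg.arg)
        self.generic_visit(node)
        self.mapping = outer  # restore the enclosing function's mapping
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self.mapping.get(node.id, node.id)
        return node


def canonical(tree: ast.AST) -> str:
    """Dump the AST with Name/arg identifiers replaced by v0, v1, ... in order of
    first appearance; two trees compare equal when they differ only by a
    consistent one-to-one renaming of those identifiers."""
    tree = copy.deepcopy(tree)
    seen: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = seen.setdefault(node.id, f"v{len(seen)}")
        elif isinstance(node, ast.arg):
            node.arg = seen.setdefault(node.arg, f"v{len(seen)}")
    return ast.dump(tree)


def load(tree: ast.Module):
    """Compile a module AST and return its function f."""
    namespace: dict = {}
    exec(compile(tree, "<llm-rename>", "exec"), namespace)
    return namespace["f"]


original = ast.parse(MINIFIED)
renamed = ast.fix_missing_locations(ScopedRenamer(SUGGESTIONS).visit(copy.deepcopy(original)))
readable = ast.unparse(renamed)
print(readable)

# Check 1: same structure up to consistent renaming.
assert canonical(original) == canonical(renamed)

# Check 2: same behavior on a few sample inputs.
before, after = load(original), load(renamed)
for xs, t in [([1, 5, 3, 9], 2), ([], 0), ([-1, -2], -5)]:
    assert before(xs, t) == after(xs, t)
```

The structural check alone cannot tell whether a renamed identifier was really local, because a consistent rename of a global reference would also pass. The scope guard in `ScopedRenamer` and the behavioral check exist to cover that gap.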
Legal, Ethical, and Licensing Concerns
- There is strong disagreement over whether LLM-assisted decompilation could “render all code open source.”
- Several point out that having the source is not the same as having the rights: licenses and copyright still govern use and redistribution.
- Clean-room reverse engineering comes up: using decompiled or LLM-produced code directly is seen as risky, whereas using it only to write specifications for a separately developed implementation may be acceptable, though commenters flag the details as jurisdiction-dependent and unclear.
Broader Implications and Skepticism
- Some see this as a major enabler for reverse engineering, refactoring, and maintaining legacy software.
- Others downplay the novelty, noting that beautifiers and decompilers already exist and that hallucinations and correctness remain major concerns.