OpenAI is good at unminifying code

Capabilities and Use Cases

  • Many commenters report that LLMs are strong at “text transformations”: unminifying JS, renaming identifiers, reformatting, refactoring, and translating code between languages/frameworks.
  • People successfully use LLMs to:
    • Reverse-engineer minified JS and Shopify scripts.
    • Clean up and comment messy code, or explain legacy logic and why particular decisions were made.
    • Convert code across ecosystems (e.g., Python↔JS, AWS SDKs, CloudFormation↔Terraform↔CDK).
    • Extract structured data (CSV/JSON) from text and parse database schemas.
  • Some use models alongside decompilers (e.g., Ghidra, Binary Ninja) to assist reverse engineering of binaries or assembly, with mixed but promising results.

Minification vs. Decompilation / Obfuscation

  • Multiple commenters stress that unminifying JS (same language, mostly renames and formatting) is far easier than decompiling binaries or undoing true obfuscation.
  • LLMs still struggle with heavily obfuscated or “state-of-the-art” JS and complex compiled binaries.
  • There’s debate on how hard the inverse problem really is: some see inverting minification as relatively easy, while others note that the information it discards (meaningful names, comments) is nontrivial to reconstruct.
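
One way to make the "lost names" point concrete is to show that minification is many-to-one. The sketch below is illustrative only and uses Python's standard `ast` module rather than JS tooling (Python 3.9+ for `ast.unparse`); `canonical_form` and the three sample snippets are hypothetical names invented for this example. It replaces every identifier with a positional placeholder, which is roughly the information a name-mangling minifier keeps.

```python
import ast


class _Canonicalize(ast.NodeTransformer):
    """Replace function, argument, and variable names with positional placeholders."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        # First-seen order decides the placeholder: v0, v1, v2, ...
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def canonical_form(source):
    """Return the source with names erased; comments are already dropped by ast.parse."""
    tree = _Canonicalize().visit(ast.parse(source))
    return ast.unparse(tree)


PRICING = """
def total_price(unit_price, quantity):
    # apply per-unit pricing
    subtotal = unit_price * quantity
    return subtotal
"""

GEOMETRY = """
def rectangle_area(width, height):
    area = width * height  # square metres
    return area
"""

MINIFIED = "def a(b,c):\n d=b*c\n return d\n"

# Two unrelated readable programs and the minified one share a single canonical form,
# so the minified text alone cannot say which original it came from.
assert canonical_form(PRICING) == canonical_form(GEOMETRY) == canonical_form(MINIFIED)
```

Under these assumptions, structure survives intact while intent does not: any tool (LLM or otherwise) that restores names is guessing from context, which is consistent with both sides of the debate above.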

Tooling and Techniques

  • Several of the tools mentioned combine AST tooling with LLMs:
    • In these workflows, a traditional parser preserves semantics while the LLM only suggests better names or comments.
    • Local-model modes exist but are slower and less accurate; API-based modes are faster but incur per-token costs.
  • Suggested patterns:
    • Use LLMs to propose variable names per scope, then apply the renames deterministically via AST tooling.
    • Validate LLM transformations via unit tests, mutation testing, or AST equivalence checks.
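
The two suggested patterns can be sketched together. This is a minimal illustration, not any specific tool's implementation: it uses Python's standard `ast` module (Python 3.9+) instead of a JS parser, the `SUGGESTED` mapping stands in for hypothetical LLM output, and the helper names (`rename_in_function`, `round_trip_matches`, `load_function`) are invented here. It also assumes the mapping only targets arguments and local variables of one top-level function.

```python
import ast


class Renamer(ast.NodeTransformer):
    """Apply a fixed old-name -> new-name mapping to arguments and variables."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def rename_in_function(source, function_name, mapping):
    """Rename identifiers only inside the top-level function `function_name`."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            Renamer(mapping).visit(node)
    return ast.unparse(tree)


def round_trip_matches(original, renamed, function_name, mapping):
    """AST check: undoing the rename must reproduce the original tree exactly."""
    inverse = {new: old for old, new in mapping.items()}
    restored = rename_in_function(renamed, function_name, inverse)
    # ast.dump omits line/column attributes by default, so formatting is ignored.
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(restored))


def load_function(source, function_name):
    """Execute trusted sample source and return one function for behavioural tests."""
    namespace = {}
    exec(compile(source, "<snippet>", "exec"), namespace)
    return namespace[function_name]


MINIFIED = "def f(a, b):\n    c = a * b\n    return c + a\n"
SUGGESTED = {"a": "width", "b": "height", "c": "area"}  # hypothetical LLM suggestion

renamed = rename_in_function(MINIFIED, "f", SUGGESTED)
assert round_trip_matches(MINIFIED, renamed, "f", SUGGESTED)
assert load_function(MINIFIED, "f")(3, 4) == load_function(renamed, "f")(3, 4)

# A bad suggestion that reuses an existing name merges two variables;
# both the structural check and the behavioural check reject it.
COLLIDING = {"c": "a"}
broken = rename_in_function(MINIFIED, "f", COLLIDING)
assert not round_trip_matches(MINIFIED, broken, "f", COLLIDING)
assert load_function(MINIFIED, "f")(3, 4) != load_function(broken, "f")(3, 4)
```

The design choice here is that the model never edits code directly: it only supplies a mapping, and the deterministic pass plus the round-trip check decide whether that mapping is safe to apply. A real pipeline would need proper scope analysis (nested functions, globals, closures) and a fuller test suite than the single sample call shown.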

Legal, Ethical, and Licensing Concerns

  • Strong disagreement over whether LLM-assisted decompilation could “render all code open source.”
  • Several point out that having source ≠ having rights; licenses and copyright still govern use and redistribution.
  • Clean-room reverse engineering is discussed; using decompiled/LLM-produced code directly is seen as risky, but using it only to write specs for a separate implementation may be acceptable in some jurisdictions (details flagged as jurisdiction-dependent and unclear).

Broader Implications and Skepticism

  • Some see this as a big unlock for reverse engineering, refactoring, and legacy software.
  • Others downplay the novelty, noting that beautifiers and decompilers already exist and that LLM hallucinations and correctness remain major concerns.