Hackers use ZIP file concatenation to evade detection

Non-malicious and historical uses of ZIP concatenation

  • The technique predates the current wave of attacks: used for hybrid files (e.g., JPEG cover + ZIP of eBooks; ZIP in JPEG ICC profiles).
  • Some communities reportedly abandoned it after it was abused to distribute illegal content, prompting platforms to block images containing ZIP data.
  • Related ideas go back to at least the 1990s (zip bombs, JAR/GIF hybrids).
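The hybrid-file trick above relies on the fact that ZIP readers locate the End of Central Directory record by scanning backwards from the end of the file, so arbitrary bytes prepended to an archive are ignored. A minimal sketch (the filename "note.txt" and the fake JPEG cover are illustrative, not from the source):

```python
import io
import zipfile

# Build a small ZIP in memory; "note.txt" is a hypothetical payload name.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("note.txt", "hidden text")

# Prepend "cover" bytes; a JPEG SOI marker plus padding stands in for a
# real image. Because ZIP readers find the End of Central Directory by
# scanning backwards from the end, the prepended data is simply ignored
# and the hybrid still opens as a valid archive.
cover = b"\xff\xd8\xff\xe0" + b"\x00" * 64
hybrid = cover + zip_buf.getvalue()

with zipfile.ZipFile(io.BytesIO(hybrid)) as zf:
    recovered = zf.read("note.txt").decode()
print(recovered)  # hidden text
```

The same mechanism is what makes self-extracting archives possible: an executable stub up front, a well-formed ZIP behind it.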

Bypassing scanners in real-world workflows

  • Encrypted ZIPs are a long-standing way to evade email/content filters.
  • Workarounds include: embedding payloads in DOCX/XLSX (ZIP-based formats), base64-encoding binaries, and compress+split+encrypt pipelines (“shred/unshred”-style).
  • Corporate security often blocks “dangerous” extensions but allows opaque or split archives, which amounts to security theater that remains easy to bypass.
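The encode-and-split pipeline mentioned above can be sketched in a few lines; the payload bytes and chunk size here are arbitrary stand-ins, not a real workflow from the discussion:

```python
import base64

# Stand-in payload bytes (an arbitrary blob, not a real executable).
payload = bytes(range(64))

# Base64-encode so the binary looks like plain text to naive filters.
encoded = base64.b64encode(payload)

# "Shred": split into fixed-size chunks so no single part resembles a
# complete file; the chunk size of 16 is arbitrary for the demo.
chunks = [encoded[i:i + 16] for i in range(0, len(encoded), 16)]

# "Unshred": the receiver joins and decodes to restore the original.
restored = base64.b64decode(b"".join(chunks))
```

Each individual chunk is opaque text with no recognizable signature, which is exactly why signature-based perimeter scanning misses it.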

ZIP format ambiguity and parser behavior

  • Core issue: two structures (local file headers vs central directory) can disagree.
  • Some tools scan local headers; others treat the central directory as the sole source of truth. Behavior differs between WinRAR, 7-Zip, and Windows Explorer.
  • Debate over what the spec “really” intends:
    • One side: only central directory entries are valid; extra headers are garbage except for recovery.
    • Other side: spec implicitly allows “islands” of opaque data and append-only modification, for media spanning and streaming.
  • This ambiguity has already led to real vulnerabilities (e.g., hidden add-on files, APK modification without breaking signatures).
  • Several argue for a “strict ZIP” spec with explicit parsing rules.
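The local-header/central-directory disagreement can be demonstrated directly: concatenate two complete ZIPs and compare what a central-directory parser reports against a naive forward scan for local file header signatures. The filenames below are illustrative; Python's `zipfile`, like most archivers, trusts the (last) central directory:

```python
import io
import struct
import zipfile

def make_zip(name: str, data: str) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()

# Two complete ZIP archives glued back to back.
blob = make_zip("benign.txt", "hello") + make_zip("evil.txt", "payload")

# View 1: the central directory (what zipfile and most archivers trust).
# Only the trailing archive's directory is found, so only its entries show.
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    central_view = zf.namelist()

# View 2: naive forward scan for local file header signatures (PK\x03\x04).
# The filename length field sits at offset 26 of the 30-byte header.
local_view = []
pos = 0
while (pos := blob.find(b"PK\x03\x04", pos)) != -1:
    (name_len,) = struct.unpack_from("<H", blob, pos + 26)
    local_view.append(blob[pos + 30 : pos + 30 + name_len].decode())
    pos += 4

print(central_view)  # ['evil.txt']
print(local_view)    # ['benign.txt', 'evil.txt']
```

A scanner that inspects only one of these views can be shown a different set of files than the tool the victim ultimately opens the archive with.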

Format design, splitting, and philosophy

  • Some criticize ZIP for violating single-source-of-truth principles, preferring simpler formats like tar (+ separate compression).
  • Others defend integrated features (central directory, file spanning) as historically necessary and still useful for large or unstable transfers.
  • There’s a Unix-style argument for separating archiving, compression, and sharding vs a pragmatic argument for combining them for random access and usability.

Defensive strategies and limitations

  • Suggested defenses split between recursive unpacking and simply rejecting “weird” archives whose forward-scan and central-directory views disagree.
  • Some warn that making tools “smart” (deep recursive unpacking, auto-processing) increases attack surface; only AV should unpack deeply, while regular tools should stay “dumb.”
  • Email/HTTP perimeter scanning is justified as defense in depth, but multiple commenters note that trivial transformations (encryption, base64, simple XOR/ROT) already defeat signature-based detection.
  • VirusTotal and many AV products reportedly struggle with nested archives and complex ZIP structures, often for performance reasons.
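The reject-weird-archives defense can be sketched as a consistency check between the two ZIP views. `looks_suspicious` is a hypothetical "strict ZIP" heuristic, not a production detector; a real tool would parse headers rather than count raw signatures, since compressed data can contain the signature bytes by chance:

```python
import io
import zipfile

def looks_suspicious(blob: bytes) -> bool:
    """Hypothetical strict-ZIP heuristic: flag archives where a raw scan
    for local-file-header signatures disagrees with the number of entries
    the central directory lists."""
    try:
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            listed = len(zf.infolist())
    except zipfile.BadZipFile:
        return True  # unparseable archives are rejected outright
    return blob.count(b"PK\x03\x04") != listed

def make_zip(name: str, data: str) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()

clean = make_zip("a.txt", "hello")
concat = clean + make_zip("b.txt", "world")

print(looks_suspicious(clean))   # False
print(looks_suspicious(concat))  # True
```

This keeps the tool “dumb” in the sense argued for above: it refuses ambiguous input instead of trying to interpret every island of data an attacker might hide in the file.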