Hackers use ZIP file concatenation to evade detection

Non-malicious and historical uses of ZIP concatenation

  • The technique predates the current wave of attacks: used for hybrid files (e.g., JPEG cover + ZIP of eBooks; ZIP in JPEG ICC profiles).
  • Some communities reportedly abandoned it after it was abused to distribute illegal content, prompting platforms to block images containing ZIP data.
  • Related ideas go back to at least the 1990s (zip bombs, JAR/GIF hybrids).
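The hybrid-file trick above relies on the fact that ZIP readers locate the End of Central Directory record by scanning backwards from the end of the file, so arbitrary bytes prepended to an archive are ignored. A minimal sketch (the filename "note.txt" and the fake JPEG cover are illustrative, not from the source):

```python
import io
import zipfile

# Build a small ZIP in memory; "note.txt" is a hypothetical payload name.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("note.txt", "hidden text")

# Prepend "cover" bytes; a JPEG SOI marker plus padding stands in for a
# real image. Because ZIP readers find the End of Central Directory by
# scanning backwards from the end, the prepended data is simply ignored
# and the hybrid still opens as a valid archive.
cover = b"\xff\xd8\xff\xe0" + b"\x00" * 64
hybrid = cover + zip_buf.getvalue()

with zipfile.ZipFile(io.BytesIO(hybrid)) as zf:
    recovered = zf.read("note.txt").decode()
print(recovered)  # hidden text
```

The same mechanism is what makes self-extracting archives possible: an executable stub up front, a well-formed ZIP behind it.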

Bypassing scanners in real-world workflows

  • Encrypted ZIPs are a long-standing way to evade email/content filters.
  • Workarounds include: embedding payloads in DOCX/XLSX (ZIP-based formats), base64-encoding binaries, and compress+split+encrypt pipelines (“shred/unshred”-style).
  • Corporate security often blocks “dangerous” extensions but allows opaque or split archives, which amounts to security theater that remains easy to bypass.
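The encode-and-split pipeline mentioned above can be sketched in a few lines; the payload bytes and chunk size here are arbitrary stand-ins, not a real workflow from the discussion:

```python
import base64

# Stand-in payload bytes (an arbitrary blob, not a real executable).
payload = bytes(range(64))

# Base64-encode so the binary looks like plain text to naive filters.
encoded = base64.b64encode(payload)

# "Shred": split into fixed-size chunks so no single part resembles a
# complete file; the chunk size of 16 is arbitrary for the demo.
chunks = [encoded[i:i + 16] for i in range(0, len(encoded), 16)]

# "Unshred": the receiver joins and decodes to restore the original.
restored = base64.b64decode(b"".join(chunks))
```

Each individual chunk is opaque text with no recognizable signature, which is exactly why signature-based perimeter scanning misses it.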

ZIP format ambiguity and parser behavior

  • Core issue: two structures (local file headers vs central directory) can disagree.
  • Some tools scan local headers; others treat the central directory as the sole source of truth. Behavior differs between WinRAR, 7-Zip, and Windows Explorer.
  • Debate over what the spec “really” intends:
    • One side: only central directory entries are valid; extra headers are garbage except for recovery.
    • Other side: spec implicitly allows “islands” of opaque data and append-only modification, for media spanning and streaming.
  • This ambiguity has already led to real vulnerabilities (e.g., hidden add-on files, APK modification without breaking signatures).
  • Several argue for a “strict ZIP” spec with explicit parsing rules.
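The local-header/central-directory disagreement can be demonstrated directly: concatenate two complete ZIPs and compare what a central-directory parser reports against a naive forward scan for local file header signatures. The filenames below are illustrative; Python's `zipfile`, like most archivers, trusts the (last) central directory:

```python
import io
import struct
import zipfile

def make_zip(name: str, data: str) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()

# Two complete ZIP archives glued back to back.
blob = make_zip("benign.txt", "hello") + make_zip("evil.txt", "payload")

# View 1: the central directory (what zipfile and most archivers trust).
# Only the trailing archive's directory is found, so only its entries show.
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    central_view = zf.namelist()

# View 2: naive forward scan for local file header signatures (PK\x03\x04).
# The filename length field sits at offset 26 of the 30-byte header.
local_view = []
pos = 0
while (pos := blob.find(b"PK\x03\x04", pos)) != -1:
    (name_len,) = struct.unpack_from("<H", blob, pos + 26)
    local_view.append(blob[pos + 30 : pos + 30 + name_len].decode())
    pos += 4

print(central_view)  # ['evil.txt']
print(local_view)    # ['benign.txt', 'evil.txt']
```

A scanner that inspects only one of these views can be shown a different set of files than the tool the victim ultimately opens the archive with.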

Format design, splitting, and philosophy

  • Some criticize ZIP for violating single-source-of-truth principles, preferring simpler formats like tar (+ separate compression).
  • Others defend integrated features (central directory, file spanning) as historically necessary and still useful for large or unstable transfers.
  • There’s a Unix-style argument for separating archiving, compression, and sharding vs a pragmatic argument for combining them for random access and usability.

Defensive strategies and limitations

  • Suggested defenses split between recursive unpacking and simply rejecting “weird” archives whose forward-scan and central-directory views disagree.
  • Some warn that making tools “smart” (deep recursive unpacking, auto-processing) increases attack surface; only AV should unpack deeply, while regular tools should stay “dumb.”
  • Email/HTTP perimeter scanning is justified as defense in depth, but multiple commenters note that trivial transformations (encryption, base64, simple XOR/ROT) already defeat signature-based detection.
  • VirusTotal and many AV products reportedly struggle with nested archives and complex ZIP structures, often for performance reasons.
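The reject-weird-archives defense can be sketched as a consistency check between the two ZIP views. `looks_suspicious` is a hypothetical "strict ZIP" heuristic, not a production detector; a real tool would parse headers rather than count raw signatures, since compressed data can contain the signature bytes by chance:

```python
import io
import zipfile

def looks_suspicious(blob: bytes) -> bool:
    """Hypothetical strict-ZIP heuristic: flag archives where a raw scan
    for local-file-header signatures disagrees with the number of entries
    the central directory lists."""
    try:
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            listed = len(zf.infolist())
    except zipfile.BadZipFile:
        return True  # unparseable archives are rejected outright
    return blob.count(b"PK\x03\x04") != listed

def make_zip(name: str, data: str) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, data)
    return buf.getvalue()

clean = make_zip("a.txt", "hello")
concat = clean + make_zip("b.txt", "world")

print(looks_suspicious(clean))   # False
print(looks_suspicious(concat))  # True
```

This keeps the tool “dumb” in the sense argued for above: it refuses ambiguous input instead of trying to interpret every island of data an attacker might hide in the file.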