Glassworm is back: A new wave of invisible Unicode attacks hits repositories

Detection & Mitigation Ideas

  • Many argue this class of attack is easy to detect mechanically:
    • Simple grep regexes for zero‑width/variation selector code points are suggested.
    • Lint rules or antivirus (AV) rules could flag any occurrence of eval() and any non‑printing characters.
    • Some teams already enforce “ASCII-only source” or “no Unicode in code” via linters/hooks.
  • Others prefer a narrower blocklist: e.g., flag variation selectors and zero‑width characters specifically in source files while still allowing Unicode in resources/docs.
  • Pre‑commit hooks and CI checks are proposed as language‑agnostic defenses.
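
The narrower blocklist idea could be sketched as a small Python check that a pre-commit hook or CI step calls with the list of changed files. This is a minimal sketch under stated assumptions: the code-point ranges, the SOURCE_EXTENSIONS set, and the function names are illustrative choices, not a list taken from the discussion, and a real policy would need tuning.

```python
from pathlib import Path

# Assumed blocklist: zero-width characters plus both variation-selector blocks.
SUSPECT_RANGES = [
    (0x200B, 0x200D),    # zero width space, non-joiner, joiner
    (0x2060, 0x2060),    # word joiner
    (0xFEFF, 0xFEFF),    # zero width no-break space (BOM when not at file start)
    (0xFE00, 0xFE0F),    # variation selectors 1-16
    (0xE0100, 0xE01EF),  # variation selectors 17-256
]

# Hypothetical policy: only these extensions are treated as source code.
SOURCE_EXTENSIONS = {".js", ".ts", ".py", ".go", ".rs", ".java"}


def is_suspect(ch: str) -> bool:
    """True if the character falls in one of the assumed blocklist ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SUSPECT_RANGES)


def scan_text(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, 'U+XXXX') for each suspect character, 1-based."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if is_suspect(ch):
                findings.append((line_no, col, f"U+{ord(ch):04X}"))
    return findings


def check_files(paths: list[str]) -> int:
    """Print findings for source files; return 1 if any were found, else 0."""
    failed = False
    for name in paths:
        path = Path(name)
        if path.suffix not in SOURCE_EXTENSIONS:
            continue  # docs and resources may still contain any Unicode
        # utf-8-sig strips a single leading BOM so it is not reported.
        text = path.read_text(encoding="utf-8-sig", errors="replace")
        for line_no, col, code in scan_text(text):
            print(f"{name}:{line_no}:{col}: suspect code point {code}")
            failed = True
    return 1 if failed else 0
```

A hook script would pass its file arguments to check_files and use the return value as its exit status. Files outside SOURCE_EXTENSIONS are skipped, which mirrors the "allow Unicode in resources/docs" position.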

Responsibility of Platforms & Tools

  • Strong view that GitHub (and similar platforms/editors) should:
    • Highlight invisible characters in diffs and code views.
    • Provide built‑in scanning similar to secret scanning.
  • Debate over whether this is a moral “responsibility” vs just “good product design,” but broad agreement it would improve safety.
  • One commenter notes GitHub already advertises a warning for hidden Unicode, but it reportedly fails in some cases; a bug bounty report confirmed the issue, but it was deemed low‑risk to fix.

Eval and Code Review Practices

  • Widespread agreement that eval() is almost always a red flag and should trigger heightened scrutiny or be banned by policy.
  • Some note rare legitimate uses, but still treat it as a “live bomb.”
  • Example code shows how to evade simple eval keyword searches using the Function constructor and obfuscation.
  • Several criticize maintainers for merging code with opaque transforms and eval. Others point out that in at least one highlighted repo the likely vector was stolen credentials and a malicious force‑push, not a reviewed PR.
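
Because a plain search for the word eval is easy to sidestep, a review aid might look for a wider set of dynamic-execution patterns in JavaScript source. The Python sketch below is only a heuristic: the pattern list and names are assumptions, it will miss obfuscated cases, and it will flag some legitimate code. Its purpose would be to route files to a human reviewer, not to prove anything.

```python
import re

# Assumed, non-exhaustive patterns for dynamic code execution in JavaScript.
DYNAMIC_EXEC_PATTERNS = {
    "eval call": re.compile(r"\beval\s*\("),
    "Function constructor": re.compile(r"\b(?:new\s+)?Function\s*\("),
    "string timer": re.compile(r"\bset(?:Timeout|Interval)\s*\(\s*['\"`]"),
    "constructor lookup": re.compile(
        r"\.constructor\s*\(|\[\s*['\"]constructor['\"]\s*\]"
    ),
}


def find_dynamic_exec(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for each heuristic hit, 1-based."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for label, pattern in DYNAMIC_EXEC_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, label))
    return hits
```

A lookup assembled from string fragments, such as globalThis["ev" + "al"](x), produces no hit with these patterns, which is why a check like this can only supplement review.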

Debate on Unicode in Source Code

  • One camp argues invisible characters and visually confusable code points are design mistakes; they advocate ASCII‑only source or whitelisting a tiny visible subset.
  • The opposing camp counters that invisible characters are essential for real‑world writing systems (RTL scripts, ligatures, word breaks, Hangul, Mongolian, etc.) and that Unicode’s goal is semantic, not purely visual.
  • A compromise position suggests: keep full Unicode for text in general, but treat many invisible or presentation‑only characters as inherently suspect in source code and flag them by default.

How Serious is the Threat?

  • Some think the danger is overstated: an eval() on an “empty” string already looks suspicious, regardless of invisible payload.
  • Others stress that invisible characters underpin broader attack classes (direction overrides, lookalikes, escape sequences) and that human review alone is demonstrably insufficient; they see automated checks and better tooling as necessary.
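
An automated check for some of these broader classes could lean on Unicode general categories rather than a fixed list. The Python sketch below is one possible approach, not something proposed in the thread: the labels and function names are assumptions, and it covers direction controls, the escape character, and other non-printing characters, but not visual lookalikes.

```python
import unicodedata
from typing import Optional

# Explicit bidirectional controls: embeddings/overrides and isolates.
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))


def classify(ch: str) -> Optional[str]:
    """Return a label for a suspicious character, or None if it looks ordinary."""
    cp = ord(ch)
    if cp in BIDI_CONTROLS:
        return "bidi control"
    if cp == 0x1B:
        return "escape character"  # starts terminal escape sequences
    category = unicodedata.category(ch)
    if category == "Cf":
        return "invisible format character"
    if category == "Cc" and ch not in "\t\n\r":
        return "control character"
    return None


def audit(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', label) for every flagged character, 0-based."""
    findings = []
    for index, ch in enumerate(text):
        label = classify(ch)
        if label is not None:
            findings.append((index, f"U+{ord(ch):04X}", label))
    return findings
```

Variation selectors have general category Mn rather than Cf, so a category-based check like this one would not catch them; it would have to run alongside an explicit blocklist of those ranges.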