Glassworm is back: A new wave of invisible Unicode attacks hits repositories
Detection & Mitigation Ideas
- Many argue this class of attack is easy to detect mechanically:
  - Simple grep regexes for zero‑width/variation selector code points are suggested.
  - Lint rules or AV rules could flag any occurrence of `eval()` and non‑printing characters.
- Some teams already enforce “ASCII-only source” or “no Unicode in code” via linters/hooks.
- Others prefer a narrower blocklist: e.g., flag variation selectors and zero‑width characters specifically in source files while still allowing Unicode in resources/docs.
- Pre‑commit hooks and CI checks are proposed as language‑agnostic defenses.
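The narrower blocklist idea can be sketched in a few lines of Python. This is a minimal illustration, not a vetted rule set: the code point ranges in `SUSPECT` and the names `find_suspect` and `scan_paths` are assumptions made for this example, and a real policy would need to decide which characters and file types to cover.

```python
import re

# Assumed blocklist: zero-width characters and variation selectors.
# The exact set is a policy choice, not something the discussion settled.
SUSPECT = re.compile(
    "["
    "\u200b-\u200d"          # zero-width space, non-joiner, joiner
    "\u2060"                 # word joiner
    "\ufeff"                 # zero-width no-break space (BOM when at file start)
    "\ufe00-\ufe0f"          # variation selectors 1-16
    "\U000e0100-\U000e01ef"  # variation selectors 17-256
    "]"
)


def find_suspect(text):
    """Return (line, column, code point) for each suspect character, 1-based."""
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in SUSPECT.finditer(line):
            hits.append((lineno, match.start() + 1, f"U+{ord(match.group()):04X}"))
    return hits


def scan_paths(paths):
    """Print one finding per line and return the number of findings."""
    count = 0
    for path in paths:
        # utf-8-sig drops a leading BOM so that it is not reported as a hit.
        with open(path, encoding="utf-8-sig", errors="replace") as fh:
            for lineno, col, cp in find_suspect(fh.read()):
                print(f"{path}:{lineno}:{col}: suspicious code point {cp}")
                count += 1
    return count
```

A pre-commit hook or CI step could call `scan_paths` on the changed source files and fail when it returns a non-zero count, while leaving resource and documentation files out of the list.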
Responsibility of Platforms & Tools
- Strong view that GitHub (and similar platforms/editors) should:
  - Highlight invisible characters in diffs and code views.
  - Provide built‑in scanning similar to secret scanning.
- Debate over whether this is a moral “responsibility” vs just “good product design,” but broad agreement it would improve safety.
- One commenter notes GitHub already advertises a warning for hidden Unicode, but it reportedly fails in some cases; a bug bounty report confirmed the issue, but it was deemed too low‑risk to warrant a fix.
Eval and Code Review Practices
- Widespread agreement that `eval()` is almost always a red flag and should trigger heightened scrutiny or be banned by policy.
- Some note rare legitimate uses, but still treat it as a “live bomb.”
- Example code shows how to evade simple `eval` keyword searches using the `Function` constructor and obfuscation.
- Several criticize maintainers merging code with opaque transforms and `eval`, though others point out that in at least one highlighted repo the likely vector was stolen credentials and a malicious force‑push, not a reviewed PR.
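The keyword-evasion point can be illustrated with a small Python sketch. It is not the example code from the thread: the JavaScript snippets are made-up strings that are only matched, never executed, and `naive_check`, `wider_check`, and the `DYNAMIC_EXEC` pattern list are hypothetical names chosen for this illustration.

```python
import re

# JavaScript snippets held as plain strings; nothing here is executed.
DIRECT = 'eval(payload);'
VIA_FUNCTION = 'new Function(payload)();'
VIA_LOOKUP = 'globalThis["ev" + "al"](payload);'


def naive_check(src):
    """The simple keyword search: is the literal word 'eval' present?"""
    return "eval" in src


# Assumed wider pattern list; still a heuristic, not a parser.
DYNAMIC_EXEC = [
    re.compile(r"\beval\s*\("),
    re.compile(r"\bFunction\s*\("),  # Function(...) and new Function(...)
    re.compile(r"\bset(?:Timeout|Interval)\s*\(\s*[\"'`]"),  # string-argument timers
]


def wider_check(src):
    """Flag several dynamic-execution entry points instead of one keyword."""
    return any(pattern.search(src) for pattern in DYNAMIC_EXEC)


for name, src in [("direct", DIRECT), ("Function", VIA_FUNCTION), ("lookup", VIA_LOOKUP)]:
    print(f"{name:9} naive={naive_check(src)} wider={wider_check(src)}")
```

The `Function` constructor slips past the keyword search but is caught by the wider list, while the string-concatenation lookup slips past both. That gap is one reason pattern matching is better treated as a prompt for closer review than as proof that code is safe.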
Debate on Unicode in Source Code
- One camp argues invisible characters and visually confusable code points are design mistakes; they advocate ASCII‑only source or whitelisting a tiny visible subset.
- The opposing camp counters that invisible characters are essential for real‑world writing systems (RTL scripts, ligatures, word breaks, Hangul, Mongolian, etc.) and that Unicode’s goal is semantic, not purely visual.
- A compromise position suggests: keep full Unicode for text in general, but treat many invisible or presentation‑only characters as inherently suspect in source code and flag them by default.
How Serious is the Threat?
- Some think the danger is overstated: an `eval()` on an “empty” string already looks suspicious, regardless of invisible payload.
- Others stress that invisible characters underpin broader attack classes (direction overrides, lookalikes, escape sequences) and that relying solely on human review is demonstrably insufficient; automated checks and better tooling are seen as necessary.
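To make the "empty string" point concrete, the following Python sketch shows how a literal that renders as empty can still carry characters, and how a review tool might make them visible. The helper names `is_invisible` and `reveal` and the `<U+XXXX>` notation are assumptions for this example; the check relies on the Unicode "Cf" (format) category plus the variation selector ranges, and it does not address lookalike characters.

```python
import unicodedata


def is_invisible(ch):
    """Heuristic: format characters (Cf) and variation selectors."""
    cp = ord(ch)
    # Cf covers zero-width joiners, bidi overrides and isolates, BOM, and so on.
    if unicodedata.category(ch) == "Cf":
        return True
    # Variation selectors are category Mn, so they are matched by range.
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def reveal(text):
    """Replace each invisible character with a visible <U+XXXX> marker."""
    return "".join(f"<U+{ord(c):04X}>" if is_invisible(c) else c for c in text)


# A string that most editors render as empty, built from escapes for clarity.
hidden = "\ufe00\U000e0101\u200b"
print(len(hidden))                   # the "empty" string has three code points
print(reveal(f'eval("{hidden}")'))   # markers appear inside the quotes
print(reveal("a\u202eb"))            # a direction override made visible
```

Rendering like this in diffs or code views would let a reviewer see that the argument to `eval()` is not empty, without having to ban the characters from ordinary text.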