2026-03-15

Glassworm is back: A new wave of invisible Unicode attacks hits repositories

Detection & Mitigation Ideas

Many argue this class of attack is easy to detect mechanically:
- Simple grep regexes for zero‑width/variation selector code points are suggested.
- Lint or antivirus (AV) rules could flag eval() calls and non‑printing characters, especially where the two appear together.
- Some teams already enforce “ASCII-only source” or “no Unicode in code” via linters/hooks.
Others prefer a narrower blocklist: e.g., flag variation selectors and zero‑width characters specifically in source files while still allowing Unicode in resources/docs.
Pre‑commit hooks and CI checks are proposed as language‑agnostic defenses.
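
As a rough illustration of the narrower blocklist idea, here is a minimal Python sketch of a check that a pre‑commit hook or CI job could call. The exact set of code points, the function names, and the output format are assumptions made for this example, not something specified in the thread.

```python
import re

# Assumed blocklist: common zero-width characters plus both variation
# selector ranges. Adjust to taste; this is not an exhaustive list.
SUSPECT = re.compile(
    "[\u200b\u200c\u200d\u2060\ufeff"  # ZWSP, ZWNJ, ZWJ, word joiner, ZWNBSP
    "\ufe00-\ufe0f"                     # variation selectors 1-16
    "\U000e0100-\U000e01ef]"            # variation selectors 17-256
)

def find_suspect(text):
    """Return (line, column, 'U+XXXX') for every blocklisted character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SUSPECT.finditer(line):
            code = f"U+{ord(match.group()):04X}"
            hits.append((lineno, match.start() + 1, code))
    return hits

def scan_files(paths):
    """Print one finding per line; return 1 if anything was found, else 0."""
    failed = False
    for path in paths:
        # utf-8-sig drops a leading byte order mark so it is not reported.
        with open(path, encoding="utf-8-sig", errors="replace") as handle:
            for lineno, col, code in find_suspect(handle.read()):
                print(f"{path}:{lineno}:{col}: suspicious code point {code}")
                failed = True
    return 1 if failed else 0
```

A hook script could pass the staged file names to scan_files() and use the return value as its exit status, so that a non-zero result blocks the commit.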

Responsibility of Platforms & Tools

Strong view that GitHub (and similar platforms/editors) should:
- Highlight invisible characters in diffs and code views.
- Provide built‑in scanning similar to secret scanning.
Commenters debate whether this is a moral “responsibility” or simply “good product design,” but broadly agree it would improve safety.
One commenter notes that GitHub already advertises a warning for hidden Unicode, but that it reportedly fails in some cases; a bug bounty submission confirmed the issue, but it was deemed too low‑risk to warrant a fix.
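
What "highlighting invisible characters" might look like can be sketched in a few lines of Python. This is only an illustration of the idea, not how GitHub or any editor implements it; the reveal() name and the <U+XXXX> tag format are invented for the example.

```python
import unicodedata

def reveal(text):
    """Replace format characters and variation selectors with <U+XXXX> tags."""
    out = []
    for ch in text:
        code = ord(ch)
        is_variation_selector = (
            0xFE00 <= code <= 0xFE0F or 0xE0100 <= code <= 0xE01EF
        )
        # Category "Cf" covers zero-width and directional format characters.
        if unicodedata.category(ch) == "Cf" or is_variation_selector:
            out.append(f"<U+{code:04X}>")
        else:
            out.append(ch)
    return "".join(out)
```

Run over a diff line, a string literal that looks empty on screen but contains hidden code points would come out with each one spelled as a visible tag.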

Eval and Code Review Practices

Widespread agreement that eval() is almost always a red flag and should trigger heightened scrutiny or be banned by policy.
Some note rare legitimate uses, but still treat it as a “live bomb.”
Example code shows how to evade simple eval keyword searches using the Function constructor and obfuscation.
Several criticize maintainers for merging code that contains opaque transforms and eval. Others point out that in at least one highlighted repo the likely vector was stolen credentials and a malicious force‑push, not a reviewed PR.
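
The point about evading keyword searches can be illustrated with a small Python sketch. The JavaScript snippets below are invented for this example (they are not the code posted in the thread), and the broader pattern list is an assumed heuristic that is far from exhaustive.

```python
import re

# The kind of simple search the thread says is easy to evade.
NAIVE = re.compile(r"\beval\s*\(")

# Assumed broader heuristics: the Function constructor and property names
# assembled from string concatenation.
BROADER = [
    re.compile(r"\beval\b"),
    re.compile(r"\bFunction\s*\("),
    re.compile(r"\[\s*['\"][^'\"]*['\"]\s*\+"),  # e.g. obj['ev' + 'al']
]

# Illustrative JavaScript snippets, held as plain strings.
SAMPLES = {
    "direct": "eval(payload)",
    "function_ctor": "new Function(payload)()",
    "indirect": "globalThis['ev' + 'al'](payload)",
}

def naive_hit(src):
    """True if the simple eval( search matches."""
    return bool(NAIVE.search(src))

def broader_hit(src):
    """True if any of the broader heuristic patterns matches."""
    return any(pattern.search(src) for pattern in BROADER)
```

Only the "direct" sample trips the naive search, while all three trip the broader list. Text patterns remain a heuristic either way; syntax-aware lint rules such as ESLint's no-eval, no-implied-eval, and no-new-func are a sturdier option for JavaScript.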

Debate on Unicode in Source Code

One camp argues invisible characters and visually confusable code points are design mistakes; they advocate ASCII‑only source or whitelisting a tiny visible subset.
The opposing camp counters that invisible characters are essential for real‑world writing systems (RTL scripts, ligatures, word breaks, Hangul, Mongolian, etc.) and that Unicode’s goal is semantic, not purely visual.
A compromise position suggests: keep full Unicode for text in general, but treat many invisible or presentation‑only characters as inherently suspect in source code and flag them by default.
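
One possible reading of the compromise position, sketched in Python: visible non‑ASCII text is allowed everywhere, while format characters and variation selectors are flagged only in files treated as source. The extension list and function names are hypothetical choices for this example.

```python
import unicodedata
from pathlib import PurePosixPath

# Hypothetical policy: which file extensions count as source code.
SOURCE_SUFFIXES = {".js", ".ts", ".py", ".go", ".rs", ".java", ".c", ".h"}

def is_suspect(ch):
    """True for format characters (category Cf) and variation selectors."""
    code = ord(ch)
    if unicodedata.category(ch) == "Cf":
        return True
    return 0xFE00 <= code <= 0xFE0F or 0xE0100 <= code <= 0xE01EF

def violations(path, text):
    """Return sorted 'U+XXXX' labels for suspect characters in source files."""
    if PurePosixPath(path).suffix.lower() not in SOURCE_SUFFIXES:
        return []  # docs and resources keep full Unicode
    return sorted({f"U+{ord(ch):04X}" for ch in text if is_suspect(ch)})
```

Under this sketch, accented letters or CJK identifiers in a source file pass, a zero‑width joiner in the same file is reported, and a Markdown document containing that joiner is left alone. A real policy would probably also need per‑file or per‑line exemptions for legitimate uses.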

How Serious is the Threat?

Some think the danger is overstated: an eval() on an “empty” string already looks suspicious, regardless of invisible payload.
Others stress that invisible characters underpin broader attack classes (direction overrides, lookalikes, escape sequences) and that relying solely on human review is demonstrably insufficient; automated checks and better tooling are seen as necessary.
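
To make the broader attack classes concrete, here is a hedged Python sketch that checks for bidirectional control characters and for identifiers mixing scripts. The script guess relies on the first word of each character's Unicode name, which is a crude heuristic chosen for brevity; all function names are invented for this example.

```python
import unicodedata

# Bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

def has_bidi_control(text):
    """True if the text contains any direction override or isolate control."""
    return any(ch in BIDI_CONTROLS for ch in text)

def scripts_in(identifier):
    """Rough script guess: first word of each letter's Unicode name."""
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_mixed_script(identifier):
    """True if the identifier's letters appear to come from several scripts."""
    return len(scripts_in(identifier)) > 1
```

For instance, is_mixed_script("p\u0430yload") is true because the second letter is Cyrillic U+0430 rather than Latin "a", even though the two render almost identically in most fonts.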
