WorstFit: Unveiling Hidden Transformers in Windows ANSI
Overall reaction & nature of the issue
- Many see the vulnerability as unsurprising given Windows’ legacy layers, but still eye‑opening in how multiple “harmless” features combine into serious exploits.
- Core problem: Windows “ANSI” APIs use a “best‑fit” Unicode→codepage mapping that silently turns certain Unicode characters into ASCII metacharacters (", \, /, -, etc.) after an application has validated input.
- This breaks security assumptions in argument handling, shell escaping, path validation, etc., especially when wide‑string logic and ANSI APIs are mixed.
ANSI vs Unicode on Windows
- Strong consensus: new code should avoid *A (ANSI) Win32 APIs and use *W (wide) variants plus explicit conversion.
- Several note that Microsoft has recommended wide APIs since early NT, but its own C runtime historically routes fopen, getenv, argv, etc. through *A, perpetuating best‑fit issues.
- Some argue for simply killing best‑fit or mapping unrepresentable chars to a harmless placeholder and/or failing early.
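
To make the "validate, then convert" failure and the placeholder and fail‑early alternatives concrete, here is a minimal Python sketch. It rests on several assumptions: the mapping table is a tiny hand‑picked illustration rather than the real Windows best‑fit table (which differs per code page), the helper names are hypothetical, and the best‑fit step is simulated because Python's own cp1252 codec does not perform best‑fit substitution.

```python
# Hand-picked entries for illustration only; NOT the real Windows best-fit table.
ILLUSTRATIVE_BEST_FIT = {
    "\uff02": '"',  # FULLWIDTH QUOTATION MARK
    "\uff0f": "/",  # FULLWIDTH SOLIDUS
    "\uff0d": "-",  # FULLWIDTH HYPHEN-MINUS
}

FORBIDDEN = set('"\\/')  # ASCII metacharacters the hypothetical validator rejects


def looks_safe(text: str) -> bool:
    """Hypothetical validator that runs on the wide (Unicode) string."""
    return not any(ch in FORBIDDEN for ch in text)


def to_ansi_best_fit(text: str) -> bytes:
    """Simulated best-fit: silently substitute look-alikes, then encode."""
    mapped = "".join(ILLUSTRATIVE_BEST_FIT.get(ch, ch) for ch in text)
    return mapped.encode("cp1252", errors="replace")


def to_ansi_placeholder(text: str) -> bytes:
    """Placeholder policy: unrepresentable characters become '?'."""
    return text.encode("cp1252", errors="replace")


def to_ansi_strict(text: str) -> bytes:
    """Fail-early policy: raises UnicodeEncodeError instead of guessing."""
    return text.encode("cp1252")


user_input = "report\uff02 \uff0fdelete"
print(looks_safe(user_input))            # True: no ASCII metacharacters yet
print(to_ansi_best_fit(user_input))      # b'report" /delete'
print(to_ansi_placeholder(user_input))   # b'report? ?delete'
try:
    to_ansi_strict(user_input)
except UnicodeEncodeError:
    print("strict conversion rejected the input")
```

Only the simulated best‑fit path produces bytes that the validator would have rejected; the other two either neutralize the character or refuse the input.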
UTF‑8 codepage and manifests
- Windows now allows opting into UTF‑8 as the “ANSI” codepage via manifests or a system‑wide “Beta: UTF‑8” checkbox.
- Experiences differ: some report years of smooth use; others saw random app crashes, especially with legacy software assuming fixed 1‑byte‑per‑char encodings or limited buffer growth.
- Debate whether this is a good general solution:
- Pro: aligns Windows with Unix/UTF‑8, simplifies portable C/C++ and CLI tools.
- Con: doesn’t handle invalid UTF‑16 from Win32 (WTF‑16) cleanly, can break unknown DLLs using *A, and still risks information loss.
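
The reported crashes in legacy software are consistent with code that sizes buffers as one byte per character. A small Python sketch of that assumption breaking under UTF‑8 follows; BUFFER_SIZE and fits are hypothetical stand‑ins for a fixed‑size buffer in a legacy application, not anything from a real program.

```python
text = "café"                    # 4 characters
legacy = text.encode("cp1252")   # one byte per character in a single-byte code page
utf8 = text.encode("utf-8")      # 'é' needs two bytes in UTF-8

# Hypothetical legacy assumption: the buffer holds exactly one byte per character.
BUFFER_SIZE = len(text)


def fits(encoded: bytes) -> bool:
    """Return True if the encoded text fits the hypothetical fixed buffer."""
    return len(encoded) <= BUFFER_SIZE


print(len(legacy), fits(legacy))  # 4 True
print(len(utf8), fits(utf8))      # 5 False

# Truncating to the buffer size cuts the two-byte 'é' in half.
truncated = utf8[:BUFFER_SIZE]
try:
    truncated.decode("utf-8")
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False
print(truncation_ok)              # False
```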
Impact on languages, runtimes, and tools
- Rust’s standard library mostly uses wide APIs (GetCommandLineW, etc.) and bypasses argv, so the described attacks don’t directly hit Rust binaries; child processes that use ANSI APIs remain at risk.
- Cygwin was initially suspected vulnerable via internal use of NT conversion routines, but maintainers clarify they parse the wide command line themselves, mitigating worst‑fit.
- curl and other cross‑platform tools: tension between “they’re victims of the platform” and “it’s still their bug on Windows.” Some say serious, common issues would be fixed regardless; others stress unpaid maintainers and platform complexity.
Process spawning & argument parsing
- Windows fundamentally passes a single command‑line string; argv is a user‑space convention, and multiple runtimes (C, Go, Java, Python, etc.) parse it differently.
- Because you can’t know how the callee parses arguments, commenters claim there is no universal, safe escaping scheme on Windows, only program‑specific ones.
- Suggestions include:
- Use wide APIs end‑to‑end and convert to UTF‑8/WTF‑8 internally.
- Avoid Windows system()‑style command construction; prefer direct APIs or tightly specified argument parsing.
- For some high‑level languages, fail or warn on dangerous characters in subprocess args by default (controversial due to i18n needs).
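
The interaction between single‑string command lines and best‑fit conversion can be sketched in Python. Everything here is an assumption for illustration: split_msvcrt is a simplified, hypothetical parser that follows the Microsoft C runtime's backslash and quote rules (it ignores the doubled‑quote rule and special handling of the program name), and the best‑fit step is simulated with a one‑entry table. subprocess.list2cmdline is the standard library helper Python uses to build Windows command lines.

```python
import subprocess


def split_msvcrt(cmdline: str) -> list:
    """Simplified split using MS C runtime style backslash/quote rules."""
    args, cur, in_quotes, have_arg = [], [], False, False
    i = 0
    while i < len(cmdline):
        ch = cmdline[i]
        if ch == "\\":
            j = i
            while j < len(cmdline) and cmdline[j] == "\\":
                j += 1
            n = j - i
            if j < len(cmdline) and cmdline[j] == '"':
                cur.append("\\" * (n // 2))    # 2n or 2n+1 backslashes -> n
                if n % 2:
                    cur.append('"')            # odd count: literal quote
                else:
                    in_quotes = not in_quotes  # even count: quote toggles
                i = j + 1
            else:
                cur.append("\\" * n)           # backslashes not before a quote
                i = j
            have_arg = True
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(cur))
                cur, have_arg = [], False
            i += 1
        else:
            cur.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(cur))
    return args


# Attacker-controlled value containing U+FF02 FULLWIDTH QUOTATION MARK.
user_value = "x\uff02 --output=evil \uff02y"
cmdline = subprocess.list2cmdline(["tool", "--name", user_value])

# Simulated best-fit in the callee's ANSI conversion (illustrative mapping only).
after_best_fit = cmdline.translate({0xFF02: '"'})

print(split_msvcrt(cmdline))         # 3 arguments: the value stays intact
print(split_msvcrt(after_best_fit))  # 5 arguments: '--output=evil' is injected
```

The caller quoted the value correctly for the wide string it saw; the extra arguments appear only after the simulated conversion turns the fullwidth quotes into ASCII ones.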
Portability and encoding philosophy
- Long back‑and‑forth on whether Windows should fully embrace UTF‑8 vs keeping UTF‑16/WTF‑16 as the “native” encoding:
- One camp: UTF‑8 has effectively “won”; Unix dominance on servers and portability concerns make UTF‑8 the only practical choice.
- Other camp: Windows internals and filesystems are 16‑bit‑unit based, can store invalid sequences, and require careful WTF‑16/WTF‑8 handling; blindly UTF‑8‑ifying *A APIs is fragile.
- Several emphasize that many of these attacks are manifestations of already‑existing Unicode handling bugs in applications, only now exposed more clearly.
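
The second camp's point about invalid sequences can be illustrated with a lone surrogate, which is storable as a 16‑bit unit but is not valid Unicode text. In this Python sketch the file name is made up, and the surrogatepass error handler is used only as an approximation of WTF‑8 for the lone‑surrogate case, not as a full WTF‑8 implementation.

```python
# Hypothetical file name containing a lone high surrogate (U+D800).
name = "report_\ud800.txt"

# Strict UTF-8 cannot represent it at all.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A lossy conversion "works" but no longer identifies the same file.
lossy = name.encode("utf-8", errors="replace")

# Passing the surrogate through keeps the name round-trippable.
wtf8_like = name.encode("utf-8", errors="surrogatepass")
round_trip = wtf8_like.decode("utf-8", errors="surrogatepass")

print(strict_ok)           # False
print(lossy)               # b'report_?.txt'
print(wtf8_like)           # b'report_\xed\xa0\x80.txt'
print(round_trip == name)  # True
```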
Microsoft’s compatibility stance
- Commenters note Microsoft’s deep commitment to backward compatibility: e.g., trigraphs, ancient games, case‑insensitive filesystem behavior, legacy CRTs, and old codepages that still work.
- Some argue security should justify breaking changes (e.g., disabling best‑fit, making UTF‑8 default), with shims or API versioning for old apps.
- Others think staged opt‑ins via manifests, code‑analysis rules (e.g., discouraging best‑fit), and better documentation/linting are more realistic than a hard global switch.