2025-01-09

WorstFit: Unveiling Hidden Transformers in Windows ANSI

Overall reaction & nature of the issue

Many see the vulnerability as unsurprising given Windows’ legacy layers, but still eye‑opening in how multiple “harmless” features combine into serious exploits.
Core problem: Windows “ANSI” APIs use a “best‑fit” Unicode→codepage mapping that silently turns certain Unicode characters into ASCII metacharacters (", \, /, -, etc.) after an application has validated input.
This breaks security assumptions in argument handling, shell escaping, path validation, etc., especially when wide‑string logic and ANSI APIs are mixed.
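A minimal sketch of the validate-then-convert gap, written in Python so it runs anywhere. The mapping table is a hand-picked, hypothetical stand-in for a best-fit table (real tables differ per codepage), and `looks_safe` and `simulate_best_fit` are illustrative names, not Windows APIs.

```python
# Illustrative entries only: fullwidth forms that a best-fit table
# might fold to ASCII. This is NOT the real Windows mapping.
HYPOTHETICAL_BEST_FIT = {
    "\uff02": '"',   # FULLWIDTH QUOTATION MARK
    "\uff0d": "-",   # FULLWIDTH HYPHEN-MINUS
    "\uff3c": "\\",  # FULLWIDTH REVERSE SOLIDUS
}

ASCII_METACHARS = set('"\\/-')


def looks_safe(arg: str) -> bool:
    """Validation as an application might do it: reject ASCII metacharacters."""
    return not any(ch in ASCII_METACHARS for ch in arg)


def simulate_best_fit(text: str) -> str:
    """Stand-in for the lossy Unicode-to-codepage conversion done later."""
    return text.translate(str.maketrans(HYPOTHETICAL_BEST_FIT))


user_input = "\uff02 \uff0d\uff0dversion"

validated = looks_safe(user_input)         # check runs on the Unicode string
converted = simulate_best_fit(user_input)  # conversion happens afterwards

print(validated, looks_safe(converted), repr(converted))
```

The check passes on the original string and fails on the converted one, which is the ordering problem the thread describes: validation sees different characters than the code that finally consumes the bytes.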

ANSI vs Unicode on Windows

Strong consensus: new code should avoid *A (ANSI) Win32 APIs and use *W (wide) variants plus explicit conversion.
Several note that Microsoft has recommended wide APIs since early NT, but its own C runtime historically routes fopen, getenv, argv, etc. through *A, perpetuating best‑fit issues.
Some argue for simply killing best‑fit or mapping unrepresentable chars to a harmless placeholder and/or failing early.
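The "placeholder or fail early" behavior can be sketched with Python's own `cp1252` codec, which performs no best-fit folding. This only illustrates the policy commenters ask for; it is not how the Win32 conversion routines are invoked.

```python
text = "\uff02"  # FULLWIDTH QUOTATION MARK, not representable in cp1252

# Policy 1: fail early on anything the codepage cannot represent.
try:
    text.encode("cp1252")  # errors="strict" is the default
    failed_early = False
except UnicodeEncodeError:
    failed_early = True

# Policy 2: substitute a harmless placeholder instead of a look-alike.
placeholder = text.encode("cp1252", errors="replace")

print(failed_early, placeholder)
```

Neither policy can produce an ASCII double quote from the fullwidth one, so a check performed before conversion stays valid afterwards.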

UTF‑8 codepage and manifests

Windows now allows opting into UTF‑8 as the “ANSI” codepage via manifests or a system‑wide “Beta: UTF‑8” checkbox.
Experiences differ: some report years of smooth use; others saw random app crashes, especially with legacy software assuming fixed 1‑byte‑per‑char encodings or limited buffer growth.
Commenters debate whether this is a good general solution:
- Pro: aligns Windows with Unix/UTF‑8, simplifies portable C/C++ and CLI tools.
- Con: doesn’t handle invalid UTF‑16 from Win32 (WTF‑16) cleanly, can break unknown DLLs using *A, and still risks information loss.
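
The crashes reported with legacy software are plausible once you look at byte counts. A small sketch, assuming a program that sized its buffers for a single-byte codepage such as Windows-1252:

```python
text = "naïve café"

single_byte = text.encode("cp1252")  # one byte per character
utf8 = text.encode("utf-8")          # accented letters take two bytes each

print(len(text), len(single_byte), len(utf8))

# A buffer sized as "number of characters" holds the cp1252 form
# but is too small for the UTF-8 form of the same string.
buffer_size = len(text)
fits_cp1252 = len(single_byte) <= buffer_size
fits_utf8 = len(utf8) <= buffer_size
```

Code that silently assumed one byte per character keeps working under the old codepage and overflows or truncates once the process codepage becomes UTF-8.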

Impact on languages, runtimes, and tools

Rust’s standard library mostly uses wide APIs (GetCommandLineW, etc.) and bypasses argv, so the described attacks don’t directly hit Rust binaries; child processes that use ANSI APIs remain at risk.
Cygwin was initially suspected to be vulnerable through its internal use of NT conversion routines, but maintainers clarified that it parses the wide command line itself, which mitigates worst‑fit.
curl and other cross‑platform tools: tension between “they’re victims of the platform” and “it’s still their bug on Windows.” Some say serious, common issues would be fixed regardless; others stress unpaid maintainers and platform complexity.

Process spawning & argument parsing

Windows fundamentally passes a single command‑line string; argv is a user‑space convention, and multiple runtimes (C, Go, Java, Python, etc.) parse it differently.
Because you can’t know how the callee parses arguments, commenters claim there is no universal, safe escaping scheme on Windows—only program‑specific ones.
Suggestions include:
- Use wide APIs end‑to‑end and convert to UTF‑8/WTF‑8 internally.
- Avoid Windows system()‑style command construction; prefer direct APIs or tightly specified argument parsing.
- For some high‑level languages, fail or warn on dangerous characters in subprocess args by default (controversial due to i18n needs).
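
A rough illustration of why correct escaping alone does not help when the callee converts the command line afterwards. It uses `subprocess.list2cmdline`, which quotes arguments following the Microsoft C runtime rules, a plain `str.replace` as a hypothetical stand-in for best-fit conversion, and `shlex.split` as a stand-in for the callee's parser (real parsers differ, which is the commenters' point).

```python
import shlex
import subprocess

# One attacker-influenced argument containing fullwidth quotation marks.
untrusted = "report\uff02 \uff02--output=evil"

# The caller escapes correctly for the string it can see.
cmdline = subprocess.list2cmdline(["tool", untrusted])
before = shlex.split(cmdline)

# Hypothetical best-fit step inside an ANSI-based callee:
# the fullwidth quote is folded to an ASCII double quote.
folded = cmdline.replace("\uff02", '"')
after = shlex.split(folded)

print(before)
print(after)
```

Before the fold the callee would see two arguments; after it, the single untrusted value has split into `report` and an injected `--output=evil` option.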

Portability and encoding philosophy

Long back‑and‑forth on whether Windows should fully embrace UTF‑8 or keep UTF‑16/WTF‑16 as the “native” encoding:
- One camp: UTF‑8 has effectively “won”; Unix dominance on servers and portability concerns make UTF‑8 the only practical choice.
- Other camp: Windows internals and filesystems are 16‑bit‑unit based, can store invalid sequences, and require careful WTF‑16/WTF‑8 handling; blindly UTF‑8‑ifying *A APIs is fragile.
Several emphasize that many of these attacks are manifestations of already‑existing Unicode handling bugs in applications, only now exposed more clearly.
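The "invalid sequences" concern can be made concrete with an unpaired surrogate, which a 16‑bit‑unit string can hold but strict UTF‑8 cannot. In this sketch Python's `surrogatepass` error handler serves as an approximation of WTF‑8; it is not a full WTF‑8 implementation, since WTF‑8 also constrains how paired surrogates are encoded.

```python
# A name containing a lone high surrogate, as a 16-bit-unit API could return.
name = "file\ud800.txt"

# Strict UTF-8 refuses it, so a naive "convert to UTF-8" layer must
# either fail or lose information.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A WTF-8-like encoding keeps the surrogate so the name round-trips.
wtf8_like = name.encode("utf-8", errors="surrogatepass")
round_trip = wtf8_like.decode("utf-8", errors="surrogatepass")

print(strict_ok, wtf8_like, round_trip == name)
```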

Microsoft’s compatibility stance

Commenters note Microsoft’s deep commitment to backward compatibility, citing trigraphs, ancient games, case‑insensitive filesystem behavior, legacy CRTs, and old codepages, all of which still work.
Some argue security should justify breaking changes (e.g., disabling best‑fit, making UTF‑8 default), with shims or API versioning for old apps.
Others think staged opt‑ins via manifests, code‑analysis rules (e.g., discouraging best‑fit), and better documentation/linting are more realistic than a hard global switch.
