WorstFit: Unveiling Hidden Transformers in Windows ANSI

Overall reaction & nature of the issue

  • Many see the vulnerability as unsurprising given Windows’ legacy layers, but still eye‑opening in how multiple “harmless” features combine into serious exploits.
  • Core problem: Windows “ANSI” APIs use a “best‑fit” Unicode→codepage mapping that silently turns certain Unicode characters into ASCII metacharacters (", \, /, -, etc.) after an application has validated input.
  • This breaks security assumptions in argument handling, shell escaping, path validation, etc., especially when wide‑string logic and ANSI APIs are mixed.
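
A minimal Python sketch of the validate‑then‑convert gap described above. The mapping table is a hand‑written, illustrative subset (fullwidth forms folding to ASCII); real best‑fit tables differ per codepage. `looks_safe` and `simulate_best_fit` are hypothetical helpers, not Windows APIs.

```python
# Illustrative subset of a best-fit table. Assumption: the real tables are
# per-codepage and much larger; these fullwidth forms are only examples.
BEST_FIT = {
    "\uFF02": '"',   # FULLWIDTH QUOTATION MARK
    "\uFF3C": "\\",  # FULLWIDTH REVERSE SOLIDUS
    "\uFF0F": "/",   # FULLWIDTH SOLIDUS
    "\uFF0D": "-",   # FULLWIDTH HYPHEN-MINUS
}

METACHARS = set('"\\/-')


def looks_safe(arg: str) -> bool:
    """Hypothetical validator: rejects ASCII metacharacters only."""
    return not any(ch in METACHARS for ch in arg)


def simulate_best_fit(text: str) -> str:
    """Stand-in for the lossy wide-to-ANSI conversion (not a real API)."""
    return "".join(BEST_FIT.get(ch, ch) for ch in text)


user_input = "report\uFF02 \uFF0D\uFF0Dexec"
validated = looks_safe(user_input)          # the check runs on the wide string
after_ansi = simulate_best_fit(user_input)  # the conversion happens afterwards

print(validated)               # the validator saw no ASCII metacharacters
print(after_ansi)              # ASCII quote and dashes appear after conversion
print(looks_safe(after_ansi))  # the same check now fails
```

The point of the sketch is ordering: any check that runs before the lossy conversion is checking a different string than the one the callee eventually sees.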

ANSI vs Unicode on Windows

  • Strong consensus: new code should avoid *A (ANSI) Win32 APIs and use *W (wide) variants plus explicit conversion.
  • Several note that Microsoft has recommended wide APIs since early NT, yet its own C runtime has historically routed fopen, getenv, argv, etc. through *A APIs, perpetuating best‑fit issues.
  • Some argue for simply killing best‑fit or mapping unrepresentable chars to a harmless placeholder and/or failing early.
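
The two alternatives to best‑fit mentioned above (fail early, or substitute a harmless placeholder) can be illustrated with Python's own codecs, which perform no best‑fit mapping. This is only an analogy: on Windows the related control is the WC_NO_BEST_FIT_CHARS flag of WideCharToMultiByte. The helper names `to_ansi_strict` and `to_ansi_placeholder` are hypothetical.

```python
def to_ansi_strict(text: str, codepage: str = "cp1252") -> bytes:
    """Fail early: raise if any character has no exact mapping."""
    return text.encode(codepage, errors="strict")


def to_ansi_placeholder(text: str, codepage: str = "cp1252") -> bytes:
    """Replace unrepresentable characters with '?' instead of a look-alike."""
    return text.encode(codepage, errors="replace")


suspicious = "a\uFF02b"  # contains FULLWIDTH QUOTATION MARK

try:
    to_ansi_strict(suspicious)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(strict_failed)                    # strict conversion refuses the input
print(to_ansi_placeholder(suspicious))  # placeholder conversion yields '?'
```

Either behavior avoids silently producing an ASCII quote; the placeholder variant still loses information, which is why some commenters prefer failing outright.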

UTF‑8 codepage and manifests

  • Windows now allows opting into UTF‑8 as the “ANSI” codepage, either per application via a manifest or system‑wide via the “Beta: Use Unicode UTF‑8 for worldwide language support” checkbox.
  • Experiences differ: some report years of smooth use; others saw seemingly random app crashes, especially in legacy software that assumes a fixed one‑byte‑per‑character encoding or sizes its buffers too small for multi‑byte text.
  • Commenters debate whether this is a good general solution:
    • Pro: aligns Windows with Unix/UTF‑8, simplifies portable C/C++ and CLI tools.
    • Con: doesn’t handle invalid UTF‑16 from Win32 (WTF‑16) cleanly, can break unknown DLLs using *A, and still risks information loss.
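
A small Python sketch of why software that assumes one byte per character can break once the “ANSI” codepage becomes UTF‑8: the same text needs more bytes. `encoded_lengths` and the 8‑byte `BUFFER_SIZE` are hypothetical, chosen only for illustration.

```python
BUFFER_SIZE = 8  # hypothetical fixed buffer sized for a single-byte codepage


def encoded_lengths(text: str) -> tuple:
    """Return (bytes in cp1252, bytes in UTF-8) for the same text."""
    return len(text.encode("cp1252")), len(text.encode("utf-8"))


name = "caf\u00e9 \u20ac5"  # "café €5": 7 characters
legacy_len, utf8_len = encoded_lengths(name)

print(len(name), legacy_len, utf8_len)
print(legacy_len <= BUFFER_SIZE)  # fits under the single-byte assumption
print(utf8_len <= BUFFER_SIZE)    # no longer fits once encoded as UTF-8
```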

Impact on languages, runtimes, and tools

  • Rust’s standard library mostly uses wide APIs (GetCommandLineW, etc.) and bypasses argv, so the described attacks don’t directly hit Rust binaries; child processes that use ANSI APIs remain at risk.
  • Cygwin was initially suspected to be vulnerable through its internal use of NT conversion routines, but maintainers clarified that it parses the wide command line itself, which mitigates WorstFit.
  • curl and other cross‑platform tools: tension between “they’re victims of the platform” and “it’s still their bug on Windows.” Some say serious, common issues would be fixed regardless; others stress unpaid maintainers and platform complexity.

Process spawning & argument parsing

  • Windows fundamentally passes a single command‑line string; argv is a user‑space convention, and multiple runtimes (C, Go, Java, Python, etc.) parse it differently.
  • Because you can’t know how the callee parses arguments, commenters claim there is no universal, safe escaping scheme on Windows—only program‑specific ones.
  • Suggestions include:
    • Use wide APIs end‑to‑end and convert to UTF‑8/WTF‑8 internally.
    • Avoid Windows system()‑style command construction; prefer direct APIs or tightly specified argument parsing.
    • For some high‑level languages, fail or warn on dangerous characters in subprocess args by default (controversial due to i18n needs).
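
A hedged Python sketch of two of the points above. `subprocess.list2cmdline` joins an argument list into one command‑line string using the Microsoft C runtime quoting convention, so the result is only correct for callees that parse the same way. `non_ascii_args` is a hypothetical guard in the spirit of the "fail or warn" suggestion, and it would also reject legitimate international text.

```python
import subprocess


def non_ascii_args(args):
    """Hypothetical guard: return the arguments that contain non-ASCII text."""
    return [a for a in args if not a.isascii()]


args = ["tool", "a b", 'say "hi"']

# One string is what Windows actually hands to the child process.
# The quoting below follows the MS C runtime rules only.
command_line = subprocess.list2cmdline(args)
print(command_line)

# An argument with a fullwidth quotation mark is flagged before spawning.
print(non_ascii_args(["tool", "x\uFF02y"]))
print(non_ascii_args(args))
```

Passing a list to the subprocess module, rather than concatenating a shell string by hand, keeps the quoting in one place, though it cannot help when the child re‑decodes its arguments through an ANSI API.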

Portability and encoding philosophy

  • Long back‑and‑forth on whether Windows should fully embrace UTF‑8 vs keeping UTF‑16/WTF‑16 as the “native” encoding:
    • One camp: UTF‑8 has effectively “won”; Unix dominance on servers and portability concerns make UTF‑8 the only practical choice.
    • Other camp: Windows internals and filesystems are 16‑bit‑unit based, can store invalid sequences, and require careful WTF‑16/WTF‑8 handling; blindly UTF‑8‑ifying *A APIs is fragile.
  • Several emphasize that many of these attacks are manifestations of already‑existing Unicode handling bugs in applications, only now exposed more clearly.
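
The second camp's concern about invalid 16‑bit sequences can be sketched in Python. The `surrogatepass` error handler behaves like WTF‑8 for a lone surrogate, but it is not presented here as a complete WTF‑8 implementation; the byte values are made up for illustration.

```python
# Three little-endian 16-bit units: 'a', a lone high surrogate U+D800, 'b'.
# This is storable as 16-bit units but is not valid UTF-16 text.
raw_units = bytes([0x61, 0x00, 0x00, 0xD8, 0x62, 0x00])

# Strict UTF-16 decoding rejects the unpaired surrogate.
try:
    raw_units.decode("utf-16-le")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'surrogatepass' keeps the unit so the name can round-trip.
name = raw_units.decode("utf-16-le", errors="surrogatepass")

# Strict UTF-8 cannot represent it either.
try:
    name.encode("utf-8")
    utf8_ok = True
except UnicodeEncodeError:
    utf8_ok = False

# A WTF-8-like encoding preserves the lone surrogate as three bytes.
wtf8_like = name.encode("utf-8", errors="surrogatepass")
round_trip = (
    wtf8_like.decode("utf-8", errors="surrogatepass")
    .encode("utf-16-le", errors="surrogatepass")
)

print(strict_ok, utf8_ok)
print(wtf8_like)
print(round_trip == raw_units)
```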

Microsoft’s compatibility stance

  • Commenters note Microsoft’s deep commitment to backward compatibility: e.g., trigraphs, ancient games, case‑insensitive filesystem behavior, legacy CRTs, and old codepages that still work.
  • Some argue security should justify breaking changes (e.g., disabling best‑fit, making UTF‑8 default), with shims or API versioning for old apps.
  • Others think staged opt‑ins via manifests, code‑analysis rules (e.g., discouraging best‑fit), and better documentation/linting are more realistic than a hard global switch.