Debian opens a can of username worms

Scope of Debian Username Changes

  • Debian (via shadow-utils and adduser) is loosening username rules, potentially allowing UTF‑8 names, all‑numeric names, and more punctuation.
  • Many see this as risky: it diverges from long‑standing conventions and could break tooling that assumes conservative, ASCII‑only usernames.
  • Others argue the previous Debian‑specific patch was itself a mistake, and aligning with upstream / modern Unicode reality is overdue.

Unicode in Identifiers

  • Several comments note that identifier rules already exist (Unicode UAX #31, and the IETF's PRECIS framework in RFC 8264/8265), as do Unicode security guidelines covering confusables and spoofing.
  • Libraries like ICU, libunistring, libidn, libu8ident exist, but adoption is patchy; many tools (e.g., grep variants) still handle Unicode poorly.
  • Advocates say: use these standards, apply normalization (e.g., NFKC), and restrict to safe categories (letters, digits) rather than “all of Unicode.”
  • Critics emphasize normalization, bidirectional text, and homoglyphs as a deep well of complexity and subtle bugs.
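
The "normalize, then restrict to safe categories" proposal above can be sketched in a few lines. This is a minimal illustration using only Python's standard unicodedata module; the function name normalize_username and the exact category choice (letters plus decimal digits) are assumptions made for the example, not anything Debian, shadow, or adduser implements.

```python
import unicodedata


def normalize_username(raw: str) -> str:
    """Return the NFKC-normalized form of raw, or raise ValueError.

    Hypothetical policy for illustration: after normalization, only
    letters (general categories L*) and decimal digits (Nd) are accepted.
    """
    name = unicodedata.normalize("NFKC", raw)
    if not name:
        raise ValueError("empty username")
    for ch in name:
        cat = unicodedata.category(ch)
        # Rejects spaces, punctuation, control and format characters
        # (including bidirectional overrides), and leftover combining marks.
        if not (cat.startswith("L") or cat == "Nd"):
            raise ValueError(f"disallowed character U+{ord(ch):04X} ({cat})")
    return name


# Fullwidth "jos" plus "e" with a combining acute accent collapses to "josé".
print(normalize_username("\uff4a\uff4f\uff53e\u0301"))
```

The sketch also shows why critics call this a deep well: it does nothing about homoglyphs (a Cyrillic "а" and a Latin "a" both pass), and rejecting every combining mark would exclude scripts that cannot be written without them. A real policy would need confusable detection and per-script rules on top.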

Internationalization vs ASCII-only Usernames

  • Pro‑Unicode side: legacy codepages were worse; users of many scripts (CJK, Cyrillic, accented Latin) were effectively excluded; keeping usernames ASCII‑only is unfair and user‑hostile.
  • Anti‑Unicode or cautious side: usernames are low‑level identifiers; ASCII is a useful common denominator, especially when logging in from random keyboards or debugging over SSH.
  • Some propose: ASCII‑only for login names, but UTF‑8 for full names / display fields; others insist people should be able to log in with their real‑script names.

POSIX, Standards, and Practicality

  • The POSIX “portable username” character set is [A-Za-z0-9._-], with the hyphen not allowed as the first character. All‑numeric usernames are allowed under this rule.
  • Some call this outdated and want UTF‑8 everywhere; others say POSIX’s role is to describe existing practice, and that a UTF‑8 transition would be a massive, decades‑long compatibility project.
  • There is disagreement whether standards bodies should “lead” (mandate UTF‑8) or “follow” (codify what major OSes already do).
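
For reference, the portable rule quoted above is small enough to check with one regular expression. The following is a sketch; the names PORTABLE_NAME and is_portable_username are made up for the example, and real tools typically add their own limits (length, reserved names) on top.

```python
import re

# Letters, digits, period, underscore, and hyphen; the first character
# may be anything from that set except the hyphen.
PORTABLE_NAME = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_portable_username(name: str) -> bool:
    """True if name uses only the portable set and does not start with '-'."""
    # fullmatch avoids the classic pitfall of '$' also matching before a
    # trailing newline.
    return PORTABLE_NAME.fullmatch(name) is not None


for candidate in ["alice", "-alice", "12345", "jos\u00e9", "a.b_c-d"]:
    print(candidate, is_portable_username(candidate))
```

Note that "12345" passes, which is exactly the all-numeric case debated further down.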

Security, Shells, and Bug Compatibility

  • Allowing shell metacharacters, spaces, and exotic Unicode in usernames is seen as a security foot‑gun: shell injection, misparsed scripts, ambiguous logs.
  • Commenters report real vulnerabilities in which unsanitized usernames passed into scripts let characters such as ;, & and > trigger arbitrary command execution.
  • Some argue broken scripts are already wrong and should break so they get fixed; others stress that enterprises care about systems working today, not theoretical correctness.
  • A comparison is drawn with filenames containing spaces: Unix tools historically broke on them, whereas Windows forced software to adapt by using spaces in system paths.
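
The injection pattern described above, and two common ways to avoid it, can be sketched as follows. All function names here are hypothetical, and the hostile string is only ever built or passed as data, never handed to a shell.

```python
import shlex
import subprocess
import sys


def unsafe_command(username: str) -> str:
    # Anti-pattern: the name is pasted into text that a shell will parse,
    # so ';', '&' or '>' inside it become shell syntax.
    return f"echo home of {username}"


def quoted_command(username: str) -> str:
    # Mitigation 1: quote the value so a POSIX shell sees a single word.
    return f"echo home of {shlex.quote(username)}"


def run_without_shell(username: str) -> str:
    # Mitigation 2: skip the shell and pass the name as one argv element.
    # The child process here is just Python echoing its first argument.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", username],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


hostile = "bob; rm -rf /"
print(unsafe_command(hostile))     # a shell would treat "rm -rf /" as a second command
print(quoted_command(hostile))     # the whole name stays one quoted word
print(run_without_shell(hostile))  # the name arrives intact as data
```

Neither mitigation helps scripts that were never audited, which is the point made by those who worry about existing systems rather than theoretical correctness.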

Numeric Usernames and Identifier Design

  • Purely numeric usernames are criticized for colliding conceptually with numeric UIDs: tools often treat an all‑digit argument as a UID and anything else as a name.
  • This can create confusing or insecure behavior if a numeric name doesn’t match its UID or matches someone else’s UID.
  • Others note POSIX allows it; they propose local policy (e.g., disallow names equal to existing UIDs) or better ID schemes (prefixes, checksums, redundancy).
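
To make the ambiguity concrete, here is a small sketch over an in-memory table rather than the real user database. The table contents, both resolver functions, and the policy check are all hypothetical; actual tools differ in which lookup order they use.

```python
# Hypothetical passwd-like table: name -> UID.
# The user named "1000" has UID 1001, while UID 1000 belongs to alice.
USERS = {"root": 0, "alice": 1000, "1000": 1001}


def resolve_digits_first(spec: str, users: dict) -> int:
    """Treat an all-digit spec as a UID, anything else as a name."""
    if spec.isascii() and spec.isdigit():
        return int(spec)
    return users[spec]


def resolve_name_first(spec: str, users: dict) -> int:
    """Look the spec up as a name first; fall back to a numeric UID."""
    if spec in users:
        return users[spec]
    if spec.isascii() and spec.isdigit():
        return int(spec)
    raise KeyError(spec)


def violates_local_policy(new_name: str, users: dict) -> bool:
    """Local policy sketch: refuse an all-digit name equal to an existing UID."""
    return (
        new_name.isascii()
        and new_name.isdigit()
        and int(new_name) in users.values()
    )


print(resolve_digits_first("1000", USERS))   # 1000, which is alice's UID
print(resolve_name_first("1000", USERS))     # 1001, the user named "1000"
print(violates_local_policy("1000", USERS))  # True
```

Two tools that pick different lookup orders would act on different accounts for the same argument, which is the confusing or insecure behavior described above. The policy check only closes the gap for names created after it is in place.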

User Experience Anecdotes

  • Many recount failures involving:
    • Diacritics in names (é, å, ç), apostrophes, or non‑Latin scripts.
    • Non‑ASCII passwords that can be set but then cannot be used to log in.
    • Unicode in usernames or profile directories, mishandled by Windows and other systems.
  • As a result, even users with non‑ASCII names often deliberately stick to ASCII for usernames and sometimes passwords.

Alternative Ideas and Side Discussions

  • Suggestions include:
    • Punycode‑like encodings for usernames (a machine‑safe stored form paired with a user‑friendly display form).
    • Treating usernames as opaque byte strings, punting encoding to higher layers.
    • Keeping login identifiers simple and using richer UTF‑8 identifiers only where genuinely needed.
  • A tangential discussion explores graphical / visual programming vs text, concluding that visual systems often become unmanageable “spaghetti,” and text remains the most practical representation.
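
Returning to the Punycode-like suggestion in the list above, the idea of an ASCII stored form with a Unicode display form can be sketched with Python's built-in punycode codec. The "xn--" prefix is borrowed from internationalized domain names; applying it to usernames, and the two function names, are assumptions for illustration only.

```python
PREFIX = "xn--"  # IDNA's ACE prefix, reused here purely as an assumption


def to_storage_form(display_name: str) -> str:
    """Map a display name to an ASCII-only stored form."""
    if display_name.isascii():
        return display_name
    return PREFIX + display_name.encode("punycode").decode("ascii")


def to_display_form(stored: str) -> str:
    """Inverse of to_storage_form for names that carry the prefix."""
    if stored.startswith(PREFIX):
        return stored[len(PREFIX):].encode("ascii").decode("punycode")
    return stored


stored = to_storage_form("b\u00fccher")
print(stored)                   # xn--bcher-kva
print(to_display_form(stored))  # bücher
```

The stored form uses only lowercase letters, digits, and hyphens, so it fits the conservative character set that existing tools expect. The sketch leaves open what to do with a plain ASCII name that happens to start with the prefix, and it assumes the display name has already been normalized.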