Debian opens a can of username worms
Scope of Debian Username Changes
- Debian (via shadow-utils and adduser) is loosening username rules, potentially allowing UTF‑8, numerics, and more punctuation.
- Many see this as risky: it diverges from long‑standing conventions and could break tooling that assumes conservative, ASCII‑only usernames.
- Others argue the previous Debian‑specific patch was itself a mistake, and aligning with upstream / modern Unicode reality is overdue.
Unicode in Identifiers
- Several comments note Unicode already defines identifier rules (TR31, RFC 8264/8265) and security guidelines (confusables, spoofing).
- Libraries like ICU, libunistring, libidn, libu8ident exist, but adoption is patchy; many tools (e.g., grep variants) still handle Unicode poorly.
- Advocates say: use these standards, apply normalization (e.g., NFKC), and restrict to safe categories (letters, digits) rather than “all of Unicode.”
- Critics emphasize normalization, bidirectional text, and homoglyphs as a deep well of complexity and subtle bugs.
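The "normalize, then restrict to safe categories" approach that advocates describe can be sketched in a few lines. This is a minimal illustration using only Python's standard `unicodedata` module, not the policy Debian or shadow-utils actually implements. The function name `normalize_username` and the exact category allowlist (letters and decimal digits only) are assumptions made for the example.

```python
import unicodedata


def normalize_username(raw):
    """Return the NFKC-normalized name, or None if it is rejected.

    Hypothetical policy for illustration: after normalization, every
    character must be a letter (category L*) or a decimal digit (Nd).
    """
    name = unicodedata.normalize("NFKC", raw)
    if not name:
        return None
    for ch in name:
        category = unicodedata.category(ch)
        if not (category.startswith("L") or category == "Nd"):
            # Rejects punctuation, spaces, bidi controls (Cf), and
            # combining marks that did not compose into a letter.
            return None
    return name
```

Fullwidth letters and a decomposed accent collapse to one canonical spelling, while `;` and the right-to-left override U+202E are rejected. The critics' point still stands: a name beginning with Cyrillic "а" passes this check and looks identical to Latin "admin", so homoglyph detection would need the confusables data from the Unicode security guidelines on top of this.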
Internationalization vs ASCII-only Usernames
- Pro‑Unicode side: legacy codepages were worse; many languages (CJK, Cyrillic, accents) were effectively excluded; it’s unfair and user‑hostile to keep ASCII only.
- Anti‑Unicode or cautious side: usernames are low‑level identifiers; ASCII is a useful common denominator, especially when logging in from random keyboards or debugging over SSH.
- Some propose: ASCII‑only for login names, but UTF‑8 for full names / display fields; others insist people should be able to log in with their real‑script names.
POSIX, Standards, and Practicality
- The POSIX “portable username” character set is [A-Za-z0-9._-] (hyphen not first). Numeric usernames are allowed there.
- Some call this outdated and want UTF‑8 everywhere; others say POSIX’s role is to describe existing practice, and a UTF‑8 transition would be a massive, decades‑long compatibility project.
- There is disagreement whether standards bodies should “lead” (mandate UTF‑8) or “follow” (codify what major OSes already do).
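For reference, the portable set quoted above translates directly into a regular expression. The sketch below only checks the character set and the "hyphen not first" rule as stated in the discussion; the names `PORTABLE_USERNAME` and `is_portable_username` are made up for the example, and real tools such as adduser apply their own, differently configured checks.

```python
import re

# First character: any portable character except the hyphen.
# Remaining characters: the full portable set [A-Za-z0-9._-].
PORTABLE_USERNAME = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_portable_username(name):
    """Return True if name uses only the portable set quoted above."""
    return PORTABLE_USERNAME.fullmatch(name) is not None
```

Under this check `web-admin.01` and the all-digit `12345` are accepted, while `-alice`, `josé`, and names containing spaces are not.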
Security, Shells, and Bug Compatibility
- Allowing shell metacharacters, spaces, and exotic Unicode in usernames is seen as a security foot‑gun: shell injection, misparsed scripts, ambiguous logs.
- Real vulnerabilities are reported where unsanitized usernames passed into scripts allowed ;, &, > etc. to execute arbitrary commands.
- Some argue broken scripts are already wrong and should break so they get fixed; others stress that enterprises care about systems working today, not theoretical correctness.
- Comparison is made to filenames with spaces: Unix tools historically broke, but Windows forced adaptation by using spaces in system paths.
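The injection risk described above comes from pasting a username into a command string that a shell later parses. The following sketch contrasts that with two safer patterns; it is illustrative only, does not reproduce any of the reported vulnerabilities, and all function names are hypothetical.

```python
import shlex
import subprocess
import sys


def greet_unsafe_command(username):
    """Build a shell command by plain concatenation (vulnerable)."""
    return "echo hello " + username


def greet_quoted_command(username):
    """Same command, with the username quoted for a POSIX shell."""
    return "echo hello " + shlex.quote(username)


def greet_no_shell(username):
    """Pass the username as one argv entry; no shell ever parses it."""
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print('hello', sys.argv[1])", username],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

With the username `bob; id`, the unsafe variant yields `echo hello bob; id`, which a shell would run as two commands. The quoted variant keeps it as a single word, and the argv variant avoids the shell entirely, which is usually the more robust choice.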
Numeric Usernames and Identifier Design
- Purely numeric usernames are criticized for colliding conceptually with numeric UIDs; tools often interpret “all digits” as UID, else as name.
- This can create confusing or insecure behavior if a numeric name doesn’t match its UID or matches someone else’s UID.
- Others note POSIX allows it; they propose local policy (e.g., disallow names equal to existing UIDs) or better ID schemes (prefixes, checksums, redundancy).
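The ambiguity is easy to see with a toy model. The sketch below uses a hypothetical in-memory user table rather than a real passwd database, and the two lookup functions are simplified stand-ins for how different tools might resolve a token; neither is claimed to match any specific utility. `policy_allows` illustrates the proposed local policy of rejecting all-digit names that equal an existing UID.

```python
# Hypothetical user table (name -> UID), not read from /etc/passwd.
USERS = {"alice": 1000, "1000": 1001}


def is_all_digits(token):
    """True for non-empty ASCII digit strings only."""
    return token.isascii() and token.isdigit()


def resolve_digits_first(token, users):
    """Treat an all-digit token as a UID, otherwise look up the name."""
    if is_all_digits(token):
        return int(token)
    return users[token]


def resolve_name_first(token, users):
    """Look up the name first, fall back to a numeric UID."""
    if token in users:
        return users[token]
    if is_all_digits(token):
        return int(token)
    raise KeyError(token)


def policy_allows(name, users):
    """Local policy sketch: no all-digit name equal to an existing UID."""
    if is_all_digits(name):
        return int(name) not in users.values()
    return True
```

Here the token `1000` resolves to alice's UID 1000 under one strategy and to UID 1001 (the user literally named `1000`) under the other, which is exactly the kind of mismatch the critics worry about. Note that this policy only checks UIDs that exist at creation time; a UID assigned later could still collide.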
User Experience Anecdotes
- Many recount systems failing on:
  - Diacritics in names (é, å, ç), apostrophes, or non‑Latin scripts.
  - Non‑ASCII passwords that can be set but not used to log in.
  - Windows and other systems mishandling Unicode in usernames or profile directories.
- As a result, even users with non‑ASCII names often deliberately stick to ASCII for usernames and sometimes passwords.
Alternative Ideas and Side Discussions
- Suggestions include:
  - Punycode‑like encodings for usernames (machine‑safe, user‑friendly display).
  - Treating usernames as opaque byte strings, punting encoding to higher layers.
  - Keeping login identifiers simple and using richer UTF‑8 identifiers only where genuinely needed.
- A tangential discussion explores graphical / visual programming vs text, concluding that visual systems often become unmanageable “spaghetti,” and text remains the most practical representation.
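As a rough illustration of the Punycode‑like suggestion above, Python ships a `punycode` codec that maps a Unicode string to an ASCII form and back. The helper names below are hypothetical, and nothing in the discussion specifies an actual encoding scheme for usernames; this only shows what the round trip looks like.

```python
def to_machine_name(display_name):
    """Encode a Unicode name as an ASCII-only Punycode string."""
    return display_name.encode("punycode").decode("ascii")


def to_display_name(machine_name):
    """Decode the ASCII form back to the original Unicode name."""
    return machine_name.encode("ascii").decode("punycode")
```

For example, `to_machine_name("münchen")` gives `mnchen-3ya`, and decoding restores the original. Punycode passes ASCII characters through unchanged, so a scheme like this would presumably still need normalization and a character allowlist applied before encoding.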