A years-long Turkish alphabet bug in the Kotlin compiler

Turkish locale case-folding bug & similar experiences

  • Multiple developers recall hitting Turkish toLowerCase/toUpperCase bugs in Java/Kotlin, especially when mapping enum names or log levels by lowercasing ASCII strings.
  • Static analysis tools do warn about locale-dependent operations, but people often dismiss them assuming ASCII is “safe.”
  • Some report using Turkish system locales (or test JVMs in Turkish) specifically to flush out these bugs.

Design of APIs and locales

  • Many argue that any language/library which exposes case-conversion or formatting APIs without a mandatory locale parameter is misdesigned.
  • Suggested patterns:
    • Use an invariant locale for internal constants (Locale.ROOT in Java, invariant culture in C#, ASCII-only case transforms).
    • Reserve user locale only for true user-facing text and numbers.
  • Others push back that defaulting everything to invariant/ROOT would break valid user input (e.g., number formats with commas) and that many developers would still pick the wrong locale anyway.
  • C/POSIX locale APIs are criticized as global-state, thread-unsafe, ASCII-centric, and hard to reason about; yet they historically made local software “just work” with user locales.

Unicode, Turkish alphabet, and blame

  • Long subthread debates whether Turkish’s dotted/dotless “I” is a “bug” in Turkish orthography or a bug in software assumptions and Unicode design.
  • Explanations from Turkish speakers:
    • The alphabet was redesigned to be phonetic with vowel harmony; ı/i, o/ö, u/ü pairs mirror each other; capital İ and lowercase ı are logical within that system.
    • The reform predates computers; nobody anticipated global, language-agnostic case algorithms.
  • Others argue every Latin-script language except Turkish (and descendants) treats I/i as a pair, so breaking that convention predictably causes issues, whose impact mostly falls on Turkish users.
  • Unicode’s reuse of ASCII I for Turkish dotless capital, rather than a separate code point, is called both a “feature” (for round‑trip encoding compatibility) and a spec-violating “bug” that forces locale-aware casing forever.

Developer ergonomics & workarounds

  • Turkish users describe having to switch entire systems to English to avoid crashes in Java/Python/PHP apps compiled under Turkish locales; this conflicts with preferences for non-US dates, units, and paper sizes.
  • People share tricks like using en_DK, en_IE, or “English (Europe)” to get English UI with sane metrics and ISO dates.

Enums, XML, and string operations

  • The specific Kotlin issue involved reading compiler messages from XML and mapping severity tags via lowercasing, which fails in Turkish.
  • Several commenters see “enums as magic strings + case-folding” as inherently fragile; better to use case-sensitive keys or generated resource APIs.
  • General sentiment: any nontrivial string operation (casing, collation) is surprisingly subtle; “all operations on strings are wrong” without explicit language/locale metadata.