2025-10-12

A years-long Turkish alphabet bug in the Kotlin compiler

Turkish locale case-folding bug & similar experiences

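Multiple developers recall hitting Turkish toLowerCase/toUpperCase bugs in Java/Kotlin, especially when mapping enum names or log levels by lowercasing ASCII strings: under a Turkish default locale, uppercase “I” lowercases to dotless “ı” rather than “i”, so a name like “INFO” no longer matches “info”.
Static analysis tools do warn about locale-dependent operations, but people often dismiss the warnings, assuming ASCII is “safe.”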
Some report using Turkish system locales (or test JVMs in Turkish) specifically to flush out these bugs.
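
A minimal sketch of the failure mode described above, not the actual code from any of the projects mentioned. The Level enum and the keyFor helper are hypothetical names chosen for illustration; String.toLowerCase(Locale) and Locale.ROOT are standard Java APIs.

```java
import java.util.Locale;

final class TurkishCaseDemo {
    // Hypothetical log levels, used only for illustration.
    enum Level { INFO, WARNING, ERROR }

    // Lowercases the enum name using the casing rules of the given locale.
    static String keyFor(Level level, Locale locale) {
        return level.name().toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Turkish rules map 'I' to dotless i (U+0131), so this is not "info".
        String turkishKey = keyFor(Level.INFO, turkish);
        // Locale.ROOT applies locale-neutral rules and yields plain "info".
        String rootKey = keyFor(Level.INFO, Locale.ROOT);

        System.out.println("info".equals(turkishKey)); // false
        System.out.println("info".equals(rootKey));    // true
    }
}
```

The no-argument toLowerCase() uses the JVM default locale, which is why the bug only appears on some machines. Starting a test JVM with -Duser.language=tr -Duser.country=TR is one way to reproduce it on demand.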

Design of APIs and locales

Many argue that any language or library that exposes case-conversion or formatting APIs without a mandatory locale parameter is misdesigned.
Suggested patterns:
- Use an invariant locale for internal constants (Locale.ROOT in Java, invariant culture in C#, ASCII-only case transforms).
- Reserve user locale only for true user-facing text and numbers.
Others push back that defaulting everything to invariant/ROOT would break valid user input (e.g., number formats with commas) and that many developers would still pick the wrong locale anyway.
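
One way to read the two positions together, sketched in Java under the assumption that the program knows which strings are internal identifiers and which come from the user. The class and method names are hypothetical; NumberFormat and Locale.ROOT are standard library APIs.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class LocaleBoundaries {
    // Internal identifiers: always use locale-neutral casing.
    static String internalKey(String name) {
        return name.toLowerCase(Locale.ROOT);
    }

    // User input: parse numbers with the locale the user actually uses.
    static double parseUserNumber(String text, Locale userLocale) throws ParseException {
        return NumberFormat.getNumberInstance(userLocale).parse(text).doubleValue();
    }

    // User-facing output: format with the user's locale as well.
    static String formatForUser(double value, Locale userLocale) {
        return NumberFormat.getNumberInstance(userLocale).format(value);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(internalKey("TITLE"));                       // title
        System.out.println(parseUserNumber("1.234,5", Locale.GERMANY)); // 1234.5
        System.out.println(formatForUser(1234.5, Locale.GERMANY));
    }
}
```

The locale is an explicit argument at every call site, so the choice between invariant and user locale is visible in the code rather than inherited from process-wide state.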
C/POSIX locale APIs are criticized for relying on global state and for being thread-unsafe, ASCII-centric, and hard to reason about; yet historically they made local software “just work” with user locales.

Unicode, Turkish alphabet, and blame

A long subthread debates whether Turkish’s dotted/dotless “I” is a “bug” in Turkish orthography or a bug in software assumptions and Unicode design.
Explanations from Turkish speakers:
- The alphabet was redesigned to be phonetic with vowel harmony; ı/i, o/ö, u/ü pairs mirror each other; capital İ and lowercase ı are logical within that system.
- The reform predates computers; nobody anticipated global, language-agnostic case algorithms.
Others argue that every Latin-script language except Turkish (and alphabets derived from it) treats I/i as a pair, so breaking that convention predictably causes issues, whose impact mostly falls on Turkish users.
Unicode’s reuse of ASCII I (U+0049) for the Turkish dotless capital, rather than a separate code point, is called both a “feature” (for round-trip encoding compatibility) and a spec-violating “bug” that forces locale-aware casing forever.
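
The code-point sharing can be made concrete with a short Java sketch. The DottedIDemo class and its codePoints helper are hypothetical names; the casing behavior shown is that of the standard String.toLowerCase(Locale) and String.toUpperCase(Locale) methods.

```java
import java.util.Locale;

final class DottedIDemo {
    // Renders each code point of s as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format(Locale.ROOT, "U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Locale tr = Locale.forLanguageTag("tr");

        // The same input code point U+0049 ('I') has two lowercase forms.
        System.out.println(codePoints("I".toLowerCase(Locale.ROOT))); // U+0069
        System.out.println(codePoints("I".toLowerCase(tr)));          // U+0131

        // Likewise U+0069 ('i') has two uppercase forms.
        System.out.println(codePoints("i".toUpperCase(Locale.ROOT))); // U+0049
        System.out.println(codePoints("i".toUpperCase(tr)));          // U+0130
    }
}
```

Because nothing in the string itself says which language it is in, the caller has to supply that information through the locale argument.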

Developer ergonomics & workarounds

Turkish users describe having to switch entire systems to English to avoid crashes in Java/Python/PHP apps run under Turkish locales; this conflicts with preferences for non-US dates, units, and paper sizes.
People share tricks like using en_DK, en_IE, or “English (Europe)” to get an English UI with metric units and ISO dates.

Enums, XML, and string operations

The specific Kotlin issue involved reading compiler messages from XML and mapping severity tags via lowercasing, which fails when the default locale is Turkish.
Several commenters see “enums as magic strings + case-folding” as inherently fragile; better to use case-sensitive keys or generated resource APIs.
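
A sketch of the case-sensitive-keys alternative, assuming Java 9 or later for Map.of. The Severity enum, the tag spellings, and the fromTag helper are hypothetical and are not taken from the Kotlin compiler.

```java
import java.util.Map;
import java.util.Optional;

final class SeverityLookup {
    // Hypothetical severities, used only for illustration.
    enum Severity { INFO, WARNING, ERROR }

    // Exact tag spellings are listed once; no case conversion happens at lookup time.
    private static final Map<String, Severity> BY_TAG = Map.of(
            "info", Severity.INFO,
            "warning", Severity.WARNING,
            "error", Severity.ERROR);

    // Returns the severity for an exact, case-sensitive tag match, or empty if unknown.
    static Optional<Severity> fromTag(String tag) {
        if (tag == null) {
            return Optional.empty(); // Map.of(...) rejects null keys in get()
        }
        return Optional.ofNullable(BY_TAG.get(tag));
    }

    public static void main(String[] args) {
        System.out.println(fromTag("info"));  // Optional[INFO]
        System.out.println(fromTag("INFO"));  // Optional.empty
    }
}
```

The lookup never calls toLowerCase or toUpperCase, so its result cannot depend on the default locale. An unknown tag surfaces as an empty Optional that the caller has to handle explicitly.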
General sentiment: any nontrivial string operation (casing, collation) is surprisingly subtle; “all operations on strings are wrong” without explicit language/locale metadata.
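
Collation shows the same subtlety as casing. The sketch below compares plain code-unit ordering with locale-aware ordering via the standard java.text.Collator; the CollationDemo class and its method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class CollationDemo {
    // Sorts by String.compareTo, i.e. by UTF-16 code unit values.
    static List<String> sortedByCodeUnit(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts using the collation rules of the given locale.
    static List<String> sortedFor(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // "\u00E4" is a-umlaut, written as an escape to keep the source ASCII-only.
        List<String> words = List.of("z", "\u00E4", "a");
        System.out.println(sortedByCodeUnit(words));         // a, z, then a-umlaut
        System.out.println(sortedFor(words, Locale.GERMAN)); // a, a-umlaut, then z
    }
}
```

Neither order is wrong in itself; which one is correct depends on whether the strings are internal keys or text that a reader of a particular language will see.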

Related topics