Making the Tibetan language a first-class citizen in the digital world
LibreOffice and Long Tibetan Paragraphs
- Discussion centers on a key LibreOffice fix that made layout of extremely long paragraphs scalable for Tibetan text.
- A linked patch replaces an O(n²) script-run computation with an LRU cache and adjusts layout context sizing, so a long‑standing performance bug was ultimately resolved by a small final change.
- Some note the bug existed for nearly a decade, suggesting Tibetan text was rarely tested; others counter that the visible “5‑line fix” caps a years‑long refactor that made such a solution possible only recently.
- A LibreOffice QA volunteer describes ongoing efforts to improve RTL/CTL/CJK support, invites bug reports and donations, and notes a meta‑bug tracking many minority scripts (Tibetan, Mongolian, Uyghur, etc.).
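To make the caching idea concrete, here is a minimal Python sketch. It is not the LibreOffice patch (LibreOffice is written in C++), and the thread does not spell out where the quadratic cost came from. The sketch assumes, purely for illustration, that the cost arises when script runs for the whole paragraph are recomputed every time one line is laid out. The names `script_of` and `script_runs` are hypothetical, and the script label is a crude stand-in taken from the first word of each character's Unicode name.

```python
from functools import lru_cache
import unicodedata


def script_of(ch: str) -> str:
    """Crude script label (assumption): first word of the Unicode character name."""
    if ch.isspace():
        return "COMMON"
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        # Unassigned or unnamed code points have no name.
        return "UNKNOWN"


@lru_cache(maxsize=64)
def script_runs(text: str) -> tuple:
    """Return (start, end, script) runs found in one linear pass over text.

    The LRU cache means repeated requests for the same paragraph text
    reuse the earlier result instead of rescanning the paragraph.
    """
    runs = []
    start = 0
    for i in range(1, len(text) + 1):
        # Close the current run at the end of text or when the script changes.
        if i == len(text) or script_of(text[i]) != script_of(text[start]):
            runs.append((start, i, script_of(text[start])))
            start = i
    return tuple(runs)
```

Under that assumption, a layout loop that asks for the runs of the same paragraph once per line pays for one scan and then gets cache hits, so the total work grows roughly linearly with paragraph length rather than quadratically.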
Tibetan Writing Conventions and Paragraph Length
- The thread highlights that Tibetan manuscripts often use extremely long, unbroken text streams—sometimes spanning tens of pages or more—without Western‑style paragraphs.
- This clashes with word processor assumptions like “paragraphs are short” and “text has spaces,” joining the list of “falsehoods programmers believe about text.”
- Comparisons are made to scriptio continua in ancient Greek/Latin and to legal documents that ban paragraph breaks, producing single paragraphs across hundreds of pages.
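The "text has spaces" assumption can be illustrated with a short Python sketch. Tibetan separates syllables with the tsheg mark (U+0F0B) and ends clauses with the shad (U+0F0D) rather than using spaces between words. The helper name `tibetan_syllables` and the sample string are illustrative choices, not something taken from the thread.

```python
TSHEG = "\u0F0B"  # TIBETAN MARK INTERSYLLABIC TSHEG
SHAD = "\u0F0D"   # TIBETAN MARK SHAD


def tibetan_syllables(text: str) -> list[str]:
    """Split on tsheg and shad, dropping empty or whitespace-only pieces."""
    pieces = text.replace(SHAD, TSHEG).split(TSHEG)
    return [p for p in pieces if p.strip()]


# "bod skad" (Tibetan language), written with code point escapes.
sample = "\u0F56\u0F7C\u0F51\u0F0B\u0F66\u0F90\u0F51\u0F0B"

# Whitespace splitting sees a single token, because the text has no spaces.
whitespace_tokens = sample.split()

# Splitting on tsheg recovers the two syllables.
syllables = tibetan_syllables(sample)
```

Note that this yields syllables, not words. Finding word boundaries in Tibetan generally needs a dictionary or a statistical segmenter, which is exactly why code that assumes space-delimited words does not carry over.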
Language Preservation vs Evolution
- Some argue software should faithfully support existing Tibetan conventions, including massive paragraphs, both for current users and historical texts.
- Others suggest Tibetan orthography might reasonably evolve—adding spaces and paragraphs in the digital era, as other scripts did when media changed.
- A counterpoint stresses that making software more flexible enables, rather than blocks, innovation in all languages.
Politics and the Stakes for Tibetan
- Several comments tie digital Tibetan support to broader concerns about cultural erasure and Sinicization, arguing that language technology is part of preserving Tibetan identity.
- Others say geopolitical realities made “Free Tibet” activism fade in the West, as confronting a powerful China is seen as unrealistic.
- There is disagreement over whether such political discussion is appropriate in a technical thread, but defenders insist the linguistic work is inseparable from the political context.
Unicode, Scripts, and Related Languages
- Tibetan script has long been in Unicode; complaints about Unicode prioritizing emoji are challenged as unfounded in this context.
- Clarifications: many Tibetic languages share a single modern Tibetan script; Dzongkha also uses it, though feature completeness for specific languages may vary.
- Bengali/Assamese and Tibetan scripts share historical roots (via Gupta), but the languages they write belong to unrelated families and are not mutually intelligible.
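As a small check of the encoding points above, the Python standard library's `unicodedata` module can be used to confirm that Tibetan has its own Unicode block (U+0F00 to U+0FFF) and that Bengali letters are encoded separately. The helper `in_tibetan_block` is a hypothetical name introduced for this sketch.

```python
import unicodedata


def in_tibetan_block(ch: str) -> bool:
    """True if the character lies in the Unicode Tibetan block (U+0F00..U+0FFF)."""
    return 0x0F00 <= ord(ch) <= 0x0FFF


tibetan_ka = "\u0F40"
bengali_ka = "\u0995"

# Each script has its own code point for its letter "ka".
tibetan_name = unicodedata.name(tibetan_ka)  # "TIBETAN LETTER KA"
bengali_name = unicodedata.name(bengali_ka)  # "BENGALI LETTER KA"
```

Because languages such as Dzongkha are written in the same script, their text falls in the same block, so a block test like this identifies the script and says nothing about which language is being written.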
Historical and Community Efforts
- The thread recalls early work digitizing Tibetan, such as HyperCard pronunciation tools and a 1990s project to build Tibetan bibliographies, fonts, and library systems.
- An open‑source Tibetan dictionary project is mentioned as another piece of the digital ecosystem.