Making the Tibetan language a first-class citizen in the digital world

LibreOffice and Long Tibetan Paragraphs

  • Discussion centers on a key LibreOffice fix that made layout of extremely long paragraphs scalable for Tibetan text.
  • A linked patch avoids an O(n²) script‑run computation by introducing an LRU cache, and it adjusts layout context sizing, turning a long‑standing performance bug into a small final change.
  • Some note the bug existed for nearly a decade, suggesting Tibetan text was rarely tested; others counter that the visible “5‑line fix” caps a years‑long refactor that made such a solution possible only recently.
  • A LibreOffice QA volunteer describes ongoing efforts to improve RTL/CTL/CJK support, invites bug reports and donations, and notes a meta‑bug tracking many minority scripts (Tibetan, Mongolian, Uyghur, etc.).
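
The caching idea behind the patch can be sketched in a few lines. This is not LibreOffice's code: `script_of`, `script_runs`, and `layout_lines` are hypothetical names, the script detection is a crude guess based on Unicode character names, and the fixed-width line slicing is a simplification. The sketch only shows how an LRU cache keeps a per-line request for script runs from rescanning the whole paragraph each time.

```python
from functools import lru_cache
import unicodedata


def script_of(ch: str) -> str:
    """Rough script guess: first word of the Unicode name (illustrative only)."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


@lru_cache(maxsize=32)  # keep results for the most recently used paragraphs
def script_runs(paragraph: str) -> tuple:
    """Split a paragraph into (start, end, script) runs with a single O(n) scan."""
    runs = []
    start = 0
    for i in range(1, len(paragraph) + 1):
        # Close the current run at the end of the text or when the script changes.
        if i == len(paragraph) or script_of(paragraph[i]) != script_of(paragraph[start]):
            runs.append((start, i, script_of(paragraph[start])))
            start = i
    return tuple(runs)


def layout_lines(paragraph: str, line_width: int) -> list:
    """Hypothetical line breaker that asks for the script runs once per line.

    Without the cache, each line would trigger a full rescan, so a paragraph
    with many lines would cost roughly O(n^2) in total.
    """
    lines = []
    for start in range(0, len(paragraph), line_width):
        script_runs(paragraph)  # a cache hit after the first line
        lines.append(paragraph[start:start + line_width])
    return lines
```

After `layout_lines` runs on one long paragraph, `script_runs.cache_info()` should report a single miss and one hit for every further line, which is the behaviour that keeps very long paragraphs affordable.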

Tibetan Writing Conventions and Paragraph Length

  • The thread highlights that Tibetan manuscripts often use extremely long, unbroken text streams, sometimes spanning tens of pages or more, without Western‑style paragraphs.
  • This clashes with word processor assumptions like “paragraphs are short” and “text has spaces,” joining the list of “falsehoods programmers believe about text.”
  • Comparisons are made to scriptio continua in ancient Greek/Latin and to legal documents that ban paragraph breaks, producing single paragraphs across hundreds of pages.
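
The "text has spaces" assumption is easy to demonstrate. The sketch below is only an illustration under stated assumptions: the function names and the sample string are hypothetical, and treating every tsheg (U+0F0B, the Tibetan intersyllabic mark) as a break opportunity is a simplification of what a real line breaker does.

```python
TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG

# Three Tibetan syllables separated by tsheg marks, with no spaces at all.
SAMPLE = "\u0f56\u0f7c\u0f51\u0f0b\u0f66\u0f90\u0f51\u0f0b\u0f61\u0f72\u0f42"


def naive_words(text: str) -> list:
    """Whitespace splitting, as software written for spaced scripts might do."""
    return text.split()


def tsheg_break_positions(text: str) -> list:
    """Indices just after each tsheg, treated here as possible break points."""
    return [i + 1 for i, ch in enumerate(text) if ch == TSHEG]


def split_at(text: str, positions: list) -> list:
    """Cut the text at the given positions, keeping each tsheg with its syllable."""
    pieces = []
    start = 0
    for pos in positions:
        pieces.append(text[start:pos])
        start = pos
    if start < len(text):
        pieces.append(text[start:])
    return pieces
```

Here `naive_words(SAMPLE)` returns the whole string as a single "word", while `split_at(SAMPLE, tsheg_break_positions(SAMPLE))` yields three syllable-sized pieces. Code that only looks for spaces therefore sees one unbreakable run, however long the text is.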

Language Preservation vs Evolution

  • Some argue software should faithfully support existing Tibetan conventions, including massive paragraphs, both for current users and historical texts.
  • Others suggest Tibetan orthography might reasonably evolve by adding spaces and paragraphs in the digital era, as other scripts did when media changed.
  • A counterpoint stresses that making software more flexible enables, rather than blocks, innovation in all languages.

Politics and the Stakes for Tibetan

  • Several comments tie digital Tibetan support to broader concerns about cultural erasure and Sinicization, arguing that language technology is part of preserving Tibetan identity.
  • Others say geopolitical realities made “Free Tibet” activism fade in the West, as confronting a powerful China is seen as unrealistic.
  • There is disagreement over whether such political discussion is appropriate in a technical thread, but defenders insist the linguistic work is inseparable from the political context.

Unicode, Scripts, and Related Languages

  • Tibetan script has long been in Unicode; complaints about Unicode prioritizing emoji are challenged as unfounded in this context.
  • Clarifications: many Tibetic languages share a single modern Tibetan script; Dzongkha also uses it, though feature completeness for specific languages may vary.
  • Bengali/Assamese and Tibetan scripts share historical roots (via Gupta), but they are used to write languages from unrelated families, and those languages are not mutually intelligible.

Historical and Community Efforts

  • The thread recalls early work digitizing Tibetan, such as HyperCard pronunciation tools and a 1990s project to build Tibetan bibliographies, fonts, and library systems.
  • An open‑source Tibetan dictionary project is mentioned as another piece of the digital ecosystem.