Microsoft Office is using an artificially complex XML schema as a lock-in tool
Nature of OOXML Complexity
- Many commenters distinguish between parsing XML (trivial with a schema) and implementing the semantics (hard part).
- OOXML is described as effectively a serialized snapshot of Office’s internal state, encoding decades of features, quirks, and compatibility flags.
- Several argue the 8,000+ page spec reflects Office’s true complexity rather than something “artificially” inflated at the schema level.
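To make the parsing-versus-semantics distinction concrete, here is a minimal sketch in Python using only the standard library. The inline `SAMPLE` fragment and the helper name `paragraph_texts` are illustrative assumptions, not taken from the thread; the namespace URI and the `w:p`, `w:r`, and `w:t` element names are the standard WordprocessingML ones. Pulling the text out takes a few lines, while nothing here says how to lay out, justify, or paginate it.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Hypothetical, hand-written fragment standing in for a real document.xml.
SAMPLE = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:pPr><w:jc w:val="center"/></w:pPr>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Hello, </w:t></w:r>
      <w:r><w:t>world</w:t></w:r>
    </w:p>
    <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def paragraph_texts(xml_source):
    """Return the plain text of each paragraph, ignoring all formatting."""
    root = ET.fromstring(xml_source)
    paragraphs = []
    for p in root.iterfind(".//w:p", NS):
        # Concatenate every text run in the paragraph. Properties such as
        # w:jc (justification) and w:b (bold) are parsed but never interpreted.
        paragraphs.append("".join(t.text or "" for t in p.iterfind(".//w:t", NS)))
    return paragraphs


if __name__ == "__main__":
    for text in paragraph_texts(SAMPLE):
        print(text)
```

The sketch deliberately discards `w:pPr` and `w:rPr`. Honoring them the way Word does is the part commenters describe as hard.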
Intentional Lock‑In vs Organic History
- One side: the complexity is “organic” and incidental, driven by backwards compatibility, legacy printer quirks, old binary formats, and regulatory pressure to publish a spec.
- Other side: Microsoft had strong incentives to “embrace, extend, extinguish” open formats; complexity and underspecification function as de‑facto lock‑in even if no engineer sat down to sabotage it.
- Some note Microsoft could have adopted OpenDocument or created a cleaner abstraction but instead essentially dumped internal structures to XML (“malicious compliance” view).
Interoperability and LibreOffice
- Experience reports: LibreOffice sometimes loses comments or formatting and shows warnings users ignore; import/export fidelity is a major pain point.
- Free/open‑source projects struggle to implement more than a subset of OOXML due to cost and moving targets, which in practice reinforces Office dominance.
- Counterpoint: LibreOffice also excels at many legacy formats, sometimes outperforming Microsoft’s own tools.
Comparisons to Web Standards and Other Formats
- HTML/CSS are cited as similarly huge and detailed, but their defenders say they are complex yet well‑specified, open, and designed to be interoperable, unlike OOXML’s underspecified “behave like Word 95”‑style flags.
- Others note that browsers are also incredibly hard to implement; complexity alone is not proof of bad faith.
- Analogies are drawn to PSD, PDF, Bluetooth, and banking XML APIs: many large ecosystems end up with monstrous, but not necessarily malicious, schemas.
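The “behave like Word 95” complaint can be illustrated with a small Python sketch. The `SETTINGS` fragment is hand-written for illustration and `compat_flags` is a hypothetical helper; `autoSpaceLikeWord95` and `useWord97LineBreakRules` are, to the best of my knowledge, real compatibility settings under `w:compat`, but treat the exact selection as an assumption. Reading the flags is easy. What an implementer gets back is a list of names, not a description of the legacy behavior each one requests.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical stand-in for word/settings.xml with two compatibility flags.
SETTINGS = f"""<w:settings xmlns:w="{W_NS}">
  <w:compat>
    <w:autoSpaceLikeWord95/>
    <w:useWord97LineBreakRules/>
  </w:compat>
</w:settings>"""


def compat_flags(xml_source):
    """List the local names of the compatibility flags present, in document order."""
    root = ET.fromstring(xml_source)
    compat = root.find(f"{{{W_NS}}}compat")
    if compat is None:
        return []
    # ElementTree tags look like "{namespace}localName"; keep only the local name.
    return [child.tag.split("}", 1)[1] for child in compat]


if __name__ == "__main__":
    for flag in compat_flags(SETTINGS):
        # The name is all the parser can provide. Reproducing the old
        # spacing or line-breaking behavior is left to the implementer.
        print(flag)
```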
WYSIWYG, Document Models, and “Export” Formats
- Several argue the real problem is the WYSIWYG, page‑faithful model and the use of “project files” (docx/xlsx) for interchange instead of simpler export formats.
- Others reply that users demand precise layout and print‑faithful documents; markdown/LaTeX‑style workflows are unrealistic for most non‑technical users.
Tooling, Code Generation, and AI
- XML serializers/codegen make schema consumption easier, but do nothing to resolve semantic and rendering complexity.
- Commenters are skeptical that AI could implement a correct OOXML engine without detailed, machine‑readable semantics.
Standards, Antitrust, and Alternatives
- OOXML’s publication is linked by some to EU/US antitrust pressure; it’s “open” on paper (ECMA/ISO) yet still very hard to fully implement.
- Some suggest OpenDocument remains far cleaner and has long been recommended by governments, but market power, contracts, and user habits keep Office entrenched.