2024-12-06

Lies I was told about collab editing, Part 1: Algorithms for offline editing

Nature of conflicts: syntax vs semantics

Many comments stress the difference between “mathematically conflict‑free” and “semantically correct” results.
CRDTs and operational transformation (OT) can ensure eventual consistency, but cannot ensure that the merged text says what the authors intended.
Some argue true resolution requires shared human context, goals, and even “politics” between collaborators; algorithms can only approximate.

Limits of CRDTs and “conflict-free”

Overlapping edits (e.g., editing a word that another user deleted) are highlighted as a fundamental hard case.
Several note that CRDTs guarantee convergence (all replicas reach the same state regardless of the order in which edits arrive), not meaning; a merge that is conflict-free at the data level can still yield clearly wrong text.
There is interest in CRDTs that explicitly preserve conflicts (multi-value registers, conflict annotations) rather than silently auto‑merging.
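
As an illustration of what "preserving conflicts" can mean, here is a minimal sketch of a multi-value register. It assumes version vectors are used to detect concurrency; the names MVRegister, dominates, write, and merge are hypothetical and do not come from the thread or from any particular CRDT library. Concurrent writes are all kept and surfaced to the caller instead of one silently winning.

```python
# Hypothetical sketch of a multi-value (MV) register; not a real library API.

def dominates(a: dict, b: dict) -> bool:
    """True if version vector `a` has seen everything in `b` and strictly more."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) >= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) > b.get(k, 0) for k in keys))


class MVRegister:
    def __init__(self, entries=frozenset()):
        # Each entry is (value, clock); clock is a sorted tuple of
        # (replica_id, counter) pairs so that entries are hashable.
        self.entries = frozenset(entries)

    def write(self, replica: str, value: str) -> "MVRegister":
        # The new clock covers every clock seen so far, then bumps this replica.
        clock = {}
        for _, c in self.entries:
            for r, n in c:
                clock[r] = max(clock.get(r, 0), n)
        clock[replica] = clock.get(replica, 0) + 1
        return MVRegister({(value, tuple(sorted(clock.items())))})

    def merge(self, other: "MVRegister") -> "MVRegister":
        # Keep every entry that no other entry supersedes.
        combined = self.entries | other.entries
        kept = {
            (v, c) for v, c in combined
            if not any(dominates(dict(c2), dict(c)) for _, c2 in combined)
        }
        return MVRegister(kept)

    def values(self):
        return sorted(v for v, _ in self.entries)


base = MVRegister().write("a", "colour")
alice = base.write("a", "color")   # edited offline on replica "a"
bob = base.write("b", "hue")       # edited concurrently on replica "b"

merged = alice.merge(bob)
print(merged.values())             # both concurrent values survive the merge

resolved = merged.write("a", "tint")   # a later write that has seen both
print(resolved.merge(bob).values())    # the explicit resolution supersedes them
```

The merge still converges, since it is a set union followed by a deterministic filter, but the decision about which value is right is left to a person or a higher layer.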

Offline editing as UX/semantics problem

Long‑lived offline branches (e.g., edits on a plane later auto‑merged) cause surprising and unwanted results.
Multiple commenters suggest offline collaboration should look more like Git/Word: show diffs, mark conflicts, and require explicit human acceptance.
For many domains (legal, journalism, scientific writing), review workflows and explicit sign‑off are seen as essential.
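
The "show diffs, mark conflicts" workflow can be sketched with Python's standard difflib module. This is only an illustration under simple assumptions: the document is a list of lines, both edits start from the same base, and any two edits that touch or overlap the same base lines count as a conflict. The helper names (changed_ranges, touches, find_conflicts) and the sample text are made up for the example.

```python
import difflib


def changed_ranges(base, edited):
    """Half-open base line ranges (start, end) that `edited` changed."""
    sm = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _, _ in sm.get_opcodes() if tag != "equal"]


def touches(r1, r2):
    # Conservative: ranges that overlap or are directly adjacent both count.
    (a1, a2), (b1, b2) = r1, r2
    return a1 <= b2 and b1 <= a2


def find_conflicts(base, ours, theirs):
    """Pairs of base ranges where both sides edited the same region."""
    return [(r1, r2)
            for r1 in changed_ranges(base, ours)
            for r2 in changed_ranges(base, theirs)
            if touches(r1, r2)]


base = ["The fee is 10%.", "Payment is due in 30 days.", "Late fees apply."]
alice = ["The fee is 12%.", "Payment is due in 30 days.", "Late fees apply."]
bob = ["The fee is 10%, capped at $500.", "Payment is due in 30 days.",
       "Late fees may apply."]

# Show each side's change for review instead of merging silently.
alice_diff = "\n".join(difflib.unified_diff(base, alice, "base", "alice", lineterm=""))
print(alice_diff)

# Regions that need an explicit human decision.
conflicts = find_conflicts(base, alice, bob)
for (s, e), _ in conflicts:
    print("conflict on base lines", s, "to", e, ":", base[s:e])
```

Here both authors changed the first line, so it is flagged for review, while Bob's change to the last line does not collide with anything Alice did.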

Git, semantic diff, and alternative algorithms

Git is praised for explicit conflict marking but criticized for low-level, line-based diffs that ignore AST/semantic structure.
There is interest in semantic diff/merge for code, circuits, layouts, and rich documents; some report that past attempts (e.g., AST merges) proved very complex.
Alternative approaches mentioned: event‑graph‑based CRDTs, custom text sync algorithms, differential sync, and server‑ordered “rebase/prediction” instead of pure CRDT/OT.

Bringing conflicts into the data model

Several propose representing conflicts structurally inside the data (e.g., conflict ranges, lattice‑based models, conflict types like XOR/aggregate).
This would allow collaborative resolution over time and possibly better tooling, while retaining CRDT convergence guarantees.

LLMs for merge assistance

Some advocate using LLMs to merge conflicting edits, arguing they can infer intent better than traditional algorithms.
Others counter that LLMs are unreliable, hard to analyze, and don’t satisfy CRDT requirements like determinism and associativity.
Emerging consensus: LLMs might be helpful as a layer on top of explicit conflict detection, but not as a replacement for deterministic merge algorithms or human review.
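
To make the algebraic objection concrete: a state-based CRDT merge is expected to be commutative, associative, and idempotent, so that replicas converge no matter how states are exchanged. The sketch below checks those three laws by brute force over a few sample values. The function name check_merge_laws and the sample data are assumptions for illustration only. A merge whose output can vary between calls on the same inputs, as LLM output may, cannot be relied on to pass such a check.

```python
from itertools import product


def check_merge_laws(merge, samples):
    """Return the names of the merge laws that `merge` violates on `samples`."""
    broken = set()
    for a, b, c in product(samples, repeat=3):
        if merge(a, a) != a:
            broken.add("idempotent")
        if merge(a, b) != merge(b, a):
            broken.add("commutative")
        if merge(merge(a, b), c) != merge(a, merge(b, c)):
            broken.add("associative")
    return sorted(broken)


def union(a, b):
    # Set union: the classic well-behaved merge.
    return a | b


def concat(a, b):
    # String concatenation: order-sensitive and not idempotent.
    return a + b


sets = [frozenset(), frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})]
texts = ["", "a", "b"]

print(check_merge_laws(union, sets))    # []
print(check_merge_laws(concat, texts))  # ['commutative', 'idempotent']
```

Passing on a handful of samples is evidence, not proof, but failing on any sample is enough to rule a function out as a convergent merge.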

Infrastructure and practicality

CRDT storage is reported to be heavy on relational databases (especially Postgres); key‑value or LSM‑tree stores are suggested as better fits.
Several note that long‑running offline merges are a niche but demanding use case; short‑term offline plus good conflict UI may be the pragmatic target.
