2024-08-27

Faster CRDTs (2021)

Article & benchmark context

Thread revisits a 2021 optimization deep-dive on text CRDTs; many readers still find it unusually entertaining and clear despite technical depth.
Original working title “CRDTs go brrr” is remembered fondly; some want the date visible because it affects how to read perf numbers.
Several prior HN discussions of the same piece are linked; people now want updated benchmarks, especially against Rust implementations of Automerge and Yjs (Yrs).
The author notes both major libraries and newer algorithms have since become significantly faster.

Why CRDTs were “slow” and performance details

One explanation: early libraries were largely written by academics, with correctness prioritized over optimization.
Since then, CRDT implementations have improved by orders of magnitude; some newer projects claim another large speedup since the article.
A side discussion explores why a Rust-to-WASM build ran ~4x slower than native. FFI overhead was controlled for, so the slowdown appeared to come from raw codegen and runtime. Participants speculate about loss of high-level intent during compilation, a weaker instruction set, and optimization limits in current WASM JITs.
Microbenchmark details like ideal bucket size (32 entries) are tied to cache-line and memory hierarchy effects.
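
To make the bucket-size point concrete, here is a minimal sketch of a bucketed sequence: items live in small fixed-capacity chunks that split when they overflow, so an insert only shifts elements within one chunk. The class name `BucketedList` and its methods are hypothetical, not taken from the article or any library, and the capacity of 32 is simply the figure quoted in the thread. Python cannot show the cache-line effects themselves; the sketch only illustrates the structure being tuned.

```python
from typing import List

# Capacity quoted in the thread; the best value depends on the hardware.
BUCKET_SIZE = 32


class BucketedList:
    """Hypothetical sequence stored as a list of small buckets."""

    def __init__(self) -> None:
        self.buckets: List[List[str]] = [[]]

    def insert(self, index: int, item: str) -> None:
        if index < 0:
            raise IndexError("negative positions are not supported")
        # Walk the buckets until we reach the one that covers `index`.
        for b, bucket in enumerate(self.buckets):
            if index <= len(bucket):
                bucket.insert(index, item)
                # Split an overfull bucket in half so later inserts stay cheap.
                if len(bucket) > BUCKET_SIZE:
                    half = len(bucket) // 2
                    self.buckets[b:b + 1] = [bucket[:half], bucket[half:]]
                return
            index -= len(bucket)
        raise IndexError("insert position out of range")

    def to_list(self) -> List[str]:
        return [item for bucket in self.buckets for item in bucket]


# Usage: append 100 items, then check order and bucket sizes.
seq = BucketedList()
for i in range(100):
    seq.insert(i, str(i))
assert seq.to_list() == [str(i) for i in range(100)]
assert all(len(bucket) <= BUCKET_SIZE for bucket in seq.buckets)
```

A real implementation would typically index the buckets with a tree rather than scanning them linearly; the linear walk here just keeps the sketch short.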

Real-world use cases and UX quality

Many apps are cited as CRDT-based or CRDT-like: design tools, reference managers, local-first workspaces, notes/todo apps, collaborative text/graph tools, and iCloud Notes.
Experiences vary: some tools feel “Google Docs–smooth,” others (e.g., certain Notion workflows, Apple Notes in specific cases) show cursor glitches or lag under concurrent editing.
A recurring theme: CRDTs shine in local-first, creative workflows (writing, design, coding) where offline edits are valuable; they’re seen as less compelling for fast-paced, server-authoritative domains like real-time games.

Git, blockchains, and definition debates

A large subthread debates whether Git or blockchains are “pragmatic CRDTs.”
One side: they behave like widely used, conflict-resolving replicated data structures; for practical discussions, grouping them with CRDTs helps intuition.
The other side: by the formal definition, they fail key properties:
- Git: merge conflicts need manual resolution, merges are non-idempotent and order-dependent, and different peers can end up with different histories.
- Blockchains: cannot be updated independently without coordination; convergence relies on consensus, not purely on CRDT-style merge laws.
Several argue stretching the term “CRDT” to cover these systems confuses newcomers; better to reserve it for structures whose merge is associative, commutative, and idempotent.
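
The three merge laws named above can be checked on a very small example. The following is a minimal sketch of a state-based grow-only counter (G-Counter), a standard textbook CRDT; the function names and the replica ids are illustrative assumptions, not any library's API. Merge takes the per-replica maximum, which is what makes it associative, commutative, and idempotent.

```python
from typing import Dict

# State: replica id -> number of increments made by that replica.
GCounter = Dict[str, int]


def increment(state: GCounter, replica: str, amount: int = 1) -> GCounter:
    """Return a new state with `replica`'s own slot increased."""
    new_state = dict(state)
    new_state[replica] = new_state.get(replica, 0) + amount
    return new_state


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Per-replica maximum: safe to apply in any order, any number of times."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(state: GCounter) -> int:
    return sum(state.values())


# Three replicas that have each seen different updates.
a = increment({}, "alice", 2)
b = increment({}, "bob", 3)
c = increment(a, "carol", 1)

assert merge(a, b) == merge(b, a)                      # commutative
assert merge(merge(a, b), c) == merge(a, merge(b, c))  # associative
assert merge(a, a) == a                                # idempotent
assert value(merge(merge(a, b), c)) == 6
```

By contrast, a Git merge that stops for manual conflict resolution has no such function to test: the result depends on what a person chooses.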

History size, privacy, and truncation

Concern: CRDTs retain long operation logs, raising size and privacy questions.
Replies: for text, overhead can be extremely small; with compression, full histories may be smaller than final document state.
Some libraries discard deleted text content while keeping structural metadata, which helps with privacy.
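
As a rough illustration of discarding deleted content while keeping structural metadata, the sketch below models a document as a list of items with stable ids. Deleting an item drops its text but leaves a tombstone so that other peers' operations can still refer to that id. The `Item` layout and the helper names are assumptions made for this example and do not mirror any particular library's internals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical item id: (agent name, per-agent sequence number).
ItemId = Tuple[str, int]


@dataclass
class Item:
    id: ItemId
    content: Optional[str]  # set to None once the deleted text is discarded
    deleted: bool = False


def delete(items: List[Item], target: ItemId) -> None:
    """Mark the item as deleted and drop its text, keeping the id as a tombstone."""
    for item in items:
        if item.id == target:
            item.deleted = True
            item.content = None


def visible_text(items: List[Item]) -> str:
    return "".join(
        item.content
        for item in items
        if not item.deleted and item.content is not None
    )


doc = [Item(("a", 0), "H"), Item(("a", 1), "i"), Item(("a", 2), "!")]
delete(doc, ("a", 1))

assert visible_text(doc) == "H!"
assert doc[1].content is None   # the deleted character is no longer stored
assert doc[1].id == ("a", 1)    # the position metadata survives
```

The tombstone still reveals that something was deleted at that position and by which agent, so this helps with privacy without being full redaction.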
Truncating history in decentralized settings is tricky: without careful snapshotting/consensus, peers that only saw partial histories can become irreconcilable.
For immutable, content-addressed designs, redaction (e.g., GDPR) may require repacking history and breaking signature chains; partitioning data into smaller, per-resource histories is suggested.

Related topics