Faster CRDTs (2021)
Article & benchmark context
- Thread revisits a 2021 optimization deep-dive on text CRDTs; many readers still find it unusually entertaining and clear despite technical depth.
- Original working title “CRDTs go brrr” is remembered fondly; some want the date visible because it affects how to read perf numbers.
- Several prior HN discussions of the same piece are linked; people now want updated benchmarks, especially against Rust implementations of Automerge and Yjs (Yrs).
- The author notes both major libraries and newer algorithms have since become significantly faster.
Why CRDTs were “slow” and performance details
- One explanation: early libraries were largely written by academics, with correctness prioritized over optimization.
- Since then, CRDT implementations have improved by orders of magnitude; some newer projects claim another large speedup since the article.
- A side discussion explores why a Rust-to-WASM build ran ~4x slower than native. FFI overhead was controlled for, so the slowdown appeared to come from code generation and the runtime itself. Participants speculate about loss of high-level intent, weaker instruction sets, and optimization limits in current WASM JITs.
- Microbenchmark details like ideal bucket size (32 entries) are tied to cache-line and memory hierarchy effects.
Real-world use cases and UX quality
- Many apps are cited as CRDT-based or CRDT-like: design tools, reference managers, local-first workspaces, notes/todo apps, collaborative text/graph tools, and iCloud Notes.
- Experiences vary: some tools feel “Google Docs–smooth,” others (e.g., certain Notion workflows, Apple Notes in specific cases) show cursor glitches or lag under concurrent editing.
- A recurring theme: CRDTs shine in local-first, creative workflows (writing, design, coding) where offline edits are valuable; they’re seen as less compelling for fast-paced domains with a single authoritative state, such as real-time games.
Git, blockchains, and definition debates
- A large subthread debates whether Git or blockchains are “pragmatic CRDTs.”
- One side: they behave like widely used, conflict-resolving replicated data structures; for practical discussions, grouping them with CRDTs helps intuition.
- The other side: by the formal definition, they fail key properties:
  - Git: merges can hit conflicts that need manual resolution, merges are neither idempotent nor order-independent, and different peers can end up with different histories.
  - Blockchains: cannot be updated independently without coordination; convergence relies on consensus, not purely on CRDT-style merge laws.
- Several argue stretching the term “CRDT” to cover these systems confuses newcomers; better to reserve it for structures whose merge is associative, commutative, and idempotent.
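
To make the three merge laws concrete, here is a minimal sketch of a grow-only counter (G-Counter), a standard textbook CRDT. It is not taken from the article or from any library discussed in the thread; the replica names and the `merge` and `value` helpers are illustrative assumptions.

```python
# Minimal G-Counter sketch (illustrative, not any library's API).
# State: a dict mapping replica id -> number of increments seen from that replica.

def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Merge two counter states by taking the per-replica maximum."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in a.keys() | b.keys()}


def value(counter: dict[str, int]) -> int:
    """The counter's value is the sum of all per-replica counts."""
    return sum(counter.values())


# Three replica states (hypothetical replica names).
a = {"alice": 3}
b = {"bob": 2}
c = {"alice": 1, "carol": 5}

# Commutative: merge order between two peers does not matter.
assert merge(a, b) == merge(b, a)
# Associative: grouping of merges does not matter.
assert merge(merge(a, b), c) == merge(a, merge(b, c))
# Idempotent: re-applying the same state changes nothing.
assert merge(a, a) == a
```

Because the per-replica maximum satisfies all three laws, peers can exchange states in any order, any number of times, and still agree. A Git merge offers no such guarantee, which is the core of the objection above.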
History size, privacy, and truncation
- Concern: CRDTs leave behind long operation logs.
- Replies: for text, overhead can be extremely small; with compression, full histories may be smaller than final document state.
- Some libraries discard deleted text content while keeping structural metadata, which helps with privacy.
- Truncating history in decentralized settings is tricky: without careful snapshotting/consensus, peers that only saw partial histories can become irreconcilable.
- For immutable, content-addressed designs, redaction (e.g., GDPR) may require repacking history and breaking signature chains; partitioning data into smaller, per-resource histories is suggested.
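
The idea of discarding deleted text while keeping structural metadata can be sketched as follows. This is a simplified illustration under assumed names (`Item`, `delete`, `visible_text`, and the `(replica, sequence)` ID scheme are all hypothetical), not the storage format of any particular library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    item_id: tuple[str, int]   # (replica, sequence number): assumed ID scheme
    content: Optional[str]     # None once the deleted text has been discarded
    deleted: bool = False


def delete(items: list[Item], item_id: tuple[str, int]) -> None:
    """Mark an item deleted and drop its text, keeping its ID as a tombstone."""
    for item in items:
        if item.item_id == item_id:
            item.deleted = True
            item.content = None  # the text itself is no longer stored
            return
    raise KeyError(item_id)


def visible_text(items: list[Item]) -> str:
    """Render only the items that have not been deleted."""
    return "".join(item.content for item in items if not item.deleted)


doc = [Item(("alice", 0), "H"), Item(("alice", 1), "i"), Item(("bob", 0), "!")]
delete(doc, ("alice", 1))
```

After the delete, the document still holds three items, so later operations that reference `("alice", 1)` as a position can still be placed, but the character it once held is gone from storage.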