Faster CRDTs (2021)

Article & benchmark context

  • Thread revisits a 2021 optimization deep-dive on text CRDTs; many readers still find it unusually entertaining and clear despite technical depth.
  • The original working title, “CRDTs go brrr,” is remembered fondly; some want the year kept visible in the title because it affects how the performance numbers should be read.
  • Several prior HN discussions of the same piece are linked; people now want updated benchmarks, especially against the Rust implementations of Automerge and Yjs (the Yjs port is called Yrs).
  • The author notes both major libraries and newer algorithms have since become significantly faster.

Why CRDTs were “slow” and performance details

  • One explanation: early libraries were largely written by academics, with correctness prioritized over optimization.
  • Since then, CRDT implementations have improved by orders of magnitude; some newer projects claim another large speedup since the article.
  • A side discussion explores why a Rust-to-WASM build ran ~4x slower than native. FFI overhead was controlled for, so the slowdown appeared to come from raw code generation and the runtime. Participants speculate about loss of high-level intent in the compiled output, a weaker instruction set, and optimization limits in current WASM JITs.
  • Microbenchmark details like ideal bucket size (32 entries) are tied to cache-line and memory hierarchy effects.

Real-world use cases and UX quality

  • Many apps are cited as CRDT-based or CRDT-like: design tools, reference managers, local-first workspaces, notes/todo apps, collaborative text/graph tools, and iCloud Notes.
  • Experiences vary: some tools feel “Google Docs–smooth,” others (e.g., certain Notion workflows, Apple Notes in specific cases) show cursor glitches or lag under concurrent editing.
  • A recurring theme: CRDTs shine in local-first, creative workflows (writing, design, coding) where offline edits are valuable; they’re seen as less compelling for fast-paced, server-authoritative domains such as real-time games.

Git, blockchains, and definition debates

  • A large subthread debates whether Git or blockchains are “pragmatic CRDTs.”
  • One side: they behave like widely used, conflict-resolving replicated data structures; for practical discussions, grouping them with CRDTs helps intuition.
  • The other side: by the formal definition, they fail key properties:
    • Git: merges can require manual conflict resolution, are not idempotent and depend on ordering, and different peers can end up with different histories.
    • Blockchains: cannot be updated independently without coordination; convergence relies on consensus, not purely on CRDT-style merge laws.
  • Several argue stretching the term “CRDT” to cover these systems confuses newcomers; better to reserve it for structures whose merge is associative, commutative, and idempotent.
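
The three merge laws named above can be checked directly on a small example. The following is a minimal sketch, assuming a grow-only counter (one count per replica, merged by taking the per-replica maximum); the names `merge`, `value`, and the replica labels are illustrative and not taken from the thread or from any particular library.

```python
from functools import reduce
from itertools import permutations


def merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Merge two grow-only counters by taking the per-replica maximum."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def value(counter: dict[str, int]) -> int:
    """The counter's value is the sum of all per-replica counts."""
    return sum(counter.values())


# Three replica states (hypothetical data).
a = {"alice": 3}
b = {"bob": 2}
c = {"alice": 1, "carol": 4}

# Commutative: the order of the two arguments does not matter.
assert merge(a, b) == merge(b, a)
# Associative: grouping does not matter.
assert merge(merge(a, b), c) == merge(a, merge(b, c))
# Idempotent: merging the same state twice changes nothing.
assert merge(a, a) == a

# Consequence: every delivery order, even with a duplicated message,
# produces the same final state.
finals = {
    tuple(sorted(reduce(merge, order + order[:1], {}).items()))
    for order in permutations([a, b, c])
}
assert len(finals) == 1
merged = dict(next(iter(finals)))
```

A Git merge would not pass these checks in general: it can stop and ask for manual resolution, and the resulting history depends on who merged what and in which order, which is the objection raised in the subthread.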

History size, privacy, and truncation

  • Concern: CRDTs retain long operation logs, which raises both storage and privacy questions.
  • Replies: for text, overhead can be extremely small; with compression, full histories may be smaller than final document state.
  • Some libraries discard deleted text content while keeping structural metadata, which helps with privacy.
  • Truncating history in decentralized settings is tricky: without careful snapshotting/consensus, peers that only saw partial histories can become irreconcilable.
  • For immutable, content-addressed designs, redaction (e.g., GDPR) may require repacking history and breaking signature chains; partitioning data into smaller, per-resource histories is suggested.
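
The idea of discarding deleted text while keeping structural metadata can be sketched as follows. This is a simplified illustration under stated assumptions, not the storage format of any library mentioned in the thread: the `Item` type, the `(replica, sequence)` ID scheme, and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    id: tuple[str, int]      # (replica, sequence number): hypothetical ID scheme
    length: int              # kept after deletion so positions stay resolvable
    content: Optional[str]   # None once the text has been deleted


def make_item(item_id: tuple[str, int], text: str) -> Item:
    return Item(id=item_id, length=len(text), content=text)


def delete(doc: list[Item], item_id: tuple[str, int]) -> None:
    """Drop the text of an item but keep its ID and length as a tombstone."""
    for item in doc:
        if item.id == item_id:
            item.content = None  # repeating the delete is harmless


def visible_text(doc: list[Item]) -> str:
    """Concatenate the content of all items that have not been deleted."""
    return "".join(item.content for item in doc if item.content is not None)


# Items are stored in document order (hypothetical data).
doc = [
    make_item(("a", 0), "Hello"),
    make_item(("b", 0), " secret"),
    make_item(("a", 5), " world"),
]
delete(doc, ("b", 0))
text_after_delete = visible_text(doc)
```

After the delete, the middle item still occupies its slot with its ID and length, so later operations that refer to it can be placed correctly, but the deleted characters themselves are no longer stored. This only covers the content; as the thread notes, removing the metadata too requires coordinated truncation.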