Zero-latency SQLite storage in every Durable Object
Use cases and target workloads
- Strong interest in DOs for realtime, multiplayer-style apps (docs/design tools, games, collaborative UIs). Single logical “document/room” maps cleanly to one DO + SQLite.
- Suggested also for low-traffic internal tools or per-tenant isolation, where a full managed Postgres instance feels heavy or too costly.
- Skeptics argue most production systems should stick with “boring” Postgres/VMs until needs are extreme, due to maturity and edge-case risk.
Durability, latency, and consistency
- Each DO has its own local SQLite DB. All operations on that DB are routed to the single DO instance worldwide, so it always has a consistent view.
- Writes are committed locally, then synchronously replicated to 5 nearby replicas; the write is acknowledged after 3 confirm.
- WAL chunks are also streamed to object storage every ~16MB or 10 seconds for backup/rollback. Some posters worry this implies up to ~10s of potential data loss on crash; others clarify that durability comes from the synchronous replicas, while object storage serves only backup and point-in-time recovery (not primary reads), so the streaming interval does not add a loss window.
- Within a DO, reads see writes immediately. There is no cross-region “read after write” issue because you cannot bypass the DO to read the DB directly.
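The write path above (replicate to 5, acknowledge once 3 confirm) is a classic quorum wait. A minimal sketch in Python, assuming simulated replicas — this models the idea only, not Cloudflare's implementation; `send_to_replica` is a hypothetical stand-in for a network send plus fsync on the replica:

```python
import concurrent.futures
import random
import time

REPLICAS = 5
QUORUM = 3  # the write is acknowledged once this many replicas confirm

def send_to_replica(replica_id: int, record: bytes) -> int:
    """Simulated replica write: stand-in for network transfer + fsync."""
    time.sleep(random.uniform(0.001, 0.01))
    return replica_id

def replicate(record: bytes) -> list[int]:
    """Fan a committed write out to all replicas; ack as soon as a quorum confirms."""
    confirmed: list[int] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=REPLICAS) as pool:
        futures = [pool.submit(send_to_replica, i, record) for i in range(REPLICAS)]
        for fut in concurrent.futures.as_completed(futures):
            confirmed.append(fut.result())
            if len(confirmed) >= QUORUM:
                break  # quorum reached; stop waiting for the slower replicas
    return confirmed

print(len(replicate(b"wal-frame")) >= QUORUM)
```

The point of the quorum is that up to two replicas can be slow or down without blocking acknowledgement, while any three surviving replicas still intersect with the ack set.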
Location, scaling, and hot partitions
- DOs are created in a region (optionally hinted) and currently do not automatically relocate, though future dynamic relocation is planned.
- Only a subset of Cloudflare PoPs host DOs; other PoPs forward traffic.
- Single-partition hotspots are called out as a concern; the counterpoint is that SQLite can handle very high write rates for many workloads, and reads can be offloaded via caching.
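The read-offloading idea mentioned above can be sketched as a TTL cache placed in front of the single hot object. This is an illustrative toy in Python, not anything Cloudflare-specific; `query_object` is a hypothetical stand-in for a query routed to the DO:

```python
import time

class CachedReader:
    """Toy read-through TTL cache in front of a single-writer store."""
    def __init__(self, backend, ttl_seconds: float = 5.0):
        self.backend = backend  # function that would query the hot object
        self.ttl = ttl_seconds
        self._cache: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        hit = self._cache.get(key)
        if hit and time.monotonic() - hit[0] < self.ttl:
            return hit[1]            # serve recent data, sparing the hot object
        value = self.backend(key)    # only cache misses reach the single writer
        self._cache[key] = (time.monotonic(), value)
        return value

# Hypothetical backend that records each time the hot object is actually hit.
calls: list[str] = []
def query_object(key: str) -> str:
    calls.append(key)
    return key.upper()

reader = CachedReader(query_object, ttl_seconds=60.0)
reader.get("doc-1")
reader.get("doc-1")   # second read is served from cache
print(len(calls))     # the single writer was hit only once
```

The trade-off is the usual one: cached reads may be up to `ttl_seconds` stale, which is acceptable for many read-heavy paths but not for the in-object read-after-write guarantee described earlier.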
Programming model, features, and limits
- DOs are described as an actor-like model with global routing and optional RPC-style calls between objects or workers.
- The synchronous-looking write API actually defers commit confirmation and error reporting until the DO's response is sent, which enables automatic write batching.
- Noted limits: 128MB RAM per runtime, no built-in read transactions/snapshots, long-lived cursors are tricky because open readers prevent WAL checkpointing and let it grow, and hibernating WebSockets add lifecycle complexity.
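The deferred-error pattern above can be modeled as a write buffer whose failures only surface when the response is produced. A hedged sketch in Python using sqlite3 — `BatchedWriter`, `exec`, and `respond` are hypothetical names illustrating the shape of such an API, not the Workers API itself:

```python
import sqlite3

class BatchedWriter:
    """Toy model of a sync-looking write API: exec() returns immediately,
    statements are batched, and errors surface only at respond() time."""
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.pending: list[tuple[str, tuple]] = []

    def exec(self, sql: str, params: tuple = ()) -> None:
        self.pending.append((sql, params))  # no I/O yet; the caller keeps going

    def respond(self, body: str) -> str:
        # Flush the whole batch in one transaction. An error raised here aborts
        # the response, so callers never observe an acknowledged-but-lost write.
        with self.conn:
            for sql, params in self.pending:
                self.conn.execute(sql, params)
        self.pending.clear()
        return body

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (msg TEXT)")
writer = BatchedWriter(conn)
writer.exec("INSERT INTO events VALUES (?)", ("hello",))
print(writer.respond("ok"))  # writes commit here; errors would also raise here
```

The key property is that holding the response until the batch commits keeps the fast, synchronous-feeling API honest: the client only sees "ok" if the writes are actually durable.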
Cost, lock-in, and operational concerns
- Pricing and hibernation behavior make some developers nervous; lack of strong spending caps is seen as risky for small teams.
- Heavy vendor lock-in worries many; rebuilding elsewhere would be nontrivial, though some projects try to offer portable abstractions.
- Debuggability, observability, and handling slow DOs or failures at scale are flagged as open concerns.
Data modeling, analytics, and migrations
- Per-document/tenant SQLite is attractive for localized state, but makes global queries (e.g., “all full flights”, analytics across all docs) harder; likely requires a separate analytics system.
- Schema migrations across many DOs are nontrivial; suggested pattern is running per-DO migrations on initialization.
- One poster dislikes “many tiny DBs” from a relational perspective; others note it fits document-like domains better than giant global tables.
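The run-migrations-on-initialization pattern suggested above is commonly implemented with SQLite's `PRAGMA user_version` counter. A minimal sketch in Python, assuming a hypothetical `docs` schema; the same idea applies inside a DO's constructor:

```python
import sqlite3

# Ordered migration list; position N (1-based) is schema version N. Hypothetical schema.
MIGRATIONS = [
    "CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)",
    "ALTER TABLE docs ADD COLUMN updated_at INTEGER",
]

def migrate(conn: sqlite3.Connection) -> int:
    """Apply any pending migrations; safe and cheap to call on every initialization."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for i, stmt in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(stmt)
        conn.execute(f"PRAGMA user_version = {i}")  # record progress per step
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # first init: applies both migrations, version becomes 2
print(migrate(conn))  # later inits: no-op, version stays 2
```

Because each DO migrates itself lazily on first access, a fleet of millions of objects converges on the new schema without any coordinated rollout, at the cost of every code version needing to tolerate objects that have not woken up yet.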
Comparisons to traditional databases
- Several argue: start with Postgres for most apps; only move to specialized systems (ClickHouse, DOs, etc.) once scale or latency demands justify complexity.
- Others view colocating compute + SQLite as a real complexity reduction for certain classes of apps, not just “shiny tech.”
Unclear / open questions
- How data residency and regulatory requirements are satisfied is raised but not answered.
- Low-level implementation details (e.g., exact VFS/WAL integration) remain unexplained in the thread.