2024-05-20

Migrating Uber's ledger data from DynamoDB to LedgerStore

Scale and Data Model

Commenters unpack “1 trillion records” as ledger records, not user-visible trips; a single ride/order can generate many entries: fares, fees, tips, taxes, refunds, subscriptions, driver payouts, disputes, etc.
Trillions of index entries are also mentioned; from this, some infer heavy denormalization and multi‑party accounting (rider, driver, restaurant, taxes).
Commenters debate whether Uber’s total trip count and per-transaction record count make the trillion figure plausible; the thread's consensus leans toward “yes, plausible.”
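The plausibility argument is a back-of-envelope division, which can be sketched as follows. Only the 1 trillion figure comes from the headline; the transaction count is a hypothetical placeholder chosen to show the shape of the calculation, not a real Uber number.

```python
def entries_per_transaction(total_records: float, transactions: float) -> float:
    """Average ledger entries each transaction must generate to reach total_records."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return total_records / transactions


# "1 trillion records" is the headline figure discussed in the thread.
TOTAL_LEDGER_RECORDS = 1e12

# ASSUMPTION: placeholder lifetime transaction count, NOT a real Uber figure.
HYPOTHETICAL_TRANSACTIONS = 50e9

needed = entries_per_transaction(TOTAL_LEDGER_RECORDS, HYPOTHETICAL_TRANSACTIONS)
print(f"Entries needed per transaction: {needed:.0f}")
```

If the resulting number is in the range of what one ride or order could produce (fares, fees, tips, taxes, refunds, payouts, and so on), the trillion figure is plausible under that assumed transaction count.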

Cost Savings and ROI Debate

Headline $6M/year savings draws skepticism: some view it as small relative to Uber’s scale; others argue recurring savings are high-value and can justify large one‑time investments.
Several commenters try to estimate the headcount and compensation involved; rough math suggests a non-trivial portion of the savings could be consumed by development and ongoing maintenance.
Opportunity cost is also raised: could those engineers have generated more value elsewhere than on cost-saving infrastructure work?
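The rough math in this debate can be sketched as a simple payback calculation. Only the $6M/year savings comes from the headline; the engineer cost, maintenance headcount, and build effort below are hypothetical placeholders, not figures reported by Uber or agreed on in the thread.

```python
from typing import Optional


def net_annual_savings(gross_savings: float, maintainers: int,
                       cost_per_engineer: float) -> float:
    """Gross savings minus the yearly cost of engineers maintaining the system."""
    return gross_savings - maintainers * cost_per_engineer


def payback_years(build_engineer_years: float, cost_per_engineer: float,
                  net_savings: float) -> Optional[float]:
    """Years until the one-time build cost is recovered; None if never."""
    if net_savings <= 0:
        return None
    return build_engineer_years * cost_per_engineer / net_savings


GROSS_SAVINGS = 6_000_000        # headline figure: $6M/year

# ASSUMPTIONS: all three values below are hypothetical placeholders.
COST_PER_ENGINEER = 400_000      # fully loaded yearly cost per engineer
MAINTAINERS = 5                  # engineers on ongoing maintenance
BUILD_ENGINEER_YEARS = 20        # one-time migration effort

net = net_annual_savings(GROSS_SAVINGS, MAINTAINERS, COST_PER_ENGINEER)
years = payback_years(BUILD_ENGINEER_YEARS, COST_PER_ENGINEER, net)
print(f"Net savings per year: ${net:,.0f}; payback in years: {years}")
```

Changing the placeholders shows how both sides of the debate can be right: a small maintenance team makes the recurring savings attractive, while a large one can erase them entirely. Opportunity cost is not modeled here.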

Build-vs-Buy and Cloud Dependence

Many note DynamoDB’s high cost, even when used “correctly” as a key-value store. Some see this migration as evidence that proprietary cloud databases get very expensive at scale.
Others emphasize the advantages of offloading operations to AWS and question the wisdom of taking on on-call, firmware, and hardware responsibilities for a custom database.
Lock‑in vs. migration cost is debated: some value moving off AWS primitives; others point out that any large-scale migration (even between VMs) is extremely expensive and risky.

Technical Architecture and Alternatives

Multiple suggestions: DynamoDB + Redshift or data warehouse tiering; parquet on S3; hot/cold architectures; MySQL/Postgres/Spanner-like systems; TigerBeetle, QLDB.
A long subthread rejects “just use SQLite on a huge box” due to file size limits, single-writer constraints, replication/backup complexity, and availability concerns.

Data Retention and Compliance

Commenters question why so much historical payment data is kept online; replies cite regulatory retention (often ~10 years), financial/audit requirements, and fear of deletion bugs in money systems.
Soft-delete / active–inactive flags are described as common; actual deletion is rare.
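The soft-delete pattern described above can be illustrated with a minimal in-memory sketch. The `LedgerEntry` and `Ledger` names and their fields are hypothetical, invented for illustration; they do not describe LedgerStore or any system mentioned in the thread.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class LedgerEntry:
    # Hypothetical record shape, for illustration only.
    entry_id: str
    amount_cents: int
    active: bool = True
    deactivated_at: Optional[datetime] = None


class Ledger:
    """In-memory ledger where 'deleting' only flips a flag."""

    def __init__(self) -> None:
        self._entries: Dict[str, LedgerEntry] = {}

    def add(self, entry: LedgerEntry) -> None:
        self._entries[entry.entry_id] = entry

    def soft_delete(self, entry_id: str) -> None:
        # The row is kept for audit purposes; it is only marked inactive.
        entry = self._entries[entry_id]
        entry.active = False
        entry.deactivated_at = datetime.now(timezone.utc)

    def active_entries(self) -> List[LedgerEntry]:
        return [e for e in self._entries.values() if e.active]

    def all_entries(self) -> List[LedgerEntry]:
        return list(self._entries.values())


ledger = Ledger()
ledger.add(LedgerEntry("e1", 1250))
ledger.add(LedgerEntry("e2", -300))
ledger.soft_delete("e2")
print(len(ledger.active_entries()), len(ledger.all_entries()))
```

Because nothing is physically removed, a bug in the "delete" path cannot destroy financial history, which is the concern the replies raise. The trade-off is that storage only ever grows.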

Startup Spin-Off: HaystackDB

A founder of a write-optimized datastore joins the discussion, seeking customers.
Feedback: need enterprise sales, clearer positioning, technical whitepapers, benchmarks, and more convincing pricing (reads seen as too expensive).
Several urge focusing on a narrow, must-have niche and possibly open-sourcing components to build trust.
