Migrating Uber's ledger data from DynamoDB to LedgerStore

Scale and Data Model

  • Commenters unpack “1 trillion records” as ledger records, not user-visible trips; a single ride/order can generate many entries: fares, fees, tips, taxes, refunds, subscriptions, driver payouts, disputes, etc.
  • Trillions of index entries are also mentioned, from which some infer heavy denormalization and multi‑party accounting (rider, driver, restaurant, taxes).
  • Commenters debate whether Uber’s total trip count and per-transaction record count make the trillion figure plausible; the thread leans toward “yes, plausible.”
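
The plausibility argument in this section comes down to one multiplication. A minimal sketch follows; both input numbers are hypothetical placeholders chosen to show the order of magnitude, not figures from Uber or from the thread.

```python
def estimated_ledger_records(transactions: int, entries_per_transaction: int) -> int:
    """Total ledger entries if every transaction fans out into several entries."""
    return transactions * entries_per_transaction


# Hypothetical inputs (assumptions, not sourced figures).
transactions = 50_000_000_000   # assumed cumulative rides and orders
entries_per_transaction = 20    # assumed fan-out: fares, fees, tips, taxes, payouts, ...

total = estimated_ledger_records(transactions, entries_per_transaction)
print(f"{total:,}")  # 1,000,000,000,000
```

With a fan-out in the tens, a transaction count in the tens of billions is enough to reach a trillion ledger entries, which is the shape of the argument commenters make.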

Cost Savings and ROI Debate

  • Headline $6M/year savings draws skepticism: some view it as small relative to Uber’s scale; others argue recurring savings are high-value and can justify large one‑time investments.
  • Several commenters try to estimate the headcount and compensation involved; their rough math suggests a non-trivial portion of the savings could be consumed by development and ongoing maintenance.
  • Opportunity cost is raised: could those engineers have generated more value elsewhere than on cost-saving infra work?
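
The back-of-the-envelope ROI math that commenters attempt can be sketched as a payback calculation. Only the $6M/year figure comes from the discussion; the team sizes, loaded cost per engineer, and build duration below are hypothetical assumptions for illustration.

```python
def payback_years(annual_savings: float, build_cost: float, annual_maintenance: float) -> float:
    """Years until a one-time build cost is recovered from net recurring savings."""
    net_savings = annual_savings - annual_maintenance
    if net_savings <= 0:
        return float("inf")  # maintenance eats all savings; the project never pays back
    return build_cost / net_savings


ANNUAL_SAVINGS = 6_000_000            # headline figure cited in the thread

# Hypothetical staffing assumptions (not from the source).
LOADED_COST_PER_ENGINEER = 400_000    # assumed fully loaded annual cost
BUILD_ENGINEERS, BUILD_YEARS = 10, 2  # assumed migration team and duration
MAINTENANCE_ENGINEERS = 3             # assumed ongoing team

build_cost = BUILD_ENGINEERS * BUILD_YEARS * LOADED_COST_PER_ENGINEER
maintenance = MAINTENANCE_ENGINEERS * LOADED_COST_PER_ENGINEER

print(round(payback_years(ANNUAL_SAVINGS, build_cost, maintenance), 2))
```

Changing the assumed headcount moves the answer a great deal, which is why the thread does not settle on a single conclusion.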

Build-vs-Buy and Cloud Dependence

  • Many note DynamoDB’s high cost, even when used “correctly” as a key-value store. Some see this migration as evidence that proprietary cloud databases get very expensive at scale.
  • Others emphasize the advantages of offloading ops to AWS and question the wisdom of taking on on-call duty for a custom database, along with firmware and hardware concerns.
  • Lock‑in vs. migration cost is debated: some value moving off AWS primitives; others point out that any large-scale migration (even between VMs) is extremely expensive and risky.

Technical Architecture and Alternatives

  • Suggested alternatives include DynamoDB + Redshift or data warehouse tiering; Parquet on S3; hot/cold architectures; MySQL/Postgres/Spanner-like systems; and purpose-built ledger databases such as TigerBeetle and QLDB.
  • A long subthread rejects “just use SQLite on a huge box” due to file size limits, single-writer constraints, replication/backup complexity, and availability concerns.
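
The hot/cold split suggested in this section can be illustrated with a toy in-memory model. This is a sketch under stated assumptions: the `TieredLedger` class, the 90-day cutoff, and the use of plain dictionaries as stand-ins for a low-latency store and an archival store are all hypothetical, and none of it describes LedgerStore's actual design.

```python
from datetime import date, timedelta

HOT_WINDOW = timedelta(days=90)  # assumed cutoff, not from the thread


class TieredLedger:
    """Toy two-tier store: recent entries stay hot, older ones move to cold."""

    def __init__(self):
        self.hot = {}   # stand-in for a low-latency key-value store
        self.cold = {}  # stand-in for cheap archival storage

    def write(self, key, entry_date, payload):
        # New entries always land in the hot tier.
        self.hot[key] = (entry_date, payload)

    def archive_old(self, today):
        """Move entries older than HOT_WINDOW to cold; return how many moved."""
        old_keys = [k for k, (d, _) in self.hot.items() if today - d > HOT_WINDOW]
        for k in old_keys:
            self.cold[k] = self.hot.pop(k)
        return len(old_keys)

    def read(self, key):
        # Check the hot tier first, then fall back to cold.
        for tier in (self.hot, self.cold):
            if key in tier:
                return tier[key][1]
        return None


ledger = TieredLedger()
ledger.write("a", date(2024, 1, 1), 1250)
ledger.write("b", date(2024, 5, 1), 300)
moved = ledger.archive_old(today=date(2024, 6, 1))
```

Reads stay uniform for callers while storage cost is dominated by the cheaper tier, which is the appeal commenters see in tiering immutable ledger data.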

Data Retention and Compliance

  • Commenters question why so much historical payment data is kept online; replies cite regulatory retention (often ~10 years), financial/audit requirements, and fear of deletion bugs in money systems.
  • Soft-delete / active–inactive flags are described as common; actual deletion is rare.
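
The soft-delete pattern described above can be sketched with Python's built-in `sqlite3` module. The table name, column names, and `soft_delete` helper are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ledger_entries (
        id           INTEGER PRIMARY KEY,
        amount_cents INTEGER NOT NULL,
        is_active    INTEGER NOT NULL DEFAULT 1  -- 1 = active, 0 = soft-deleted
    )
    """
)
conn.executemany(
    "INSERT INTO ledger_entries (amount_cents) VALUES (?)",
    [(1250,), (300,)],
)


def soft_delete(conn, entry_id):
    """Flag a row as inactive instead of removing it."""
    conn.execute("UPDATE ledger_entries SET is_active = 0 WHERE id = ?", (entry_id,))


soft_delete(conn, 2)

# Normal queries filter on the flag; the row itself is still there for audits.
active_ids = conn.execute(
    "SELECT id FROM ledger_entries WHERE is_active = 1"
).fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM ledger_entries").fetchone()[0]
```

Because no row is ever physically removed, a bug in the delete path can hide data but not destroy it, which matches the caution commenters describe for money systems.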

Startup Spin-Off: HaystackDB

  • A founder of a write-optimized datastore joins the discussion, seeking customers.
  • Feedback: need enterprise sales, clearer positioning, technical whitepapers, benchmarks, and more convincing pricing (reads seen as too expensive).
  • Several urge focusing on a narrow, must-have niche and possibly open-sourcing components to build trust.