Upgrading Uber's MySQL Fleet

Version choice: MySQL 8.0 vs 8.4

  • Several commenters ask why Uber upgraded to 8.0 (old LTS) instead of 8.4 (current LTS).
  • Answers from the thread:
    • Project started in 2023, before 8.4 existed (released April 2024).
    • Direct upgrade 5.7 → 8.4 is not supported; 8.0 is a required step.
    • 8.0 is seen as more “battle-tested”; 8.4 likely has more unknowns.
  • Some expect a later 8.0 → 8.4 upgrade to be relatively cheap now that tooling and processes are in place.
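
The stepping-stone argument from the thread can be sketched as a small path search. This is only an illustration: the `SUPPORTED_HOPS` table and the `plan_upgrade` helper are hypothetical names, and the table encodes only the two hops mentioned by commenters, not MySQL's full upgrade matrix.

```python
from collections import deque

# Assumption: only the hops named in the thread are listed here.
# This is NOT a complete copy of MySQL's official upgrade matrix.
SUPPORTED_HOPS = {
    "5.7": ["8.0"],
    "8.0": ["8.4"],
}


def plan_upgrade(start, target, hops=SUPPORTED_HOPS):
    """Return the shortest chain of supported hops, or None if there is none."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in hops.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of supported hops reaches the target


if __name__ == "__main__":
    # With the table above, 5.7 can only reach 8.4 by passing through 8.0.
    print(plan_upgrade("5.7", "8.4"))
```

Under these assumptions, a fleet on 5.7 has to land on 8.0 first regardless of its final target, which is consistent with the expectation that the second hop reuses the same tooling.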

Stability, bugs, and MySQL vs MariaDB

  • Strong disagreement over MySQL’s stability:
    • Some report years of trouble‑free use at banks and large companies.
    • Others describe serious bugs, especially in newer or less‑used features (e.g., table rename issues, InnoDB full‑text search, multi‑valued indexes (MVIs), regex, JSON).
    • MySQL 8.0 is criticized for frequent behavioral changes in patch releases.
  • MariaDB is suggested as an alternative, but:
    • Commenters note it is no longer a true drop‑in replacement, citing significant DDL and replication differences.
    • Some major projects now explicitly do not support MariaDB.

Postgres, VACUUM, and MVCC tradeoffs

  • Big subthread comparing MySQL vs Postgres:
    • MySQL praised as “smooth” and low‑maintenance for some high‑churn use cases.
    • Multiple complaints about Postgres VACUUM being a resource hog and operationally painful at scale, especially with:
      • Very large, heavily updated tables.
      • Extremely high table counts (hundreds of thousands per DB).
    • Others counter that:
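      • Modern autovacuum is much improved; per‑table tuning and sharding help.
      • Routine VACUUM (not VACUUM FULL) is usually fine; problems come from misconfiguration or edge‑case workloads.
  • Commenters explain the underlying difference: MySQL (InnoDB) updates rows in place and keeps old versions in undo logs, while Postgres writes a new tuple to the heap on every update and relies on VACUUM to reclaim the dead tuples left behind.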
  • Consensus: Postgres can run very well, but demands more DBA expertise and ongoing maintenance than MySQL.
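
To make the per‑table tuning point concrete, here is a minimal sketch of the documented autovacuum trigger formula (threshold + scale factor × row count) using the PostgreSQL defaults of 50 and 0.2. The function names and the table name `events` are hypothetical, and the row count is an illustrative figure, not one taken from the thread.

```python
def autovacuum_trigger(reltuples, threshold=50, scale_factor=0.2):
    """Dead tuples needed before autovacuum vacuums a table.

    Mirrors the documented rule:
    autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
    """
    return round(threshold + scale_factor * reltuples)


def per_table_override_sql(table, scale_factor):
    """Build the per-table storage-parameter override as a SQL string."""
    return (
        f"ALTER TABLE {table} "
        f"SET (autovacuum_vacuum_scale_factor = {scale_factor});"
    )


if __name__ == "__main__":
    rows = 1_000_000_000  # illustrative size for a very large table
    print(autovacuum_trigger(rows))                     # default settings
    print(autovacuum_trigger(rows, scale_factor=0.01))  # tuned per table
    print(per_table_override_sql("events", 0.01))       # hypothetical table
```

With the defaults, a billion‑row table accumulates roughly 200 million dead tuples before autovacuum starts, which is one plausible reading of why both camps in the thread can be right: the defaults hurt at this size, and a per‑table override changes the picture considerably.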

Scale, architecture, and capacity utilization

  • Uber’s numbers (≈3M QPS, 2.1K clusters, 16K nodes) prompt debate:
    • Simple averaging gives roughly 190 QPS per node (3M / 16K) or ~1.4K QPS per cluster (3M / 2.1K), which some see as low and potentially overprovisioned.
    • Others argue this division is meaningless:
      • Load is highly uneven across clusters, regions, and times of day.
      • Mix of primaries/replicas and differing workloads makes “QPS per node” a poor metric.
  • Some speculate that Uber’s architecture may be expensive relative to alternatives (e.g., DynamoDB, KV stores), but acknowledge they lack visibility into its schema and query patterns.
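
The simple averages debated above can be reproduced from the three headline figures. This is plain arithmetic on the rounded numbers quoted in the thread; the variable names are ours, and, as the counterarguments note, the result says nothing about how load is actually distributed.

```python
# Rounded headline figures as quoted in the thread.
TOTAL_QPS = 3_000_000
CLUSTERS = 2_100
NODES = 16_000

qps_per_node = TOTAL_QPS / NODES        # fleet-wide mean per node
qps_per_cluster = TOTAL_QPS / CLUSTERS  # fleet-wide mean per cluster
nodes_per_cluster = NODES / CLUSTERS    # mean cluster size

if __name__ == "__main__":
    print(f"QPS per node:      {qps_per_node:.1f}")
    print(f"QPS per cluster:   {qps_per_cluster:.0f}")
    print(f"Nodes per cluster: {nodes_per_cluster:.1f}")
```

These are means over the whole fleet, so a handful of hot clusters and many idle ones would produce the same values.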

Password rotation and MySQL authentication

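  • Commenters like MySQL 8’s dual-password feature, which lets an account keep a primary and a secondary password at the same time, so credentials can be rotated gradually instead of in a painful “big bang” change.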
  • Thread notes:
    • mysql_native_password is deprecated in 8.0 and disabled by default in 8.4, but can be re‑enabled.
    • MySQL 9.0 removes the plugin entirely, so re‑enabling it is no longer an option there.
    • Applications that lag behind on old drivers and auth methods may be caught by surprise when they are forced to migrate.
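
A dual-password rotation can be sketched as three ordered steps. The helper below only builds the SQL text; `rotation_steps`, the account `app`@`%`, and the `<new-password>` placeholder are hypothetical, and a real rollout would also need the appropriate privileges and a secure way to supply the secret.

```python
def rotation_steps(user, host, new_password="<new-password>"):
    """Return (description, sql) pairs for a gradual password rotation.

    Uses MySQL 8.0's RETAIN CURRENT PASSWORD / DISCARD OLD PASSWORD clauses.
    A step with sql=None is an operational step outside the database.
    """
    for value in (user, host, new_password):
        if "'" in value or "\\" in value:
            # Keep the sketch simple: refuse input that would need escaping.
            raise ValueError("quotes and backslashes are not supported here")
    account = f"'{user}'@'{host}'"
    return [
        (
            "set the new password while the old one stays valid",
            f"ALTER USER {account} IDENTIFIED BY '{new_password}' "
            "RETAIN CURRENT PASSWORD;",
        ),
        ("roll the new password out to every client", None),
        (
            "drop the old password once no client uses it",
            f"ALTER USER {account} DISCARD OLD PASSWORD;",
        ),
    ]


if __name__ == "__main__":
    for description, sql in rotation_steps("app", "%"):
        print(f"-- {description}")
        if sql is not None:
            print(sql)
```

The middle step is where the "smooth" part comes from: clients can switch one at a time because both passwords are accepted until the final statement runs.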

Kubernetes and running databases

  • Question raised whether containerizing the DB layer (e.g., on Kubernetes) would have simplified Uber’s upgrade.
  • Multiple replies argue “no”:
    • Most upgrade complexity lies in app logic, query behavior, regressions, and config changes, none of which container orchestration addresses.
    • Running large stateful DBs on k8s is described as messy; k8s itself struggles at very large node counts, and some components (e.g., CNI plugins) don’t scale well past a few thousand nodes.

Cloud vs self‑managed and migration stories

  • Some describe similar 5.7 → 8.0 upgrades on managed services (e.g., Aurora), often using cross‑version replication and blue‑green cutovers with good results.
  • Others note:
    • Managed MySQL still requires version upgrades; you just outsource some mechanics.
    • For truly “never upgrade MySQL yourself,” you’d need a different class of service (e.g., fully managed horizontally scalable DBs), not just hosted MySQL.

LLM‑like writing style debate

  • Large subthread on whether the Uber blog post was written or “sanitized” by an LLM:
    • Indicators cited: over‑formal tone, heavy adjectives, words like “delve,” “compelling,” “seamless,” “embark,” repeated structure, and list‑like sections.
    • Others argue this is just typical corporate/marketing or non‑native (e.g., Indian) English, not necessarily AI.
    • Some worry that people are becoming overconfident in spotting AI, leading to false accusations and pressure to oversimplify writing.

Driver safety and business priorities (off‑topic tangent)

  • A side discussion criticizes Uber for focusing on infra upgrades while users report dangerous drivers.
  • Counterpoints:
    • Company behavior is framed as optimizing shareholder value, with legal tools like arbitration reducing liability.
    • Others argue safety is (or should be) a core part of long‑term shareholder value regardless.