How Uber tests payments in production
Overall reaction to the article
- Many commenters find the piece “fluffy,” overlong, and light on concrete technical detail.
- Others note that the core message is very simple: staging can’t catch all bugs; you must be prepared to detect and fix issues in production.
- A few see value in the anecdotes and framing, especially for less-experienced engineers, but some replies treat that praise as satire.
Staging vs. production for payments
- Broad agreement that some failures only surface in real production conditions: real banks, networks, device flows, fraud systems, edge cases.
- Several engineers say they always test in staging first, then do at least one live payment in prod after each deployment.
- Others argue good sandboxes (e.g. some modern providers) can be sufficient for most cases, with only a minimal “sanity check” in prod.
Quality of payment provider test environments
- Many report test environments are unreliable or non-representative: different validation rules, broken features, stale data, or missing “special” account settings.
- This has led some teams to largely abandon test endpoints and rely on real endpoints plus corporate cards or virtual test cards.
- A minority report the opposite experience: for them, provider sandboxes (especially some well-known ones) closely match production and work well.
Using real cards and compliance issues
- Common practice described: live “smoke tests” using corporate cards or prepaid/virtual cards, sometimes fully exercising refund/cancel flows.
- Concerns raised:
  - Card network terms and payment provider agreements may forbid self-pay or repeated test transactions in live mode.
  - PCI DSS and PA-DSS explicitly say real card data must not be used in non-production environments.
  - Some claim specific providers will terminate accounts for self-pay tests; others say they’ve done it for years without issue.
- Legal/HR angle: pressuring employees to use personal cards is criticized; in some jurisdictions it may be illegal unless the charges are reimbursed and participation is optional.
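
As a rough illustration of the charge-then-refund smoke test commenters describe, here is a minimal Python sketch. The `PaymentGateway` interface, the `FakeGateway` stand-in, and the `tok_corporate_card` token are all hypothetical names invented for this example; a real provider SDK will look different, and whether live self-pay tests are permitted depends on the provider agreement.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol
import uuid


class PaymentGateway(Protocol):
    """Hypothetical gateway interface; real provider SDKs will differ."""

    def charge(self, card_token: str, amount_cents: int) -> str: ...
    def refund(self, charge_id: str) -> None: ...
    def status(self, charge_id: str) -> str: ...


@dataclass
class FakeGateway:
    """In-memory stand-in so the sketch runs without a real provider."""

    charges: Dict[str, str] = field(default_factory=dict)

    def charge(self, card_token: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        charge_id = uuid.uuid4().hex
        self.charges[charge_id] = "captured"
        return charge_id

    def refund(self, charge_id: str) -> None:
        if self.charges.get(charge_id) != "captured":
            raise ValueError("only captured charges can be refunded")
        self.charges[charge_id] = "refunded"

    def status(self, charge_id: str) -> str:
        return self.charges[charge_id]


def smoke_test(gateway: PaymentGateway, card_token: str, amount_cents: int = 100) -> bool:
    """Charge a small amount, confirm capture, refund it, confirm the refund."""
    charge_id = gateway.charge(card_token, amount_cents)
    if gateway.status(charge_id) != "captured":
        return False
    gateway.refund(charge_id)
    return gateway.status(charge_id) == "refunded"


if __name__ == "__main__":
    # In a post-deploy check, the fake would be replaced by a live client.
    print(smoke_test(FakeGateway(), "tok_corporate_card"))
```

Running the refund as part of the same check keeps the test card's balance roughly neutral and exercises the cancel path, which several commenters say sandboxes tend to get wrong.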
Rollout and “testing in production” strategies
- Discussion of canary releases, region-by-region rollouts, and parallel “shadow” systems that replay real traffic and compare outputs.
- Some note that a slow rollout isn’t always viable (e.g., urgent security fixes), but generally staged rollouts are considered best practice.
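
The canary and shadow-comparison ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular company's system: `in_canary`, `shadow_compare`, and the two fee functions are hypothetical names made up for the example.

```python
import hashlib
from typing import Any, Callable, Iterable, List, Tuple


def in_canary(user_id: str, percent: int) -> bool:
    """Deterministically assign a user to one of 100 buckets.

    The same user always lands in the same bucket, so raising `percent`
    only ever adds users to the canary group.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def shadow_compare(
    requests: Iterable[Any],
    primary: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
) -> List[Tuple[Any, Any, Any]]:
    """Run `candidate` alongside `primary` and collect differing outputs.

    Only the primary result would be returned to real callers; candidate
    failures are recorded as mismatches instead of propagating.
    """
    mismatches = []
    for request in requests:
        expected = primary(request)
        try:
            actual = candidate(request)
        except Exception as exc:  # the shadow path must never break the caller
            actual = f"error: {exc!r}"
        if actual != expected:
            mismatches.append((request, expected, actual))
    return mismatches


def old_fee(amount_cents: int) -> int:
    """Hypothetical current logic: 3% fee, rounded down."""
    return amount_cents * 3 // 100


def new_fee(amount_cents: int) -> int:
    """Hypothetical new logic: 3% fee, rounded half up."""
    return (amount_cents * 3 + 50) // 100


if __name__ == "__main__":
    print(shadow_compare([100, 150], old_fee, new_fee))
```

For payments, the candidate in a shadow setup would need to be free of side effects (no real charges or ledger writes), so in practice it is usually pointed at pure computation or a stubbed downstream.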