Breaking Up with On-Call
Image choice and symbolism
- Several commenters note that the article’s “guard tower” photo appears to show Manzanar, a WWII Japanese-American internment camp, and call it a poor and insensitive metaphor for on-call.
- Some argue the intent was likely innocent (“grabbed from Google Images”) and that swapping the image would be a low‑stakes courtesy.
- Others push back, dismissing the complaints as moral posturing and asking who is actually harmed; the debate touches on triggers versus exposure and on whether avoiding such images helps anyone.
- Multiple people add that on‑call work resembles the role of firefighters or EMTs more than that of prison guards, so the metaphor is wrong even setting the history aside.
Incentives, culture, and responsibility
- A strong theme: on‑call pain tends to be inversely related to how well incentives are aligned. When engineers (or the organization) bear real consequences for incidents, they work to reduce them and treat alerts as tech debt.
- Many complain that management prioritizes features over reliability, that ops teams and SREs lack the authority to fix root causes, and that a “hero culture” celebrates firefighting over prevention.
- Some advocate putting developers on call so that “pain lives where it can be fixed”; others call this punitive and argue the problem is fundamentally one of leadership.
Definitions and experiences of on-call
- Several commenters say the article conflates “on-duty”/support work with true incident on-call; in their SRE roles, on-call means handling rare emergencies, not constant grunt work.
- Experiences vary widely: humane rotations (e.g., one week per quarter, with rest time, compensation, and coverage split across two time zones) versus horror stories of 24/7/365 availability, 10‑minute response windows, and being unable to travel, drink, or plan one’s life.
- Many emphasize that the real cost is the mental burden of being available for potential work, not the actual number of pages.
Necessity vs alternatives
- Some call on-call a “necessary evil” for 24/7 services; others argue that serious services should rely on staffed shifts or follow‑the‑sun SRE teams rather than waking sleeping developers.
- Commenters disagree over whether most SaaS products truly need 3 a.m. fixes; critics argue that much of this firefighting is self‑inflicted by constant, under‑tested changes.
- One camp promotes “devs own ops” (no separate ops team); another insists that development and managerial/ops roles be kept separate to avoid burnout.
Compensation, law, and unions
- Practices vary: no extra pay, per‑incident pay, per‑shift stipends, overtime rates, or time‑off‑in‑lieu plus stipends.
- Some argue that on‑call time should clearly count as working time when it heavily restricts personal life, citing examples from the EU and California.
- Some discuss unionization as a way to negotiate fair pay or limits on on-call duty; others are skeptical, citing negative union anecdotes.
Critiques of the article and tooling
- Multiple readers feel the article overgeneralizes from a single “big tech” experience (likely AWS) and doubles as marketing for a consultancy or LLM tool.
- On-call automation with LLMs and runbooks comes up; commenters are skeptical that it can replace real SRE judgment, especially given messy ticket data.