Breaking Up with On-Call

Image choice and symbolism

  • Several commenters note that the article’s “guard tower” photo appears to be from Manzanar, a WWII Japanese-American internment camp, and call it a poor and insensitive metaphor for on-call.
  • Some argue intent was likely innocent (“grabbed from Google Images”) and changing it is a low‑stakes courtesy.
  • Others push back, seeing the complaints as moral posturing and questioning who is actually harmed; the debate touches on triggers vs exposure and on whether avoiding such images helps anyone.
  • Multiple people add that on‑call is more like firefighters/EMTs than prison guards, so the metaphor is wrong even aside from history.

Incentives, culture, and responsibility

  • Strong theme: on‑call pain is often inversely related to incentive alignment. When engineers (or the org) feel real consequences, they reduce incidents and treat alerts as tech debt.
  • Many complain that management prioritizes features over reliability; ops and SREs lack authority to fix root causes; “hero culture” celebrates firefighting instead of prevention.
  • Some advocate devs being on call so “pain lives where it can be fixed”; others call this punitive and say it’s fundamentally a leadership problem.

Definitions and experiences of on-call

  • Several say the article confuses “on-duty/support” with true incident on-call; their SRE roles handle rare emergencies, not constant grunt work.
  • Experiences range widely: humane rotations (e.g., 1 week per quarter, with rest time, comp, and coverage split across two time zones) vs horror stories of 24/7/365 coverage, 10‑minute response windows, and inability to travel, drink, or plan life.
  • Many emphasize that the real burden is mental: it comes from the possibility of being paged at any moment, not from the actual number of pages.

Necessity vs alternatives

  • Some claim on-call is a “necessary evil” for 24×7 services; others argue serious services should use staffed shifts or follow‑the‑sun SRE, not wake sleeping devs.
  • There’s disagreement over whether most SaaS truly needs 3 a.m. fixes; critics argue much of this is self‑inflicted by constant, under‑tested change.
  • One camp promotes “devs own ops” (no separate ops team); another insists dev and managerial/ops roles must be separated to avoid burnout.

Compensation, law, and unions

  • Practices vary: no extra pay, per‑incident pay, per‑shift stipends, overtime rates, or time‑off‑in‑lieu plus stipends.
  • Some argue on‑call should clearly be counted as working time when it heavily restricts personal life; EU/California examples are cited.
  • Unionization is discussed as a way to negotiate fair pay or limits; others express skepticism based on negative union anecdotes.

Critiques of the article and tooling

  • Multiple readers feel the article overgeneralizes from one “big tech” (likely AWS) experience and doubles as consultancy/LLM-tool marketing.
  • On-call automation with LLMs and runbooks is mentioned; commenters are skeptical that it can replace real SRE judgment, especially given messy ticket data.