Claude.ai unavailable and elevated errors on the API

Service reliability and uptime

  • Multiple users report frequent Claude/Claude Code outages, rate limits, and authentication failures, especially during US work hours.
  • Uptime over the past 90 days is described as “a single nine” (roughly 90%), with jokes about “nine fives” and mock uptime graphs.
  • Some find limits more generous and performance better late at night; usage during the US workday is noticeably less reliable.
  • The status page and user-reported errors suggest recurring authentication problems rather than just GPU capacity shortfalls.

Impact on users and businesses

  • Many depend on Claude daily for work; outages cause significant disruption and push people to other tools.
  • Some business users report spending ~$200k/month or “dozens of engineer salaries” on Anthropic, and describe executives as angry about reliability and support.
  • A number of long‑term heavy users say recent months brought clear regressions in stability and quality, prompting them to switch to Codex or others despite previously being very satisfied.

Debate on root causes and engineering

  • Some argue inference for stateless LLM calls is straightforward; the hard parts are context caching, tools, web access, and fragile external services.
  • Others emphasize broader ops complexity: rate limiting, dead GPUs, multi‑region, monitoring, auth, billing, compliance, and rapid scale.
  • There is speculation that rapid feature shipping and growth are being prioritized over SRE rigor and safe deployment practices.
  • Commenters call for detailed public postmortems; the current system is viewed as a “black box” that sometimes breaks.

Alternatives, self‑hosting, and multi‑model setups

  • Many mention falling back to Codex, Gemini, GitHub Copilot, Zo, open‑source models (Qwen, Kimi, DeepSeek), or using Claude via AWS Bedrock or Google instead of Anthropic’s API directly.
  • Some advocate multi‑model, multi‑provider tooling to ride out individual vendor outages.
  • Several teams describe running open models on their own H100 clusters to gain control, privacy, and avoid vendor downtime, though others argue hardware cost, ops burden, and model quality make this unrealistic for many.
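The multi-provider failover idea mentioned above can be sketched as an ordered fallback loop. This is a minimal illustration, not any vendor's SDK: the provider names and call functions are hypothetical stand-ins for real API clients.

```python
# Minimal sketch of multi-provider failover: try each vendor in order and
# return the first successful response. Provider callables are hypothetical.

def call_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; return (name, response) from the
    first provider that succeeds, or raise if every provider fails."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # e.g. timeouts, 429 rate limits, 5xx errors
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")

# Stand-in callables simulating one flaky primary and one healthy fallback.
def flaky_primary(prompt):
    raise TimeoutError("529 overloaded")

def healthy_fallback(prompt):
    return f"answer to: {prompt}"

provider_chain = [("primary", flaky_primary), ("fallback", healthy_fallback)]
name, reply = call_with_fallback("hello", provider_chain)
print(name, reply)  # fallback answer to: hello
```

Real setups usually add retries with backoff per provider and health checks before demoting a vendor, but the ordering-and-catch structure stays the same.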

Economics, labor, and culture

  • Discussion of ROI: some say LLM spend beats hiring more engineers; others see it as substituting for potential hires.
  • Broader worries about layoffs, “replacing labor,” and atrophying coding skills coexist with enthusiasm for big productivity gains.
  • Humor (AI “on lunch break,” uptime songs, Mythos jokes) is used to cope with frustration and highlight how central these tools have become.