GitHub experiences various partial outages/degradations

Azure outage and root cause

  • Multiple comments link GitHub’s partial outage to an ongoing Azure incident affecting VM management operations (create/update/scale/start/stop) across several regions.
  • Azure’s own status page cites a misconfiguration: a change to the ACLs on storage accounts that host VM extensions broke public access, impacting Azure DevOps, GitHub, and others.
  • Users report GitHub Actions failing, self‑hosted runners unable to scale, and jobs stuck in queues while minutes continue to be consumed.

GitHub reliability and Azure migration concerns

  • Several commenters see this as part of a broader pattern: “monthly” or even “daily” GitHub incidents, with January cited as having an incident count roughly equal to the number of days in the month.
  • Some argue that shifting blame to “our upstream provider” is disingenuous, since GitHub and Azure belong to the same parent company.
  • There’s frustration that GitHub has become less reliable since deeper Azure integration, and doubts that Microsoft leadership treats GitHub’s reliability as a priority.

Cloud capacity, quotas, and the “infinite” myth

  • Multiple complaints about Azure VM quotas and capacity: multi‑month waits for small quota increases, needing to migrate regions due to lack of hardware, and repeated VM‑ops issues.
  • Others note AWS has similar capacity and quota problems, just often less visible; instance types and AZ pools can be exhausted.
  • Discussion highlights that cloud is not actually infinite; it’s still finite hardware with opaque limits and sometimes slow or denied increases.
  • One thread explains why organizations still choose cloud: compliance, observability, PaaS (managed AD/Entra, SQL, web hosting), and serverless removing ops burden for small teams.

Multi-region and control plane resilience

  • Criticism that Azure continues to have faults spanning multiple regions, especially in the VM control plane.
  • Commenters contrast architectural approaches among hyperscalers and note that all share a vulnerability: a control-plane outage can break scaling and lifecycle operations even if running VMs stay up.
  • For true resilience, some argue you must pre‑allocate capacity and avoid relying on autoscaling, which makes cloud feel closer to owning hardware.
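
The pre-allocation argument above can be illustrated with a toy model. This is a minimal sketch, not a description of any provider's behavior: the function name `served_requests`, the per-tick demand numbers, and the assumption that an autoscaler instantly matches demand whenever the control plane is reachable are all hypothetical simplifications.

```python
def served_requests(demand, start_capacity, control_plane_up, autoscale):
    """Return the units of demand served on each tick.

    demand:            requested units per tick
    start_capacity:    units of capacity running at tick 0
    control_plane_up:  per-tick flag, True if scale operations succeed
    autoscale:         if True, capacity follows demand when the control
                       plane is up; if False, capacity stays fixed
    """
    capacity = start_capacity
    served = []
    for want, cp_ok in zip(demand, control_plane_up):
        if autoscale and cp_ok:
            # Hypothetical idealized autoscaler: resize to match demand.
            capacity = want
        # Already-running capacity keeps serving even when the control
        # plane is down; it just cannot be resized.
        served.append(min(want, capacity))
    return served


# A demand spike (ticks 2-3) that coincides with a control-plane outage.
demand = [2, 2, 6, 6, 2]
control_plane_up = [True, True, False, False, True]

autoscaled = served_requests(demand, 2, control_plane_up, autoscale=True)
preallocated = served_requests(demand, 6, control_plane_up, autoscale=False)
```

In this model the autoscaled fleet is stuck at its pre-outage size during the spike, while the pre-allocated fleet serves the full demand at the cost of paying for idle capacity the rest of the time.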

Alternatives and self-hosting

  • Suggestions include moving to other forges or at least maintaining a bare mirror to ride out GitHub outages.
  • GitLab is seen as less appealing after price/plan changes; some praise Codeberg and self‑hosted Forgejo/Gitea as closer to “old GitHub.”
  • There’s concern about open source projects’ dependence on a single corporate host and what happens if free hosting is reduced or withdrawn.
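
The "bare mirror" suggestion above could look roughly like the following. This is a minimal sketch, assuming `git` is installed and on the PATH. The helper names `mirror_commands` and `sync_mirror`, the example URL, and the mirror path are hypothetical. The git invocations themselves (`git clone --mirror` and `git remote update --prune`) are standard commands.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def mirror_commands(remote_url: str, mirror_dir: str, exists: bool) -> list[list[str]]:
    """Return the git commands needed to create or refresh a bare mirror."""
    if not exists:
        # First run: --mirror creates a bare clone that copies all refs.
        return [["git", "clone", "--mirror", remote_url, mirror_dir]]
    # Later runs: fetch new objects and prune refs deleted upstream.
    return [["git", "-C", mirror_dir, "remote", "update", "--prune"]]


def sync_mirror(remote_url: str, mirror_dir: str) -> None:
    """Create the mirror if it is missing, otherwise refresh it."""
    exists = Path(mirror_dir).is_dir()
    for cmd in mirror_commands(remote_url, mirror_dir, exists):
        # check=True raises CalledProcessError if git exits non-zero.
        subprocess.run(cmd, check=True)


# Hypothetical values for illustration only; nothing is executed here.
EXAMPLE_URL = "https://example.invalid/org/repo.git"
EXAMPLE_DIR = "/srv/mirrors/repo.git"
first_run = mirror_commands(EXAMPLE_URL, EXAMPLE_DIR, exists=False)
refresh = mirror_commands(EXAMPLE_URL, EXAMPLE_DIR, exists=True)
```

Running `sync_mirror` on a schedule (for example from cron) keeps a local copy that can still be cloned from during an outage. It covers only the git data, not issues, pull requests, or CI.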

AI, communication, and status handling

  • Several jokes blame AI (Copilot, agents) for configuration mishaps and outages, and quip that Copilot being down might improve code quality.
  • Users complain that GitHub’s status page often lags reality; they use Hacker News as a “sanity check” when jobs silently stall.
  • Some ask whether paid users will be credited for wasted GitHub Actions minutes during these incidents.