GitHub experiences various partial outages/degradations
Azure outage and root cause
- Multiple comments link GitHub’s partial outage to an ongoing Azure incident affecting VM management operations (create/update/scale/start/stop) across several regions.
- Azure’s own status page cites a misconfiguration: a change to the ACLs on storage accounts that host VM extensions broke public access, impacting Azure DevOps, GitHub, and others.
- Users report GitHub Actions failing, self‑hosted runners unable to scale, and jobs stuck in queues while minutes continue to be consumed.
GitHub reliability and Azure migration concerns
- Several see this as part of a broader pattern: “monthly” or even “daily” GitHub incidents, with January cited as having an incident count roughly equal to the number of days.
- Some argue that shifting blame to “our upstream provider” is disingenuous, since GitHub and Azure are both owned by the same parent company.
- There’s frustration that GitHub has become less reliable since deeper Azure integration, and doubts that Microsoft leadership treats GitHub’s reliability as a priority.
Cloud capacity, quotas, and the “infinite” myth
- Multiple complaints about Azure VM quotas and capacity: multi‑month waits for small quota increases, needing to migrate regions due to lack of hardware, and repeated VM‑ops issues.
- Others note AWS has similar capacity and quota problems, just often less visible; instance types and AZ pools can be exhausted.
- Discussion highlights that cloud is not actually infinite; it’s still finite hardware with opaque limits and sometimes slow or denied increases.
- One thread explains why organizations still choose cloud: compliance, observability, PaaS (managed AD/Entra, SQL, web hosting), and serverless removing ops burden for small teams.
Multi-region and control plane resilience
- Criticism that Azure continues to have faults spanning multiple regions, especially in the VM control plane.
- Commenters contrast architectural approaches among hyperscalers and note that all share a vulnerability: a control-plane outage can break scaling and lifecycle operations even if running VMs stay up.
- For true resilience, some argue you must pre‑allocate capacity and avoid relying on autoscaling, which makes cloud feel closer to owning hardware.
Alternatives and self-hosting
- Suggestions include moving to other forges or at least maintaining a bare mirror to ride out GitHub outages.
- GitLab is seen as less appealing after price/plan changes; some praise Codeberg and self‑hosted Forgejo/Gitea as closer to “old GitHub.”
- There’s concern about open source projects’ dependence on a single corporate host and what happens if free hosting is reduced or withdrawn.
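The thread does not spell out how to keep a bare mirror, so the following is only a minimal sketch of one way to do it, assuming `git` is installed and on the PATH. The function names `mirror_commands` and `sync_mirror`, and any URL or path passed to them, are hypothetical and chosen for illustration.

```python
import subprocess
from pathlib import Path


def mirror_commands(source_url: str, mirror_dir: str) -> list[list[str]]:
    """Return the git commands needed to create or refresh a bare mirror.

    Hypothetical helper: it only builds the command lines, so they can be
    inspected or logged before anything is executed.
    """
    if Path(mirror_dir).exists():
        # Mirror already exists: fetch all refs and prune ones deleted upstream.
        return [["git", "-C", mirror_dir, "remote", "update", "--prune"]]
    # First run: --mirror creates a bare clone that tracks every ref.
    return [["git", "clone", "--mirror", source_url, mirror_dir]]


def sync_mirror(source_url: str, mirror_dir: str) -> None:
    """Run the commands; raises CalledProcessError if git fails,
    for example while the upstream host is unreachable."""
    for cmd in mirror_commands(source_url, mirror_dir):
        subprocess.run(cmd, check=True)
```

Calling `sync_mirror` on a schedule (cron, a systemd timer, or similar) would keep a local copy of the history that can still be cloned from during an outage. It covers only the Git data, not issues, pull requests, or CI.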
AI, communication, and status handling
- Several jokes blame AI (Copilot, agents) for configuration mishaps and outages, and quip that Copilot being down might improve code quality.
- Users complain that GitHub’s status page often lags reality; they use Hacker News as a “sanity check” when jobs silently stall.
- Some ask whether paid users will be credited for wasted GitHub Actions minutes during these incidents.