Docker Hub Is Down

Impact and Single Point of Failure Realization

  • Many commenters discovered that Docker Hub was an unexpected single point of failure (SPOF): development environments wouldn’t boot, CI builds failed, and PaaS tools (e.g. Coolify) couldn’t deploy or even restart containers.
  • Some noted that they had the base images locally, yet Docker builds still failed because of metadata HEAD requests to Docker Hub, even with flags like --pull=never.
  • The status page framed the incident as an authentication issue, but users found public docker pull effectively down for many images.

Workarounds During the Outage

  • Directly restarting existing containers via docker restart bypassed platform tooling that insists on re-pulling images.
  • People pushed images from nodes that still had them cached into internal registries as an emergency mirror.
  • Some resorted to hacks (e.g., changing FROM golang:… to an available base like redis:… and installing tooling manually).
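
The emergency-mirror workaround above can be sketched as a small script that builds the docker tag and docker push commands for images a node still has cached. This is a minimal sketch under stated assumptions, not what any commenter actually ran: the registry host registry.internal.example:5000, the image tags, and the helper names hub_repo_path and mirror_commands are all hypothetical, and the "library/" handling reflects Docker's convention for official images.

```python
import shlex
from typing import List, Optional

# Hostnames that refer to Docker Hub when written explicitly.
DOCKER_HUB_HOSTS = {"docker.io", "index.docker.io", "registry-1.docker.io"}


def hub_repo_path(image: str) -> Optional[str]:
    """Return the Docker Hub path (e.g. 'library/golang:1.22'),
    or None if the reference points at some other registry."""
    first, sep, rest = image.partition("/")
    # A first component with '.' or ':' (or 'localhost') is a registry host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        if first not in DOCKER_HUB_HOSTS:
            return None
        image = rest
    # Official images live under the implicit 'library/' namespace.
    if "/" not in image:
        image = "library/" + image
    return image


def mirror_commands(images: List[str], internal_registry: str) -> List[List[str]]:
    """Build 'docker tag' + 'docker push' commands that copy cached
    Docker Hub images into an internal registry (hypothetical host)."""
    commands = []
    for image in images:
        path = hub_repo_path(image)
        if path is None:
            continue  # not a Docker Hub image; nothing to rescue
        target = f"{internal_registry}/{path}"
        commands.append(["docker", "tag", image, target])
        commands.append(["docker", "push", target])
    return commands


if __name__ == "__main__":
    cached = ["golang:1.22", "bitnami/redis:7", "ghcr.io/foo/bar:1"]
    for cmd in mirror_commands(cached, "registry.internal.example:5000"):
        print(shlex.join(cmd))  # review before running on the cached node
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; on a node that still holds the images, the printed lines could be reviewed and then run by hand.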

Mirrors, Caches, and Alternative Registries

  • Strong consensus: run a local / internal registry mirror or pull-through cache for Docker Hub (Harbor, Artifactory, Nexus, AWS ECR, GitLab/GitHub registries, container-registry.com, etc.).
  • Several mention AWS’s public ECR mirror of Docker Hub official images (public.ecr.aws/docker/library/...), which anyone can use, though pulls from outside AWS may be rate-limited.
  • Google Artifact Registry’s pull-through cache also failed, apparently because it tries to validate tags with Docker Hub before serving cached content.
  • Kubernetes-focused solutions discussed: Harbor as a transparent mirror via registries.conf, Spegel, kube-image-keeper, local Zot-based mirrors, and other “mirror everything” setups for Docker, npm, PyPI, CPAN, etc.
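
To illustrate the public ECR mirror mentioned above, the following sketch rewrites FROM lines in a Dockerfile so that official images are pulled from public.ecr.aws/docker/library/ instead of Docker Hub. It is a rough illustration rather than a complete Dockerfile parser: the function name rewrite_dockerfile and the sample tags are hypothetical, and it assumes that an image name without a "/" is an official Docker Hub image unless it is scratch, a build-stage name, or a variable.

```python
import re

ECR_PREFIX = "public.ecr.aws/docker/library/"

# FROM [--platform=...] <image> [AS <stage>]
FROM_RE = re.compile(r"^(\s*FROM\s+(?:--platform=\S+\s+)?)(\S+)(.*)$", re.IGNORECASE)
AS_RE = re.compile(r"\s+AS\s+(\S+)", re.IGNORECASE)


def rewrite_dockerfile(text: str) -> str:
    """Point official-image FROM lines at the public ECR mirror."""
    stages = set()  # stage names declared so far via 'AS <name>'
    out = []
    for line in text.splitlines():
        match = FROM_RE.match(line)
        if match:
            head, image, tail = match.groups()
            is_official = (
                "/" not in image                   # no registry or namespace
                and image.lower() != "scratch"     # built-in empty base
                and image.lower() not in stages    # reference to an earlier stage
                and not image.startswith("$")      # ARG-substituted value
            )
            if is_official:
                line = f"{head}{ECR_PREFIX}{image}{tail}"
            stage = AS_RE.match(tail)
            if stage:
                stages.add(stage.group(1).lower())
        out.append(line)
    # splitlines() drops a trailing newline, so restore it if present.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


if __name__ == "__main__":
    sample = "FROM golang:1.22 AS build\nRUN go build ./...\nFROM scratch\n"
    print(rewrite_dockerfile(sample), end="")
```

Images under a user or organization namespace are left untouched, since the mirror path cited in the discussion covers only the library/ namespace.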

Registry Alternatives & Tradeoffs

  • Alternatives cited: GitHub Container Registry, Quay.io, cloud vendor registries (ECR, Azure, GCP), Harbor-based hosted services.
  • Critiques: GHCR authentication relies on personal access tokens, which commenters described as deprecated; some perceive Quay.io as less reliable.
  • Several note that moving to another cloud registry just changes the SPOF; the real fix is internal mirroring and pushing all production images to an internal registry.
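
For hosts running Docker Engine, one way to wire in the internal mirroring described above is the registry-mirrors key in the daemon configuration file (commonly /etc/docker/daemon.json), which applies to Docker Hub pulls only. The sketch below merges a mirror URL into existing configuration text; the helper name add_registry_mirror and the URL https://mirror.internal.example are hypothetical, and the daemon has to be restarted or reloaded before a change takes effect.

```python
import json


def add_registry_mirror(daemon_json_text: str, mirror_url: str) -> str:
    """Return daemon.json text with mirror_url listed under
    'registry-mirrors', preserving any other settings."""
    # An empty or missing file is treated as an empty configuration.
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    mirrors = config.setdefault("registry-mirrors", [])
    if mirror_url not in mirrors:  # keep the operation idempotent
        mirrors.append(mirror_url)
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    existing = '{"log-driver": "json-file"}'
    print(add_registry_mirror(existing, "https://mirror.internal.example"))
```

Working on the text instead of the file itself leaves the decision about where the file lives, and how the daemon is reloaded, to whoever applies it.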

Reliability and Lessons Learned

  • Mixed views: some say Docker Hub is usually very stable; others find a multi-hour outage surprisingly long for such a critical service.
  • The outage prompted multiple teams to finally implement pull-through caching and move images off Docker Hub in their pipelines.