Docker Hub Is Down
Impact and Single Point of Failure Realization
- Many discovered Docker Hub as an unexpected single point of failure (SPOF): dev envs wouldn’t boot, CI builds failed, and PaaS tools (e.g. Coolify) couldn’t deploy or even restart containers.
- Some noted they had base images locally, but Docker still failed builds due to metadata HEAD requests to Docker Hub, even with flags like --pull=never.
- Status page framed it as an authentication issue, but users saw public docker pull effectively down for many images.
Workarounds During the Outage
- Directly restarting existing containers via docker restart bypassed platform tooling that insists on re-pulling images.
- People pushed images from nodes that still had them cached into internal registries as an emergency mirror.
- Some resorted to hacks (e.g., changing FROM golang:… to an available base like redis:… and installing tooling manually).
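The emergency-mirror workaround amounts to re-tagging each locally cached image under an internal registry host and pushing it. A minimal Python sketch that only generates the docker tag and docker push command lines is shown here; the registry host registry.internal.example:5000 and the helper name mirror_commands are hypothetical, and the image names are assumed to be short Docker Hub references already present on the node.

```python
import shlex
from typing import List


def mirror_commands(images: List[str], internal_registry: str) -> List[str]:
    """Build `docker tag` / `docker push` command lines for cached images.

    `internal_registry` is a hypothetical host (optionally with a port);
    nothing is executed here, the commands are only returned as strings.
    """
    commands = []
    for image in images:
        # Prefix the cached image name with the internal registry host.
        target = f"{internal_registry}/{image}"
        commands.append(f"docker tag {shlex.quote(image)} {shlex.quote(target)}")
        commands.append(f"docker push {shlex.quote(target)}")
    return commands


if __name__ == "__main__":
    for line in mirror_commands(["golang:1.22", "redis:7"],
                                "registry.internal.example:5000"):
        print(line)
```

Printing the commands instead of running them keeps the sketch safe to try; the output can be reviewed and then piped to a shell on a node that still has the images cached.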
Mirrors, Caches, and Alternative Registries
- Strong consensus: run a local / internal registry mirror or pull-through cache for Docker Hub (Harbor, Artifactory, Nexus, AWS ECR, GitLab/GitHub registries, container-registry.com, etc.).
- Several mention AWS’s public ECR mirror of Docker Hub (public.ecr.aws/docker/library/...), usable by anyone (with potential rate limits off-AWS).
- Google Artifact Registry’s pull-through cache also failed, apparently because it tries to validate tags with Docker Hub before serving cached content.
- Kubernetes-focused solutions discussed: Harbor as transparent mirror via registries.conf, Spegel, kube-image-keeper, local Zot-based mirrors, and other “mirror everything” setups for Docker, npm, PyPI, CPAN, etc.
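For the public ECR mirror mentioned above, switching over is mostly a matter of rewriting image references. The following Python sketch maps Docker Hub official-image references onto the public.ecr.aws/docker/library/ prefix and leaves everything else untouched. The function name to_ecr_mirror and the list of Docker Hub host aliases are assumptions made for illustration, and the sketch assumes only official (library) images are available under that prefix.

```python
# Host names assumed here to refer to Docker Hub.
DOCKER_HUB_HOSTS = {"docker.io", "index.docker.io", "registry-1.docker.io"}
ECR_MIRROR_PREFIX = "public.ecr.aws/docker/library"


def to_ecr_mirror(image: str) -> str:
    """Rewrite a Docker Hub official-image reference to the ECR mirror path.

    References to other registries, or to user/organisation repositories
    on Docker Hub, are returned unchanged.
    """
    first, sep, rest = image.partition("/")
    # A first component containing "." or ":" (or "localhost") is a registry host.
    if sep and ("." in first or ":" in first or first == "localhost"):
        if first not in DOCKER_HUB_HOSTS:
            return image
        path = rest
    else:
        path = image
    # Official images live under the implicit "library/" namespace.
    if path.startswith("library/"):
        path = path[len("library/"):]
    if "/" in path:
        return image  # user/org repository, not an official image
    return f"{ECR_MIRROR_PREFIX}/{path}"


if __name__ == "__main__":
    for ref in ["golang:1.22", "docker.io/library/redis:7", "ghcr.io/org/app:1"]:
        print(ref, "->", to_ecr_mirror(ref))
```

A rewrite like this only helps for images the mirror actually carries; an internal pull-through cache remains the more general fix discussed in the thread.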
Registry Alternatives & Tradeoffs
- Alternatives cited: GitHub Container Registry, Quay.io, cloud vendor registries (ECR, Azure, GCP), Harbor-based hosted services.
- Critiques: GHCR auth using deprecated personal access tokens; Quay.io perceived as less reliable by some.
- Several note that moving to another cloud registry just changes the SPOF; the real fix is internal mirroring and pushing all production images to an internal registry.
Reliability and Lessons Learned
- Mixed views: some say Docker Hub is usually very stable; others find a multi-hour outage surprisingly long for such a critical service.
- The outage prompted multiple teams to finally implement pull-through caching and move images off Docker Hub in their pipelines.
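One way to check progress on moving pipelines off Docker Hub is to scan Dockerfiles for base images that do not come from the internal registry. The Python sketch below does this for FROM lines only. The registry host registry.internal.example and the function name external_base_images are hypothetical, and the parsing is deliberately simple: a base image given through an ARG variable would be reported as external.

```python
from typing import List


def external_base_images(dockerfile_text: str, internal_registry: str) -> List[str]:
    """Return FROM images that are not served by `internal_registry`.

    Skips `scratch` and references to earlier build stages.
    """
    stages = set()
    external = []
    for raw in dockerfile_text.splitlines():
        tokens = raw.strip().split()
        if not tokens or tokens[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64.
        args = [t for t in tokens[1:] if not t.startswith("--")]
        if not args:
            continue
        image = args[0]
        is_internal = image.startswith(internal_registry + "/")
        if image != "scratch" and image not in stages and not is_internal:
            external.append(image)
        # Remember "AS <name>" so later "FROM <name>" lines are not flagged.
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
    return external


if __name__ == "__main__":
    sample = "FROM golang:1.22 AS build\nFROM registry.internal.example/base/debian:12\n"
    print(external_base_images(sample, "registry.internal.example"))
```

Run in CI against each Dockerfile, a non-empty result could fail the build or just emit a warning, depending on how strictly a team wants to enforce the internal registry.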