Hi all, Tushar from Docker here. We're sorry about the impact our current outage is having on many of you. Yes, this is related to the ongoing AWS incident, and we're working closely with AWS on getting our services restored. We'll provide regular updates on dockerstatus.com.
We know how critical Docker Hub and our other services are to millions of developers, and we're sorry for the pain this is causing. Thank you for your patience as we work to resolve this incident. We'll publish a post-mortem in the next few days once this incident is fully resolved and we have a remediation plan.
Part of me hopes that we find out that DynamoDB (which it sounds like was the root of the cascading failures) is shipped in a Docker image which is hosted on Docker Hub :-D
I guess people who are running their own registries like Nexus and build their own container images from a common base image are feeling at least a bit more secure in their choice right now.
Wonder how many builds or redeployments this will break. Personally, nothing against Docker or Docker Hub of course, I find them to be useful.
It's actually an important practice to have a Docker image cache in the middle. You never know if an upstream image is purged randomly from Docker Hub, and your K8s node gets replaced, and now can't pull the base image for your service.
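For what it's worth, pointing dockerd at such a cache is a small config change; a sketch, with the mirror URL as a placeholder for whatever cache you actually run (note that `registry-mirrors` only applies to Docker Hub pulls):

```shell
# /etc/docker/daemon.json -- dockerd tries the mirror first and falls
# back to Docker Hub directly if the mirror is unreachable.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "registry-mirrors": ["https://registry-cache.internal.example"]
}
EOF
# Restart the daemon so the mirror configuration takes effect.
sudo systemctl restart docker
```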
We run Harbor and mirror every base image using its Proxy Cache feature, it's quite nice.
We've had this setup for years now and while it works fine, Harbor has some rough edges.
I came here to mention that any non-trivial company depending on Docker images should look into a local proxy cache. It’s too much infra for a solo developer / tiny organization, but is a good hedge against DockerHub, GitHub repo, etc downtime and can run faster (less ingress transfer) if located in the same region as the rest of your infra.
If it really is fully open-source please make that more visible on your landing page.
It is a huge deal if I can start investigating and deploying such a solution as a techie right away, compared to having to go through all the internal hoops for a software purchase.
How hard is it to go to the GitHub repository and open the LICENSE file that is in almost every repository? It would have taken you less time than writing that comment, and shown you it's under MIT.
It's not entirely uncommon to only have parts of the solution open. So a license on one repo might not be the whole story and looking further would take more time than giving a good suggestion to the author.
Agreed. For all the people arguing "just click the link and the license is there!!", I have been burned several times before, where a technical solution has a prominently licensed (MIT or similar) GitHub repo as its primary home, only to discover later on that essential parts of the system are in other, less permissive or private repos behind subscriptions or fees.
The rest of us get around that particular issue by going through the source code and all the tradeoffs before we download, include and adopt a dependency, not after.
Good for you! This of course doesn't help in the situation where a dependency author retroactively changes the licensing state of a component, or reconfigures the project to rely on a new external dependency with differing license states (experienced both of these too!).
Having the landing page explain the motivations of the authors vis-a-vis open source goes a long way to providing the context for whatever licensing is appearing in the source repos, and helps understand what the future steer for the project is likely to be.
There are loads of ostensibly open source projects out there whose real goal is to drive sales of associated software and services, without which the value of the open source components is often much reduced, especially in the developer tooling space.
> Good for you! This of course doesn't help in the situation where a dependency author retroactively changes the licensing state of a component, or reconfigures the project to rely on a new external dependency with differing license states (experienced both of these too!).
No, but I also don't see why that matters a lot. Once you've adopted a third-party project as a dependency, you also implicitly sign up for whatever changes they make, or you get prepared to stay on a static version with only security fixes you apply yourself. These aren't exactly new problems, nor rocket science; we've been dealing with this sort of thing for decades already.
> There are loads of ostensibly open source projects out there whose real goal is to drive sales of associated software and services, without which the value of the open source components is often much reduced, especially in the developer tooling space.
Yeah, which is kind of terrible, but also kind of great. In the end, it's fairly easy to detect one way or another, with the biggest and reddest signal being VC funding with no public pricing.
If I have to dig through your website/documentation to find basic information we’re not getting off to a great start. It’s pretty common for open source projects to proudly proclaim they are open source from the get-go. “____ is an open source tool for ______.” Simple as that
Seriously all the nitpicking I see of any project people post here but “tell us you’re open source at the top when you’re open source” means we’re lazy? Being open source is an important decision and you should tell people! It’s a good thing!
Isn’t a big part of getting a project out there actually letting people know what it is? Especially if you’re trying to give a tool to the open source-valuing community. That’s a high priority for them. That’s like having a vegan menu and not saying you’re a vegan restaurant anywhere public facing.
I agree it's a good thing, but I'd also agree it's not something you need/have to shove in people's faces, especially when it's literally one click away to find out (The GitHub icon in the top right takes you to the repository, and you don't even have to scroll or click anything, the sidebar shows "MIT License" for you).
There is a GitHub icon fairly prominent on the top right. Choosing to spend precious text on it for a fleeting would-be user is a choice, and not everyone wants to market that fact very prominently. Should everyone who writes their project in Rust include that prominently as well? It seemingly markets very well, and a lot of people seem to care about that too.
It's been a while since I looked at kuik, but I would say the main difference is that Spegel doesn't do any of the pulling or storage of images. Instead it relies on Containerd to do it for you. This also means that Spegel does not have to manage garbage collection. The nice thing with this is that it doesn't change how images are initially pulled from upstream and is able to serve images that exist on the node before Spegel runs.
Also, it looks like kuik uses CRDs to store information about where images are cached, while Spegel uses its own p2p solution to route traffic between nodes.
If you are running k3s in your homelab you can enable Spegel with a flag as it is an embedded feature.
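If memory serves (worth double-checking against the k3s docs), it's the `--embedded-registry` server flag plus a mirror entry in registries.yaml:

```shell
# /etc/rancher/k3s/registries.yaml -- registries the embedded Spegel
# mirror should serve for the cluster.
cat <<'EOF' > /etc/rancher/k3s/registries.yaml
mirrors:
  docker.io:
EOF
# Then start the server with the embedded registry mirror enabled:
k3s server --embedded-registry
```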
There are a couple of alternatives that mirror more than just Docker Hub too; most of them are pretty bloated and enterprisey, but they do what they say on the tin and have saved me more than once. Artifactory, Nexus Repository, Cloudsmith and ProGet are some of them.
Spegel does not only mirror Docker Hub, and works a lot differently from the alternatives you suggested. Instead of being yet another failure point closer to your production environment, it runs a distributed stateless registry inside of your Kubernetes cluster. By piggybacking off of Containerd's image store, it will distribute already-pulled images inside of the cluster.
I'll be honest and say I hadn't heard of Spegel before, and just read the landing page which says "Speed up container pulls and minimize downtime with a stateless peer-to-peer OCI registry mirror for efficient image distribution", so it isn't exactly clear you can use it for more things than container images.
Spegel itself does not manage state as a normal registry would. Maybe ephemeral would be a better word to describe it. A normal registry would require some stateful storage solution along with a database to store images that clients push to it. Spegel exploits the fact that images used by containers will be stored on disk by Containerd for its benefit. Any image currently being used by a pod in a cluster will be available for all other nodes in the cluster to pull.
Gotcha. That's definitely an important point and seems difficult to communicate in a single word or quick blurb. I can see why you went with stateless. It's just a little confusing in this context (for me at least).
I am having some discussions about getting things working on GKE but I can't give an ETA as it really depends on how things align with deployment schedules. I am positive however that this will soon be resolved.
I still think this is an acceptable footgun (?) to have. The expressiveness of downloading an image tag with a domain included outweighs potential miscommunication issues.
For example, if you're on a team and you have documentation containing commands, but your docker config is outdated, you can accidentally pull from docker's global public registry.
A welcome change IMO would be removing global registries entirely, since it just makes it easier to tell where your image is coming from (but I severely doubt docker would ever consider this since it makes it fractionally easier to use their services)
Even if you could configure a default registry to point at something besides docker.io a lot of people, I'd say the vast majority, wouldn't have bothered. So they'd still be in the same spot.
And it's not hard to just tag images. I don't have a single image pulling from docker.io at work. Takes two seconds to slap <company-repo>/ at the front of the image name.
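That prefixing step can even be mechanized; a minimal sketch (`registry.example.com` is a placeholder, and real image-reference parsing has more edge cases than this):

```shell
# Sketch: prefix bare Docker Hub references with a private registry,
# leaving references that already name a registry host untouched.
rewrite_image() {
  case "$1" in
    # Heuristic: a dot or port colon before the first slash means an
    # explicit registry host is already present -- leave it alone.
    *.*/*|*:*/*) echo "$1" ;;
    # "user/repo" style Hub reference: just prepend the registry.
    */*) echo "registry.example.com/$1" ;;
    # Bare official image like "redis:7": Hub stores these under library/.
    *) echo "registry.example.com/library/$1" ;;
  esac
}

rewrite_image "redis:7"            # -> registry.example.com/library/redis:7
rewrite_image "grafana/grafana"    # -> registry.example.com/grafana/grafana
rewrite_image "ghcr.io/org/app:v1" # -> unchanged, not a Hub reference
```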
Google Container Registry provides a pull-through mirror, though, just prefix `mirror.gcr.io` and use `library` as the user for the Docker Official Images. For example `mirror.gcr.io/library/redis` for https://hub.docker.com/_/redis.
Or they all rely on AWS, because over the last 15 years we've built an extremely fragile interconnected global system in the pursuit of profit, austerity, and efficiency
In terms of user reports: Some users don't know what the hell is going on. This is a constant.
For instance: When there's a widespread Verizon cellular outage, sites like downdetector will show a spike in Verizon reports.
But such sites will also show a spike in AT&T and T-Mobile reports. Even though those latter networks are completely unaffected by Verizon's back-end issues, the graphs of user reports are consistently shaped the same for all 3 carriers.
This is just because some of the users doing the reporting have no clue.
So when the observation is "AWS is in outage and people are reporting issues at Google, and Microsoft," then the last two are often just factors of people being people and reporting the wrong thing.
(You're hanging out on HN, so there's very good certainty that you know precisely what cell carrier you're using and can also discern the difference betwixt an Amazon, a Google, and a Microsoft. But lots of other people are not particularly adept at making these distinctions. It's normal and expected for some of them to be this way at all times.)
That's true. And a big part of the reason is the user's browser. They use Microsoft Edge or Google Chrome and can't open a page and there are weird error messages? Oh, that's probably a Google issue…
What are good proxy/mirror solutions to mitigate such issues? Best would be an all-in-one solution that, for example, also handles npm, Packagist, etc.
Pulp is a popular project for a 'one-stop shop', I believe. Personally, I've always used project-specific solutions like 'distribution/distribution' for containers, from the CNCF. It allows for pull-through caching with relatively little setup work.
Yes, thousands of orgs. Larger players might use a pull-through cache - but it's not as common as it should be. Similar issue for the rest of the software supply chain (npm, PyPI, etc.).
Result of AWS outage https://news.ycombinator.com/item?id=45640754
> We have identified the underlying issue with one of our cloud service providers.
Isn't everyone using multiple cloud providers nowadays? Why are they affected by a single cloud provider outage?
I think more often than not, companies are using a single cloud provider, and even when multiple are used, it's either different projects with different legacy decisions or a conscious migration.
True multi-cloud is not only very rare, it's an absolute pain to manage as soon as people start using any vendor-specific functionality.
> as soon as people start using any vendor-specific functionality
It's also true in circumstances where things have the same name but act differently.
You'd be forgiven for believing that AWS IAM and GCP IAM are the same thing for example, but in GCP an IAM Role is simply a list of permissions that you can attach to an identity. In AWS an IAM Role is the identity itself.
Other examples: if you're coming from GCP, you'd be forgiven for assuming networks span regions in AWS too, which will be annoying to fix later when you realise you need to create peering connections.
Oh and while default firewall rules are stateful on both, if you dive into more advanced network security, the way rules are applied and processed can have subtle differences. The inherent global nature of the GCP VPC means firewall rules, by default, apply across all regions within that VPC, which requires a different mindset than AWS where rules are scoped more tightly to the region/subnet.
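To make the IAM naming point concrete (role, project, and account names below are made up, and these are shown for shape only since they need cloud credentials to actually run): in AWS the role is the thing you create and let principals assume, while in GCP you bind a role, which is just a permission bundle, to an identity that exists independently.

```shell
# AWS: the role *is* an identity. It is created with a trust policy
# describing who may assume it; permissions are then attached to it.
aws iam create-role \
  --role-name app-reader \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy \
  --role-name app-reader \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# GCP: the role is only a named list of permissions. The identity (a
# service account here) exists on its own; the role gets bound to it.
gcloud iam service-accounts create app-reader
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:app-reader@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```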
There's like, hundreds of these little details.
Sounds like we’ve walked a similar path on this. Especially with IAM and network policies.
> There's like, hundreds of these little details
Exactly. If it were a handful of things, that would be fine. But it's often as you describe.
I think there's some irony in Docker being impacted specifically, as they're one of the main tools for making workloads portable across clouds.
Depends on if you’re using Docker or Podman Desktop versus straight Docker/Podman and where you’re pulling your images from.
Complex systems are hard.
Multi-cloud is just a way to have the outages of both.
Almost all cloud providers help here by having inter-region failures as well.
There are multiple AWS services which are "global" in the sense that they are entirely hosted out of us-east-1.
You can be multi-cloud in the sense that you aren't dependent on any single provider, or in the sense that you are dependent on all of them.
A bit like the ambiguity of search facets: if I select one facet, I get results that match, but if I add a second facet, should the results expand (OR'ing my selections) or contract (AND'ing my selections)? Presumably they should be OR'd if they belong to the same category (like selecting multiple colors, if any given result has only one color) but AND'd otherwise (like selecting a color and a size). But then a category could consist of miscellaneous features, and I want results that have every feature I've selected, which goes against the general case.
Not only are they not using multiple cloud providers, they are not using multiple cloud locations.
Because it's hard enough to distribute a service across multiple machines in the same DC, let alone across multiple DCs and multiple providers.
Because even if service A is using multiple cloud providers, not all the external services they use are doing the same thing, especially the smaller or cheaper ones. At least one of them is in AWS us-east-1; it fails, and degrades service A or takes it down.
Being multi-cloud does not come for free: time, engineers, knowledge and ultimately money.
Multi-cloud is not nearly as trivial to implement as often implied for real-world, complex projects. Things get challenging the second your application steps off the happy path.
> Isn't everyone using multiple cloud providers nowadays? Why are they affected by a single cloud provider outage?
No? I very much doubt anyone is doing that.
> Isn't everyone using multiple cloud providers nowadays?
Oh yes. All of them, in fact, especially if you count what key vendors host on.
> Why are they affected by a single cloud provider outage?
Every workload is only on one cloud. NB: this doesn't mean every workflow is on only one cloud. Important distinction, since that would be more stable.
they are using multiple cloud providers, but judging by the cloudflare r2 outage affecting them earlier this year I guess all of them are on the critical path?
Looking at the landscape around me, no. Everyone is in crisis cost-cutting, "gotta show that same growth the C-suite saw during Covid" mode. So being multi-provider, and even in some cases, being multi-regional, is now off the table. It's sad because the product really suffers. But hey, "growth".
This broke our builds since we rely on several public Docker images, and by default, Docker uses docker.io.
Thankfully, AWS provides a docker.io mirror for those who can't wait:
In the error logs, the issue was mostly related to the authentication endpoint: https://auth.docker.io → "No server is available to handle this request"
After switching to the AWS mirror, everything built successfully without any issues.
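For the Docker Official Images, AWS's public mirror keeps a library-style layout under the `docker/library` namespace, so the switch is typically a one-line change; a config sketch (the image and tag are examples):

```shell
# Dockerfile base image, before (implicit Docker Hub):
#     FROM redis:7
# after, via AWS's public mirror of the Docker Official Images:
#     FROM public.ecr.aws/docker/library/redis:7
# The same prefix works for ad-hoc pulls:
docker pull public.ecr.aws/docker/library/redis:7
```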
Mild irony that Docker is down because of the AWS outage, but the AWS mirror repos are still running...
Also, docker.io is rate-limited, so if your organization experiences enough growth you will start seeing build failures on a regular basis.
Also, quay.io - another image hoster, from red hat - has been read-only all day today.
If you're going to have docker/container image dependencies it's best to establish a solid hosting solution instead of riding whatever bus shows up
Rate limits are primarily applied to unauthenticated users; open source projects and business accounts have no/much higher thresholds.
Based on the solution, it seems like it is quite straightforward to switch over.
I wasn't able to get this working, but I was able to use Google's mirror[0] just fine.
Just had to change the registry prefix to Google's mirror. Hope this helps!

[0]: https://cloud.google.com/artifact-registry/docs/pull-cached-...
We tried this initially but received an error, so it looks like these services may not be true mirrors, just a proxy with a cache. If your image is not cached on one of these then you may be SOL.
During the last Docker Hub outage we found Google mirrors lost all image tags after a while. Image digest references would probably work
public.ecr.aws was failing for me earlier with 5XX errors due to the AWS outage: https://news.ycombinator.com/item?id=45640754
I manage a large build system and pulling from ECR has been flaking all day
Just engineering hygiene IMO.
> You never know if an upstream image is purged randomly from docker, and your K8s node gets replaced, and now can't pull the base image for your service.
That doesn't make sense unless you have some oddball setup where k8s is building the images you're running on the fly. There's no such thing as a "base image" for tasks running in k8s. There is just the image itself and its layers, which may come from some other image.
But it's not built by k8s. It's built by whatever is building your images and storing them in your registries. That's where you need your true base image caching.
We are using base images, but unfortunately some GitHub Actions are pulling Docker images in their prepare phase - so while my application would build, I cannot deploy it because the CI/CD depends on Docker Hub and you cannot change where these images are pulled from (so they cannot go through a pull-through cache)…
My advice: document the issue, and use it to help justify spending time on removing those vestigial dependencies on Docker asap.
It's not just about reducing your exposure to third parties who you (presumably) don't have a contract with, it's also good mitigation against potential supply chain attacks - especially if you go as far as building the base images from scratch.
Yea, we have thought about that - I also want to remove most dependencies on externally imported actions in GitHub CI and probably just go back to simple bash scripts. Our actions are not that complicated, and there is little benefit in using some external action to run ESLint over just running the command inside the action directly. Saves time and reduces dependencies - just need to find time to do that…
Mirrors can be configured in dockerd or BuildKit. If you can update the config (might need a self-hosted runner?) it's a quick fix - see https://cloud.google.com/artifact-registry/docs/pull-cached-... for an example. AWS and Azure are similar.
Hmm, yea, with a self-hosted runner this could work. You'd need to set the dockerd config in the VM before the runner starts, I assume - unfortunately GitHub itself does not allow changing anything in the prepare stage - and it's been a known issue for at least 2 years...
https://github.com/actions/runner-images/issues/1445#issueco... https://github.com/orgs/community/discussions/76636
That is nothing compared to how good I feel about not using containers at all.
You don’t want a Rube Goldberg contraption doing everything?
So not agile!
Currently unable to do much of anything new in dev/prod environments without manual workarounds. I'd imagine the impact is pretty massive.
Aside: seems Signal is also having issues. Damn.
I’m not sure that the impact will be that big. Most organizations have their own mirrors for artifacts.
From what I've seen: I highly doubt it.
Edit to add: This might spur on a few more to start doing that, but people are quick to forget/prioritise other areas. If this keeps happening then it will change.
“Their own” can and often does mean something hosted on a major cloud provider (whether they manage it in-house or pay a vendor for their system)
Yeah, perhaps. I don't know how many folks host mirrors. Most places I've worked for didn't, though this is anecdotal.
I would say most people would say it's best practice, while a minority actually does it.
Seems related to size and/or maturity if anything. I haven't seen any startups less than five year old doing anything like that, but I also haven't seen any huge enterprise not doing that, YMMV.
Yes I noticed Signal being down too
Guess where we host nexus..
Only if they get their base images from somewhere else...
Pull-through caches are still useful even when the upstream is down... assuming the image(s) were pulled recently. The HEAD to upstream will obviously fail [when checking currency], but the software is happy to serve what it has already pulled.
Depends on the implementation, of course: I'm speaking to 'distribution/distribution', the reference. Harbor or whatever else may behave differently, I have no idea.
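For 'distribution/distribution' specifically, pull-through mode is a single stanza in config.yml; a sketch (paths and port are examples):

```shell
# Minimal config.yml for running distribution/distribution as a
# pull-through cache of Docker Hub. With proxy.remoteurl set, the
# registry fetches from upstream on a cache miss and keeps serving
# already-cached content even when the upstream check fails.
cat <<'EOF' > /etc/docker/registry/config.yml
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  remoteurl: https://registry-1.docker.io
EOF
```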
It's quite funny/interesting that this is higher in HN front page than the news of the AWS outage that caused it.
Not on the real secret front page! https://news.ycombinator.com/active :)
That's informative, I wasn't aware of that way to view HN, thanks.
What does the "active" page sort by?
According to https://news.ycombinator.com/lists it's "Most active current discussions"
I find that it better surfaces the best discussion when there are multiple threads (like in this example), and it keeps showing slightly older threads for longer when there's still discussion happening.
Shameless plug but this might be a good time to install Spegel in your Kubernetes clusters if you have critical dependencies on Docker Hub.
https://spegel.dev/
If it really is fully open-source please make that more visible on your landing page.
It is a huge deal if I can start investigating and deploying such a solution as a techie right away, compared to having to go through all the internal hoops for a software purchase.
How hard is it to go to the GitHub repository and open the LICENSE file that is in almost every repository? It would have taken you less time than writing that comment, and would have shown you it's under MIT.
It's not entirely uncommon to only have parts of the solution open. So a license on one repo might not be the whole story and looking further would take more time than giving a good suggestion to the author.
Agreed. For all the people arguing "just click the link and the license is there!!", I have been burned several times before when a technical solution has a prominent, permissively-licensed GitHub repo (MIT or similar) as its primary home, only to discover later on that essential parts of the system are in other, less permissive or private repos behind subscriptions or fees.
The rest of us get around that particular issue by going through the source code and all the tradeoffs before we download, include and adopt a dependency, not after.
Good for you! This of course doesn't help in the situation where a dependency author retroactively changes the licensing state of a component, or reconfigures the project to rely on a new external dependency with differing license states (experienced both of these too!).
Having the landing page explain the motivations of the authors vis-a-vis open source goes a long way to providing the context for whatever licensing is appearing in the source repos, and helps understand what the future steer for the project is likely to be.
There are loads of ostensibly open source projects out there whose real goal is to drive sales of associated software and services, without which the value of the open source components is often reduced, especially in the developer tooling space.
> Good for you! This of course doesn't help in the situation where a dependency author retroactively changes the licensing state of a component, or reconfigures the project to rely on a new external dependency with differing license states (experienced both of these too!).
No, but I also don't see why that matters a lot. Once you've adopted a third-party project as a dependency, you also implicitly sign up for whatever changes they make, or you get prepared to stay on a static version with only security fixes you apply yourself. These aren't exactly new problems, nor rocket science; we've been dealing with this sort of thing for decades already.
> There are loads of ostensibly open source projects out there whose real goal is to drive sales of associated software and services, without which the value of the open source components is often reduced, especially in the developer tooling space.
Yeah, which is kind of terrible, but also kind of great. In the end it ends up being fairly easy to detect one way or another, with the biggest and reddest flag being VC-funded with no public pricing.
Also it's good feedback for the developer of this solution
If I have to dig through your website/documentation to find basic information we’re not getting off to a great start. It’s pretty common for open source projects to proudly proclaim they are open source from the get-go. “____ is an open source tool for ______.” Simple as that
Today's kids are way too lazy.
Seriously all the nitpicking I see of any project people post here but “tell us you’re open source at the top when you’re open source” means we’re lazy? Being open source is an important decision and you should tell people! It’s a good thing!
Isn’t a big part of getting a project out there actually letting people know what it is? Especially if you’re trying to give a tool to the open source-valuing community. That’s a high priority for them. That’s like having a vegan menu and not saying you’re a vegan restaurant anywhere public facing.
I agree it's a good thing, but I'd also agree it's not something you need/have to shove in people's faces, especially when it's literally one click away to find out (The GitHub icon in the top right takes you to the repository, and you don't even have to scroll or click anything, the sidebar shows "MIT License" for you).
> I'd also agree it's not something you need/have to shove in people's faces
Agree to disagree. It should be front and center the moment I find your tool IMO.
There is a GitHub icon fairly prominently placed in the top right. Choosing to spend precious text on a fleeting would-be user is a choice, and not everyone wants to market that fact very prominently. Should everyone who writes their project in Rust include that prominently as well? It seemingly markets very well, and a lot of people seem to care about that too.
After some digging - https://github.com/spegel-org/spegel/blob/main/LICENSE says MIT
https://spegel.dev/project/community/
What's the difference with kuik? Spegel seems too complicated for my homelab, but could be a nice upgrade for my company
Kuik: https://github.com/enix/kube-image-keeper?tab=readme-ov-file...
It's been a while since I looked at kuik, but I would say the main difference is that Spegel doesn't do any of the pulling or storage of images. Instead it relies on Containerd to do it for you. This also means that Spegel does not have to manage garbage collection. The nice thing with this is that it doesn't change how images are initially pulled from upstream and is able to serve images that exist on the node before Spegel runs.
Also it looks like kuik uses CRDs to store information about where images are cached, while Spegel uses its own p2p solution to do the routing of traffic between nodes.
If you are running k3s in your homelab you can enable Spegel with a flag as it is an embedded feature.
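For anyone wanting to try that: in recent k3s releases the embedded mirror is enabled with a server flag plus a registries.yaml entry (this is a newer feature, so check the k3s docs for your version). A sketch:

```yaml
# /etc/rancher/k3s/registries.yaml
# Listing a registry under mirrors: opts it into the embedded
# (Spegel-based) peer-to-peer mirror.
mirrors:
  docker.io:
```

Then start the server with `k3s server --embedded-registry`.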
There are a couple of alternatives that mirror more than just Docker Hub too, most of them pretty bloated and enterprisey, but they do what they say on the tin and have saved me more than once. Artifactory, Nexus Repository, Cloudsmith and ProGet are some of them.
Spegel does not only mirror Docker Hub, and works quite differently from the alternatives you suggested. Instead of being yet another failure point closer to your production environment, it runs a distributed stateless registry inside of your Kubernetes cluster. By piggybacking off of Containerd's image store it will distribute already-pulled images inside of the cluster.
I'll be honest and say I hadn't heard of Spegel before, and just read the landing page which says "Speed up container pulls and minimize downtime with a stateless peer-to-peer OCI registry mirror for efficient image distribution", so it isn't exactly clear you can use it for more things than container images.
What exactly does "stateless" mean in this context?
Spegel itself does not manage state the way a normal registry would. Maybe ephemeral would be a better word to describe it. A normal registry requires a stateful storage solution along with a database to store the images that clients push to it. Spegel exploits the fact that images used by containers are already stored on disk by Containerd. Any image currently being used by a pod in the cluster is available for all other nodes in the cluster to pull.
Gotcha. That's definitely an important point and seems difficult to communicate in a single word or quick blurb. I can see why you went with stateless. It's just a little confusing in this context (for me at least).
This looks good, but we're using GKE and it looks like it only works there with some hacks. Is there a timeline to make it work with GKE properly?
I am having some discussions about getting things working on GKE but I can't give an ETA as it really depends on how things align with deployment schedules. I am positive however that this will soon be resolved.
Thanks. I will keep an eye on your project as it looks great and something we would definitely benefit from.
P.S. Your blog could do with an RSS feed ;). I will track https://github.com/spegel-org/spegel/releases.atom for now
Google Cloud has its own cache of Docker Hub that you can use for free, AWS does as well
Our images are in a private docker registry on quay.io
I wonder if this is why I also can't log in to O'Reilly to do some "Docker is down, better find something to do" training...
Just install a pull-through proxy that will store all the packages recently used.
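For plain Docker hosts, pointing the daemon at such a proxy is a one-line setting in `/etc/docker/daemon.json` (restart the daemon after). A sketch, assuming a cache at the hypothetical address mirror.internal:5000; note that `registry-mirrors` only applies to Docker Hub pulls, not other registries:

```json
{
  "registry-mirrors": ["https://mirror.internal:5000"]
}
```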
this is by design
docker got requests to allow you to configure a different default registry, but they selfishly denied the ability to do that:
https://stackoverflow.com/questions/33054369/how-to-change-t...
redhat created docker-compatible podman and lets you close that hole
/etc/config/docker:

```
BLOCK_REGISTRY='--block-registry=all'
ADD_REGISTRY='--add-registry=registry.access.redhat.com'
```
I still think this is an acceptable footgun (?) to have. The expressiveness of downloading an image tag with a domain included outweighs potential miscommunication issues.
For example, if you're on a team and you have documentation containing commands, but your docker config is outdated, you can accidentally pull from docker's global public registry.
A welcome change IMO would be removing global registries entirely, since it just makes it easier to tell where your image is coming from (but I severely doubt docker would ever consider this since it makes it fractionally easier to use their services)
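Podman's config can already approximate that "no global registry" behaviour; a sketch of `/etc/containers/registries.conf`, which (to my understanding) makes short names an error instead of an implicit Docker Hub pull:

```toml
# An empty search list disables short-name resolution: every image
# reference must be fully qualified, e.g. docker.io/library/redis:7,
# so `podman pull redis` fails instead of silently hitting Docker Hub.
unqualified-search-registries = []
```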
This is a huge stretch.
Even if you could configure a default registry to point at something besides docker.io a lot of people, I'd say the vast majority, wouldn't have bothered. So they'd still be in the same spot.
And it's not hard to just tag images. I don't have a single image pulling from docker.io at work. Takes two seconds to slap <company-repo>/ at the front of the image name.
Sadly doesn't help if you were using ECR in us-east-1 as your private registry. :(
For other people impacted, what helped me this morning was to use `ghcr.io`, albeit this is not a one-to-one replacement.
Ex: `docker pull ghcr.io/linuxcontainers/debian-slim:latest`
That image is over one year old: https://github.com/linuxcontainers/debian-slim/pkgs/containe...
Google Container Registry provides a pull-through mirror, though, just prefix `mirror.gcr.io` and use `library` as the user for the Docker Official Images. For example `mirror.gcr.io/library/redis` for https://hub.docker.com/_/redis.
Is there a way to configure alternate mirrors in containerd?
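I believe so: on containerd 1.5+ the usual way is a hosts.toml file per upstream registry. A sketch that mirrors docker.io through mirror.gcr.io:

```toml
# /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://registry-1.docker.io"

[host."https://mirror.gcr.io"]
  capabilities = ["pull", "resolve"]

# containerd also needs to be told where these files live, in its
# config.toml:
#   [plugins."io.containerd.grpc.v1.cri".registry]
#     config_path = "/etc/containerd/certs.d"
```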
Recovering as of October 20, 2025 09:43 UTC
> [Monitoring] We are seeing error rates recovering across our SaaS services. We continue to monitor as we process our backlog.
Does it decrease AWS's nine 9s?
The marketing department did the maths and they said no.
"MOST of the time" we're nine 9s.
So thus far today outages are reported from
- AWS
- Vercel
- Atlassian
- Cloudflare
- Docker
- Google (see downdetector)
- Microsoft (see downdetector)
What's going on?
Or they all rely on AWS, because over the last 15 years we've built an extremely fragile interconnected global system in the pursuit of profit, austerity, and efficiency
Wait, Google and Microsoft rely on AWS? That seems unlikely? (does it? I wouldn't really know to be honest)
In terms of user reports: Some users don't know what the hell is going on. This is a constant.
For instance: When there's a widespread Verizon cellular outage, sites like downdetector will show a spike in Verizon reports.
But such sites will also show a spike in AT&T and T-Mobile reports. Even though those latter networks are completely unaffected by Verizon's back-end issues, the graphs of user reports are consistently shaped the same for all 3 carriers.
This is just because some of the users doing the reporting have no clue.
So when the observation is "AWS is in outage and people are reporting issues at Google, and Microsoft," then the last two are often just factors of people being people and reporting the wrong thing.
(You're hanging out on HN, so there's very good certainty that you know precisely what cell carrier you're using and also can discern the difference betwixt an Amazon, a Google, and a Microsoft. But lots of other people are not particularly adept at making these distinctions. It's normal and expected for some of them to be this way at all times.)
That's true. And a big part of the reason is the user's browser. They use Microsoft Edge or Google Chrome and can't open a page and there are weird error messages? Oh, that's probably a Google issue…
More likely the outage reports for Google and Microsoft are based around systems which also include AWS
It’s very likely they’ve bought companies that were built on AWS and haven’t migrated to use their homegrown cloud platforms.
They might be using third party services that rely on AWS.
Reddit appears to be only semi-operational. Frequent “rate limit” errors and empty pages while just browsing. Not sure if related
DNS outage at AWS exposing how overly centralized our infra is
The new left-pad
https://www.bbc.com/news/live/c5y8k7k6v1rt
The internet was designed to be fault tolerant and distributed from the beginning and we still ended up with a handful of mega hosts.
It's impressive that even though registry-1.docker.io returned 503 errors they were able to keep the metric "Docker Registry Uptime" at 100%.
Well, the server was up, it was just returning HTTP 503...
even reddit throws a lot of 503s when adding/editing comments
reddit is always going down, thats the least surprising thing about this
What are good proxy/mirror solutions to mitigate such issues? Best would be an all-in-one solution that, for example, also handles Node.js, Packagist, etc.
Pulp is a popular project for a 'one stop shop', I believe. Personally, I've always used project-specific solutions like 'distribution/distribution' from the CNCF for containers. It allows for pull-through caching with relatively little setup work.
I'm fairly new to Docker. Do folks really rely on public images and registries for production systems? Seems like a brittle strategy.
Yes, 1000s of orgs. Larger players might use a pull-through cache, but it's not as common as it should be. Similar issues exist for the rest of the software supply chain (npm, PyPI, etc.)
Is there a built-in way to bypass the request to the registry if your base layers are cached?
pull: never?
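If that was a reference to Kubernetes, the knob is per-container in the pod spec; a sketch (image name hypothetical):

```yaml
spec:
  containers:
  - name: app
    image: registry.example.com/team/app:1.4
    # IfNotPresent skips the registry whenever the image already exists
    # on the node; Never refuses to contact the registry at all.
    imagePullPolicy: IfNotPresent
```

Plain `docker run` has an analogous `--pull` flag (`missing` by default, or `never`). Neither helps on the first pull, of course.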
...well this explains a lot about how my morning is going...
what good options are there for container registry proxies / caches to protect against something like this?
https://docs.docker.com/docker-hub/image-library/mirror/ ?
I built Spegel to keep my Kubernetes cluster running smoothly during an outage like this. https://spegel.dev/
mirror.gcr.io is your friend