docker pull failed with connection reset by peer or i/o timeout #144
Comments
One thing we notice is that all the failed events for the last 7 days are related to one IP address. And we noticed there are similar networking issues mentioning this IP address; not sure if they are related: |
Having the same issue with a Kubernetes cluster hosted in AWS (us-east-1 also) at the moment of building the container.
During the last week it was more occasional that a pipeline would fail, but over the last two days it has become almost permanent in my case. |
We have been seeing similar issues over the last few days. It seems to fail maybe 25% of the time, so it could be one load-balanced node behind the IP that is at issue, and if you happen to hit it, it doesn't work. Some of our builds try to pull multiple times, so the overall builds fail a lot, as it doesn't consistently work across the whole build. |
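As a stopgap for this kind of intermittency, a simple retry wrapper around `docker pull` can help a build survive the occasional bad connection. This is only a sketch: the image name, attempt count, and backoff values below are illustrative placeholders, not anything prescribed in this issue.

```python
import subprocess
import sys
import time

def pull_with_retries(image: str, attempts: int = 5, delay: float = 5.0) -> bool:
    """Retry `docker pull` with exponential backoff until it succeeds or gives up."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(["docker", "pull", image])
        if result.returncode == 0:
            return True
        print(f"pull attempt {attempt} failed (exit {result.returncode}), "
              f"retrying in {delay:.0f}s", file=sys.stderr)
        time.sleep(delay)
        delay *= 2  # back off between attempts
    return False

if __name__ == "__main__":
    # Placeholder image taken from this thread; substitute your own.
    if not pull_with_retries("mcr.microsoft.com/hello-world"):
        sys.exit(1)
```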
Hello folks; we have had the same issue as well for the past couple of days. Here is the ERROR message from the Jenkins pipeline:
By the way, the issue is intermittent; we hit it about 20% of the time. For your information. |
@leomao10 The connection resets and connection timeouts probably represent different issues. We are currently investigating timeout-related issues in the East US region and will update once we know more. Also, can you share which region you are seeing timeouts in? |
@akhtar-h-m and @devopshangikredi we are actively looking into this. Can you share a bit about which regions you are noticing this degradation in? |
@AndreHamilton-MSFT we are seeing this on our builds that are running on AWS nodes in eu-west-1, and also locally in the UK |
@AndreHamilton-MSFT have you had any joy yet? It seems to have gotten a little worse overnight. |
@akhtar-h-m Still debugging. The connection resets may be related to some anycast routing, and we are still investigating. I think we have identified one cause of the long delays, and we have identified a mitigation. Will update once it's deployed. |
@akhtar-h-m is it possible for you to collect a tcp dump and share it? Could you include the times of the failures (we get too many requests, so we need a timeline to narrow things down)? Is it possible for you to modify your user agent string so we can easily distinguish your traffic from others? |
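For anyone collecting this kind of evidence, a small probe along these lines (Python, standard library only) can tag requests with a distinctive User-Agent and log timestamped failures against the registry endpoint. The URL, header value, and timeout below are assumptions for illustration, not values requested by the registry team.

```python
import datetime
import urllib.error
import urllib.request

URL = "https://mcr.microsoft.com/v2/"          # registry HTTP API base path
USER_AGENT = "example-org-mcr-probe/1.0"       # replace with your own marker

def probe_once() -> None:
    """Fetch the registry endpoint once and log the outcome with a UTC timestamp."""
    request = urllib.request.Request(URL, headers={"User-Agent": USER_AGENT})
    started = datetime.datetime.now(datetime.timezone.utc)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print(f"{started.isoformat()} OK status={response.status}")
    except urllib.error.HTTPError as exc:
        # An HTTP status (e.g. 401) still means the registry was reachable.
        print(f"{started.isoformat()} REACHED status={exc.code}")
    except Exception as exc:
        # Connection resets, timeouts, and DNS failures land here.
        print(f"{started.isoformat()} FAIL {exc!r}")

if __name__ == "__main__":
    probe_once()
```

Running it on a schedule (cron, a CI step) produces the kind of timestamped failure log the registry team is asking for, and the custom User-Agent makes the traffic easy to find on their side.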
@AndreHamilton-MSFT I've asked our infra team for logs. Will get back to you as soon as we can with that. Otherwise, I can give you a particular time: 10-Oct-2023 13:20:49 (UTC +1). Our IP address will be one of 52.50.194.92 or 52.215.230.58. |
@akhtar-h-m were these just connection resets, or did you also experience timeouts? Can you also give me a successful timestamp? Thanks again. |
We are having this issue on Azure Australia East as well. |
Similar issue on Azure North Central US while using Azure Container Registry Task build functionality: dial tcp: lookup mcr.microsoft.com: i/o timeout |
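That particular error is the DNS lookup timing out before the TCP dial even starts, so a quick resolution check from the affected host can help separate DNS trouble from registry trouble. A minimal sketch follows; the hostname is the one from the error above, and the port and output format are arbitrary choices.

```python
import socket
import time

HOSTNAME = "mcr.microsoft.com"

# Time a single DNS resolution of the registry hostname from this host.
start = time.monotonic()
try:
    infos = socket.getaddrinfo(HOSTNAME, 443, proto=socket.IPPROTO_TCP)
    elapsed = time.monotonic() - start
    addresses = sorted({info[4][0] for info in infos})
    print(f"resolved {HOSTNAME} in {elapsed:.2f}s -> {addresses}")
except socket.gaierror as exc:
    elapsed = time.monotonic() - start
    print(f"failed to resolve {HOSTNAME} after {elapsed:.2f}s: {exc}")
```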
Sorry for the delays on this. We were working on ways to better isolate this kind of issue. Are folks still experiencing TCP timeouts, and if so, from where? |
Yeah, I have just faced it with the following error (from Iran):
|
Hey, I'm facing the same issue right now in Brazil.
I executed this docker pull in WSL on a private network, so the IPs are probably wrong. |
There is an issue pulling the image `mcr.microsoft.com/hello-world` on s390x. It looks like a load balancer for the Microsoft registry is unstable, such that the runner was only able to pull the image with about a 10% success rate (see microsoft/containerregistry#144). It is not reasonable to let the tests run in such an unstable environment. This commit skips the tests in image.rs for the platform. Signed-off-by: Hyounggyu Choi <[email protected]>
You should be seeing improvements related to this now |
Facing the same issue now. Created issue #165 |
Hi there,
I am Leo Liang from the Bitbucket Pipelines team. We have had several users report failures trying to pull images from mcr.microsoft.com. We did some analysis of our logs and found a combination of "connection reset by peer" and "i/o timeout" errors when talking to both mcr.microsoft.com and eastus.data.mcr.microsoft.com. The majority of the errors are for mcr.microsoft.com, and they mainly happen on our nodes in the AWS us-east-1 region; they happen constantly, and we don't see an abnormal spike in the error rate. Here is one of the tcp dumps we captured from one of the failing builds:
Based on our understanding:
Are you aware of any existing networking issue between AWS and mcr.microsoft.com?