
docker pull failed with connection reset by peer or i/o timeout #144

Open · leomao10 opened this issue Jul 10, 2023 · 19 comments

@leomao10 commented Jul 10, 2023

Hi there,

I am Leo Liang from the Bitbucket Pipelines team. We have had several users report failures trying to pull images from mcr.microsoft.com. We have done some analysis of our logs and found a combination of "connection reset by peer" and "i/o timeout" errors when talking to both mcr.microsoft.com and eastus.data.mcr.microsoft.com.

The majority of errors are for mcr.microsoft.com and mainly happen on our nodes in the AWS us-east-1 region. The errors happen constantly; we don't see an abnormal spike in the error rate.

Here is a tcpdump we captured from one of the failing builds:

17:57:25.224404 IP 0cdf910b-4c56-4622-8622-3fccc7cbf3c5-nx7hm.57338 > 204.79.197.219.443: Flags [S], seq 2044995150, win 64240, options [mss 1460,sackOK,TS val 4164622281 ecr 0,nop,wscale 7], length 0
17:57:25.226511 IP 204.79.197.219.443 > 0cdf910b-4c56-4622-8622-3fccc7cbf3c5-nx7hm.57338: Flags [S.], seq 496447486, ack 2044995151, win 65535, options [mss 1440,nop,wscale 8,nop,nop,sackOK], length 0
17:57:25.226533 IP 0cdf910b-4c56-4622-8622-3fccc7cbf3c5-nx7hm.57338 > 204.79.197.219.443: Flags [.], ack 1, win 502, length 0
17:57:25.226802 IP 0cdf910b-4c56-4622-8622-3fccc7cbf3c5-nx7hm.57338 > 204.79.197.219.443: Flags [P.], seq 1:250, ack 1, win 502, length 249
17:57:25.228257 IP 204.79.197.219.443 > 0cdf910b-4c56-4622-8622-3fccc7cbf3c5-nx7hm.57338: Flags [.], ack 250, win 16384, length 0
17:57:25.229435 IP 204.79.197.219.443 > 0cdf910b-4c56-4622-8622-3fccc7cbf3c5-nx7hm.57338: Flags [P.], seq 1:5929, ack 250, win 16384, length 5928
17:57:25.229455 IP 0cdf910b-4c56-4622-8622-3fccc7cbf3c5-nx7hm.57338 > 204.79.197.219.443: Flags [.], ack 5929, win 456, length 0
17:57:25.233892 IP 0cdf910b-4c56-4622-8622-3fccc7cbf3c5-nx7hm.57338 > 204.79.197.219.443: Flags [P.], seq 250:408, ack 5929, win 501, length 158
17:57:25.234814 IP 204.79.197.219.443 > 0cdf910b-4c56-4622-8622-3fccc7cbf3c5-nx7hm.57338: Flags [R], seq 496453415, win 0, length 0

Based on our understanding:

  • The connection started fine: Pipelines initiated the handshake and Microsoft acknowledged it.
  • The initial push of data from Pipelines was also acknowledged by Microsoft (see sequence 1:250).
  • However, the second push of data from Pipelines (see sequence 250:408) was not acknowledged by Microsoft; instead the connection was immediately terminated with the [R] flag, meaning it was reset abruptly (see sequence 496453415).

Are you aware of any existing networking issues between AWS and mcr.microsoft.com?
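
For reference, a capture like the one above can be reproduced on an affected host while a pull is running. This is only a sketch: it assumes tcpdump is installed, that eth0 is the relevant interface, and that 204.79.197.219 is the frontend the host resolves to.

# capture traffic to the MCR frontend while a pull runs (interface name is an assumption)
sudo tcpdump -i eth0 -nn 'host 204.79.197.219 and port 443' -w mcr-reset.pcap &
CAPTURE_PID=$!
docker pull mcr.microsoft.com/dotnet/sdk:6.0
# stop the capture and look for an [R] (reset) right after the client's second data push
sudo kill "$CAPTURE_PID"
sudo tcpdump -nn -r mcr-reset.pcap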

@leomao10 (Author)

One thing we noticed is that all the failed events for the last 7 days are related to IP 204.79.197.219:443.

We also noticed similar networking issues mentioning this IP address; not sure if they are related:
#139
dotnet/core#8268
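
A quick way to confirm which frontend an affected node is being routed to, and whether it lines up with that IP (a sketch; dig comes from the dnsutils/bind-utils package):

# see which addresses the node resolves for the registry endpoints
dig +short mcr.microsoft.com
dig +short eastus.data.mcr.microsoft.com
getent hosts mcr.microsoft.com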

@sjg99 commented Jul 11, 2023

Having the same issue with a Kubernetes cluster hosted in AWS (us-east-1 as well) at the moment of building the container.

Retrieving image mcr.microsoft.com/dotnet/sdk:6.0 from registry mcr.microsoft.com error building image: Get "https://mcr.microsoft.com/v2/dotnet/sdk/blobs/sha256:7d987f8db5482ed7d3fe8669b1cb791fc613d25e04a6cc31eed37677a6091a29": read tcp 10.0.2.246:55774->204.79.197.219:443: read: connection reset by peer

During the last week it was more occasional that a pipeline would fail, but over the last two days it has become almost permanent in my case.

@akhtar-h-m

We are seeing similar issues over the last few days. It seems to fail maybe 25% of the time, so it could be a single load-balanced node behind the IP that is at issue: if you happen to hit it, the pull doesn't work. Some of our builds pull multiple times, so the overall builds fail a lot since it doesn't consistently work across the whole build.
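
As a stopgap while this is investigated, an intermittent reset like this can sometimes be papered over by retrying the pull with backoff before failing the build. A minimal sketch (the image name and retry counts are arbitrary):

# retry the pull a few times with increasing backoff before giving up
ok=0
for attempt in 1 2 3 4 5; do
    if docker pull mcr.microsoft.com/dotnet/sdk:6.0; then ok=1; break; fi
    echo "pull failed (attempt $attempt), retrying..." >&2
    sleep $((attempt * 10))
done
[ "$ok" -eq 1 ] || exit 1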

@devopshangikredi

Hello folks;

We have had the same issue for a couple of days as well. Here is the error message from our Jenkins pipeline:

   1 | >>> FROM mcr.microsoft.com/dotnet/aspnet:7.0-bullseye-slim AS base
   2 |     WORKDIR /app
   3 |     EXPOSE 80
--------------------
ERROR: failed to solve: mcr.microsoft.com/dotnet/aspnet:7.0-bullseye-slim: pulling from host mcr.microsoft.com failed with status code [manifests 7.0-bullseye-slim]: 503 Service Unavailable

By the way, the issue is intermittent; we hit it about 20% of the time.

For your information.

@AndreHamilton-MSFT (Contributor)

@leomao10 The connection resets and connection timeouts probably represent different issues. We are currently investigating timeout-related issues in the East US region and will update once we know more. Also, can you share which region you are seeing timeouts in?

@AndreHamilton-MSFT (Contributor)

@akhtar-h-m and @devopshangikredi we are actively looking into this. Can you share a bit about which regions you are noticing this degradation in?

@akhtar-h-m

@AndreHamilton-MSFT we are seeing this on our builds running on AWS nodes in eu-west-1, and also locally in the UK.

@akhtar-h-m

@AndreHamilton-MSFT have you had any joy yet? It seems a little worse overnight.

@AndreHamilton-MSFT (Contributor) commented Oct 10, 2023

@akhtar-h-m Still debugging. The connection resets may be related to some anycast routing, and we are still investigating. I think we have identified one cause of the long delays and have a mitigation for it. Will update once it's deployed.

@AndreHamilton-MSFT (Contributor)

@akhtar-h-m is it possible for you to collect a tcpdump and share it? Could you include the times of the failures (we get too many requests, so we need a timeline to narrow things down)? Is it possible for you to modify your user agent string so we can easily distinguish your traffic from others'?
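
For the user agent part, one low-effort way to make the traffic easy to spot is to probe the registry's /v2/ endpoint with curl and a distinctive User-Agent around the failure window, noting the UTC timestamp (a sketch; the User-Agent value is arbitrary):

# timestamped probe of the v2 endpoint with an identifiable User-Agent
date -u
curl -sv -A "pipelines-debug/1.0" -o /dev/null https://mcr.microsoft.com/v2/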

@akhtar-h-m

@AndreHamilton-MSFT I've asked our infra team for logs and will get back to you as soon as we can with those. In the meantime I can give you a particular time: 10-Oct-2023 13:20:49 (UTC+1). Our IP address will be one of 52.50.194.92 or 52.215.230.58.

@AndreHamilton-MSFT (Contributor)

@akhtar-h-m were these just connection resets, or did you also experience timeouts? Can you also give me a successful timestamp? Thanks again.

@akirayamamoto

We are having this issue on Azure Australia East as well.
Get "https://mcr.microsoft.com/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2023-01-11 00:30:33 UTC
@AndreHamilton-MSFT
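
When the failure is a client timeout waiting for headers, as above, it helps to record where the time goes: DNS, TCP connect, TLS, or first byte. A sketch using curl's timing variables:

# break down where the time is spent when hitting the registry endpoint
curl -s -o /dev/null --max-time 30 \
  -w 'dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' \
  https://mcr.microsoft.com/v2/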

@jmcdade11

Similar issue on Azure North Central US while using Azure Container Registry Task build functionality:

dial tcp: lookup mcr.microsoft.com: i/o timeout
2024/01/16 21:04:47
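
That one is a DNS lookup timing out rather than a connection reset, so it is worth checking the resolver path from the affected environment (a sketch; 1.1.1.1 is just an example public resolver for comparison):

# time the lookup against the default resolver, then against a public one
dig mcr.microsoft.com | grep -E 'Query time|SERVER'
dig @1.1.1.1 mcr.microsoft.com | grep -E 'Query time|SERVER'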

@AndreHamilton-MSFT (Contributor)

Sorry for the delays on this. We were working on ways to better isolate this kind of issue. Are folks still experiencing TCP timeouts, and if so, from where?

@asgmojtaba

Sorry for the delays on this. We were working on ways to better isolate this kind of issue. Are folks still experiencing TCP timeouts, and if so, from where?

Yeah, I have just faced it with the following error (from Iran):

docker pull mcr.microsoft.com/mssql/server:2022-latest

Error response from daemon: Head "https://mcr.microsoft.com/v2/mssql/server/manifests/2022-latest": read tcp *.*.*.39:39838->204.79.197.219:443: read: connection reset by peer

@WellyngtonF commented Jun 5, 2024

Hey, I'm facing the same issue right now in Brazil.

docker pull mcr.microsoft.com/dotnet/sdk:8.0
 
Error response from daemon: Get "https://mcr.microsoft.com/v2/": dial tcp: lookup mcr.microsoft.com on 172.30.160.1:53: read udp 172.30.170.107:46691->172.30.160.1:53: i/o timeout

I executed this docker pull in WSL on a private network, so the IPs shown are probably not meaningful.
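
That error is the WSL resolver at 172.30.160.1 timing out rather than the registry itself, so it is worth confirming what WSL is using for DNS (a sketch; 8.8.8.8 is just an example public resolver):

# check which resolver WSL is using, then test the lookup against it and a public one
cat /etc/resolv.conf
nslookup mcr.microsoft.com
nslookup mcr.microsoft.com 8.8.8.8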

BbolroC added a commit to BbolroC/cc-guest-components that referenced this issue Jun 14, 2024
There is an issue with pulling the image `mcr.microsoft.com/hello-world` on s390x.
It looks like the load balancer for the Microsoft registry is unstable, so the runner
was only able to pull the image with a 10% success rate (see
microsoft/containerregistry#144).
It is not reasonable to let the test run in such an unstable environment.

This commit skips the tests at image.rs for the platform.

Signed-off-by: Hyounggyu Choi <[email protected]>
BbolroC added two further commits to BbolroC/cc-guest-components that referenced this issue Jun 14, 2024 (same commit message as above)
fitzthum pushed a commit to confidential-containers/guest-components that referenced this issue Jun 14, 2024 (same commit message as above)
@AndreHamilton-MSFT (Contributor)

You should be seeing improvements related to this now

@lucasassisrosa

Facing the same issue now. Created issue #165
