gracefully handle HTTP 429 return codes (Too Many Requests) #618

Closed
miabbott opened this issue Apr 24, 2019 · 23 comments

@miabbott

Downstream BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1702519

In the above BZ, the RHCOS nodes are attempting to pull a container from quay.io via skopeo and the registry is returning HTTP 429 (Too Many Requests). skopeo reacts with:

msg="Error determining repository tags: Invalid status code returned when fetching tags list 429 (Too Many Requests)"

I think the error message is misleading, since 429 is a valid status code (https://tools.ietf.org/html/rfc6585#section-4), so at a minimum the message should be changed.

cc: @runcom

@vrothberg
Member

Hi @miabbott, thanks for opening the issue.

[...], so at a minimum the message should be changed.

How about "Error determining repository tags: too many requests"? Alternatively, we can do a simple "stop DOSing the registry" ;-)

@miabbott
Author

@vrothberg I think that message is a good start; it might be useful to include the HTTP code, too.

Do you think there is anything else that can be done in this scenario besides just erroring more cleanly?

@vrothberg
Member

Do you think there is anything else that can be done in this scenario besides just erroring more cleanly?

I don't know the penalty (i.e., how long the client is rejected) but if that's ~predictable we could try to play smart and retry after a certain amount of time. @mtrmac do you know the (default) penalty?

@miabbott
Author

miabbott commented Apr 24, 2019

The spec says:

...MAY include a Retry-After header indicating how long to wait before making a new request.

So it is unfortunately up to the registry/server to decide whether to tell the client how long to back off. Maybe retry according to Retry-After if it is present in the response?
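
For illustration, honoring Retry-After when it is present could look roughly like the sketch below (plain net/http; parseRetryAfter is a hypothetical helper, not an existing c/image function). RFC 7231 allows the header to carry either delta-seconds or an HTTP-date, so both forms are handled:

import (
    "net/http"
    "strconv"
    "time"
)

// parseRetryAfter is a hypothetical helper: it reports how long the
// Retry-After header asks the client to wait, or false if the header is
// absent or unparsable.
func parseRetryAfter(res *http.Response) (time.Duration, bool) {
    v := res.Header.Get("Retry-After")
    if v == "" {
        return 0, false
    }
    if secs, err := strconv.Atoi(v); err == nil && secs >= 0 {
        return time.Duration(secs) * time.Second, true
    }
    if t, err := http.ParseTime(v); err == nil {
        return time.Until(t), true
    }
    return 0, false
}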

@vrothberg
Member

Sounds great, thanks @miabbott !

@wking
Contributor

wking commented Apr 24, 2019

And probably do some sort of exponential backoff (60s, doubling the wait between attempts, give up after... 3? attempts) if there is no Retry-After.
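
A rough sketch of that fallback, with a hypothetical doRequest function standing in for the actual registry call (the 60s / doubling / 3-attempt numbers are just the ones suggested above, not anything c/image implements today):

import "time"

// retryWithBackoff retries doRequest up to 3 times, waiting 60s before the
// second attempt and doubling the wait after each failure.
func retryWithBackoff(doRequest func() error) error {
    wait := 60 * time.Second
    var err error
    for attempt := 1; attempt <= 3; attempt++ {
        if err = doRequest(); err == nil {
            return nil
        }
        if attempt < 3 {
            time.Sleep(wait)
            wait *= 2
        }
    }
    return err
}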

@josephschorr

We can add a Retry-After header too, if that helps

@mtrmac
Collaborator

mtrmac commented May 7, 2019

[...], so at a minimum the message should be changed.

How about "Error determining repository tags: too many requests"? Alternatively, we can do a simple "stop DOSing the registry" ;-)

This is not raw HTTP; the docker/distribution protocol has an integrated error reporting mechanism: https://github.com/docker/distribution/blob/master/docs/spec/api.md#on-failure-too-many-requests-1 .

We should probably call github.com/docker/distribution/registry/client.HandleErrorResponse or something similar in GetRepositoryTags, instead of just blindly checking for 200.
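
Roughly what that could look like where the tags-list response is checked today (a sketch against the docker/distribution client package, not the actual c/image code; res stands for the *http.Response of the tags request):

import (
    "net/http"

    "github.com/docker/distribution/registry/client"
)

// ...inside GetRepositoryTags, after the tags-list request has returned res:
if res.StatusCode != http.StatusOK {
    // HandleErrorResponse decodes the registry's structured JSON error body
    // (including the "too many requests" code behind a 429) into a typed
    // error, instead of the current generic "Invalid status code" message.
    return nil, client.HandleErrorResponse(res)
}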

@mtrmac
Collaborator

mtrmac commented May 7, 2019

Do you think there is anything else that can be done in this scenario besides just erroring more cleanly?

I don't know the penalty (i.e., how long the client is rejected) but if that's ~predictable we could try to play smart and retry after a certain amount of time. @mtrmac do you know the (default) penalty?

It would be a bit weird to retry only on this single code path. I’m not sure whether it is wanted/desired for c/image to automatically retry at all (IIRC the kubelet, and the image builder, do retry at least image pulls already, so compounding the retries might be bad for error detection latency).

If we don’t retry, we definitely should return a recognizable error type (which the HandleErrorResponse call would do); should we also make the Retry-After header value available in the error? The conversation above suggests that it is not implemented in the relevant registry, and the spec does not describe it either (though, of course, using the HTTP-specified header would not be wrong :) ).
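
For illustration only, a recognizable error type carrying the Retry-After value could look something like this (entirely hypothetical, not an existing c/image type; a caller could detect it with errors.As and schedule its own retry):

import (
    "fmt"
    "time"
)

// ErrTooManyRequests is a hypothetical error type for HTTP 429 responses.
type ErrTooManyRequests struct {
    // RetryAfter is zero when the registry did not send a Retry-After header.
    RetryAfter time.Duration
}

func (e ErrTooManyRequests) Error() string {
    if e.RetryAfter > 0 {
        return fmt.Sprintf("registry returned 429 Too Many Requests, retry after %s", e.RetryAfter)
    }
    return "registry returned 429 Too Many Requests"
}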

@mtrmac
Collaborator

mtrmac commented May 7, 2019

Is there a public registry, and an easy way to cause it to return 429 without affecting other users, available for testing possible fixes? Or an easy way to set up an instance locally?

Failing that, a HTTP headers+body dump of the response would probably be sufficient.

@thomasmckay

We'll be adding a 'Retry-After' header to Quay; is there tooling that will honor it? Ideally we'd make use of this immediately with skopeo; will a fix for this issue make it into a skopeo build?

@rhatdan
Member

rhatdan commented Aug 7, 2019

Yes, we update containers/image in skopeo on each containers/image release.

@mtrmac
Collaborator

mtrmac commented Aug 7, 2019

Right now, c/image makes a one-shot attempt; there is no retry logic in it and no special handling of 429/Retry-After.

CRI-O/the kubelet and openshift/builder do ultimately retry at a higher level, but by that time the Retry-After information is lost.

So, no, right now there isn’t tooling and there would be no “immediate” benefit.


(Adding some kind of support to c/image, whether retrying or just relaying the error in a machine-detectable way, would certainly make it into skopeo in a short while. OTOH the “machine-detectable” aspect of an error object is a bit compromised by a CLI.)

@twaugh
Contributor

twaugh commented Sep 10, 2019

We are running into this when using skopeo from the command line, called from a Python program that wants to manipulate tags in Quay.io.

Is anyone working on fixing this? Is there anything I can do to help?

@vrothberg
Member

I have some cycles now and can look into it.

@vrothberg
Member

Is there a public registry, and an easy way to cause it to return 429 without affecting other users, available for testing possible fixes? Or an easy way to set up an instance locally?

@twaugh, do you know?

@twaugh
Contributor

twaugh commented Sep 13, 2019

If you send enough requests in a brief burst to Quay.io, you'll get a 429 response -- other users not bursting requests will not.

Alternatively, perhaps set up a local nginx proxy with Quay.io as the backend. Something like:

events {}

http {
    limit_req_zone $binary_remote_addr zone=mylimit:1m rate=1r/s;
 
    server {
        listen 443;
        location / {
            limit_req zone=mylimit;
            limit_req_status 429;
            limit_conn_status 429;
            proxy_pass https://quay.io;
        }
    }
}

(may need some tweaking)

and podman run --name nginx -v .../nginx.conf:/etc/nginx/nginx.conf:Z -p 8443:443 nginx

@vrothberg vrothberg self-assigned this Sep 17, 2019
@cgwalters
Contributor

If you send enough requests in a brief burst to Quay.io

Are you really sure you need to enumerate the tags? Note for example in openshift/pivot#51 we stopped using skopeo entirely because we didn't need it.

And if you do need to enumerate the tags, why would you do it in a burst? Rather, pass the data down through a pipeline or cache it.

@vrothberg
Member

@twaugh, thanks. Do you have a registry or proxy setting the Retry-After header? I forced a local nginx to return it for testing purposes but would love to test against a production endpoint.

@twaugh
Contributor

twaugh commented Sep 17, 2019

Are you really sure you need to enumerate the tags?

In fact, for our case, no we don't. We filed #698 about that.

But our team is not the only one with automation that uses Skopeo. Other teams use the registry our team decides on, and we have an authenticating proxy in front of Quay.io that makes 429 responses more likely, so switching to Quay.io will be painful for teams using Skopeo until this issue is addressed.

We are separately investigating making the authenticating proxy automatically retry on 429 responses, but have not had any luck so far. In any case, given that Quay.io can return 429 responses, Skopeo ought to handle that.

@twaugh, thanks. Do you have a registry or proxy setting the Retry-After header? I forced a local nginx to return it for testing purposes but would love to test against a production endpoint.

No. Quay.io's 429 responses do not set Retry-After.

@vrothberg
Member

Please have a look at the proposed fix: #703

It does not look for a Retry-After header; it performs a simple exponential backoff starting at 20 seconds.

@cuishuang

There is an official introduction: https://docs.quay.io/issues/429.html

@vrothberg
Member

The issue was fixed by #703, closing. Thanks to everybody involved for the feedback and for testing!
