gracefully handle HTTP 429 return codes (Too Many Requests) #618
Comments
Hi @miabbott, thanks for opening the issue.
How about "Error determining repository tags: too many requests"? Alternatively, we can do a simple "stop DOSing the registry" ;-)
@vrothberg I think that message is a good start; it might be useful to include the HTTP code, too. Do you think there is anything else that can be done in this scenario besides just erroring out more cleanly?
I don't know the penalty (i.e., how long the client is rejected), but if that's ~predictable we could try to play it smart and retry after a certain amount of time. @mtrmac, do you know the (default) penalty?
The spec says the response MAY include a Retry-After header indicating how long to wait before making a new request. So it is unfortunately up to the registry/server whether they tell the client how long to back off. Maybe retry according to Retry-After when it is present?
Sounds great, thanks @miabbott!
And probably do some sort of exponential backoff (60s, doubling the wait between attempts, give up after... 3? attempts) if there is no Retry-After header.
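A minimal sketch of that idea in plain Go (not c/image code): getWithRetry, the 60s starting delay, the three-attempt limit, and handling only the seconds form of Retry-After are all illustrative assumptions.

```go
// Hypothetical sketch: retry a request, honoring Retry-After when the server
// sends it, otherwise falling back to exponential backoff.
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// getWithRetry is a made-up helper; its name and limits are not part of c/image.
func getWithRetry(url string, maxAttempts int) (*http.Response, error) {
	delay := 60 * time.Second // starting wait when the server gives no hint
	for attempt := 1; ; attempt++ {
		resp, err := http.Get(url)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests {
			return resp, nil
		}
		resp.Body.Close()
		if attempt >= maxAttempts {
			return nil, fmt.Errorf("giving up after %d attempts: %s", attempt, resp.Status)
		}
		wait := delay
		// Prefer the server-provided Retry-After (seconds form) if present.
		if ra := resp.Header.Get("Retry-After"); ra != "" {
			if secs, convErr := strconv.Atoi(ra); convErr == nil {
				wait = time.Duration(secs) * time.Second
			}
		}
		time.Sleep(wait)
		delay *= 2 // exponential backoff between our own attempts
	}
}

func main() {
	resp, err := getWithRetry("https://quay.io/v2/", 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

The HTTP-date form of Retry-After, and jitter between attempts, are left out to keep the sketch short.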
We can add a …
This is not raw HTTP; the docker/distribution protocol has an integrated error reporting mechanism: https://github.com/docker/distribution/blob/master/docs/spec/api.md#on-failure-too-many-requests-1 . We should probably call …
It would be a bit weird to retry only on this single code path. I’m not sure whether it is wanted/desired for c/image to automatically retry at all (IIRC the kubelet, and the image builder, do retry at least image pulls already, so compounding the retries might be bad for error detection latency). If we don’t retry, we definitely should return a recognizable error type (which the …).
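For the machine-detectable side, a sketch of what matching on that error code could look like, assuming the errcode package shipped with docker/distribution; the isTooManyRequests helper and the error constructed in main are purely illustrative and not the c/image API.

```go
// Sketch: recognize the registry's TOOMANYREQUESTS error code instead of
// string-matching on the HTTP status text.
package main

import (
	"errors"
	"fmt"

	"github.com/docker/distribution/registry/api/errcode"
)

// isTooManyRequests is a hypothetical helper name.
func isTooManyRequests(err error) bool {
	var ec errcode.Error
	if errors.As(err, &ec) {
		return ec.Code == errcode.ErrorCodeTooManyRequests
	}
	return false
}

func main() {
	// Simulate an error as a registry client might return it.
	err := errcode.ErrorCodeTooManyRequests.WithMessage("too many requests to the registry")
	if isTooManyRequests(err) {
		fmt.Println("rate limited by the registry; back off and retry later")
	} else {
		fmt.Println("some other error:", err)
	}
}
```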
Is there a public registry, and an easy way to cause it to return 429 without affecting other users, available for testing possible fixes? Or an easy way to set up an instance locally? Failing that, an HTTP headers+body dump of the response would probably be sufficient.
We'll be adding a 'Retry-After' header to Quay; is there tooling that will honor it? Ideally we'd make use of this immediately with skopeo. Will an issue fixed here make it into a skopeo build?
Yes, we update containers/image in skopeo on each release of containers/image.
Right now, c/image makes a one-shot attempt; there is no retry logic in it, and no special handling of 429/Retry-After. CRI-O/the kubelet and openshift/builder do ultimately retry at a higher level, but by that time the … So, no, right now there isn’t tooling and there would be no “immediate” benefit. (Adding some kind of support to c/image, whether retrying or just relaying the error in a machine-detectable way, would certainly make it into skopeo in a short while. OTOH the “machine-detectable” aspect of an error object is a bit compromised by a CLI.)
We are running into this when using skopeo from the command line, driven by a Python program which wants to manipulate tags in Quay.io. Is anyone working on fixing this? Is there anything I can do to help?
I have some cycles now and can look into it.
@twaugh, do you know?
If you send enough requests in a brief burst to Quay.io, you'll get a 429 response -- other users not bursting requests will not. Alternatively, perhaps set up a local nginx proxy with Quay.io as the backend. Something like:
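A hypothetical reconstruction of such a config; the zone name, rate, and listen port are guesses, and the snippet assumes it is included inside nginx's http context:

```nginx
# Rate-limit clients to roughly 1 request/second and answer the overflow with
# 429 while proxying everything else to quay.io.
limit_req_zone $binary_remote_addr zone=quay_test:10m rate=1r/s;

server {
    listen 8080;

    location / {
        limit_req zone=quay_test;
        limit_req_status 429;        # reject over-limit requests with 429 instead of the default 503
        proxy_pass https://quay.io;
        proxy_set_header Host quay.io;
        proxy_ssl_server_name on;    # use SNI on the TLS connection to quay.io
    }
}
```

Hammering localhost:8080 in a tight loop should then yield 429 responses without bursting against Quay.io itself.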
(may need some tweaking) and …
Are you really sure you need to enumerate the tags? Note, for example, that in openshift/pivot#51 we stopped using skopeo entirely because we didn't need it. And if you do need to enumerate the tags, why would you do it in a burst? Rather, pass the data down across a pipeline or cache it.
@twaugh, thanks. Do you have a registry or proxy setting the Retry-After header?
In fact, for our case, no we don't. We filed #698 about that. But our team is not the only one with automation that uses Skopeo. Other teams use the registry our team decides on, and we have an authenticating proxy in front of Quay.io making 429 responses more likely, so switching to using Quay.io will be painful for teams using Skopeo until this issue is addressed. We are separately investigating making the authenticating proxy automatically retry on 429 responses but have not had any luck. In any case, given that Quay.io can give 429 responses, Skopeo ought to handle that.
No. Quay.io's 429 responses do not set Retry-After.
Please have a look at the proposed fix: #703. It's not looking for a Retry-After header, but …
There is an official introduction …
The issue was fixed by #703, closing. Thanks to everybody involved for the feedback and for testing!
Downstream BZ - https://bugzilla.redhat.com/show_bug.cgi?id=1702519
In the above BZ, the RHCOS nodes are attempting to pull a container from quay.io via skopeo, and the registry is returning HTTP 429 (Too Many Requests). skopeo reacts with:

msg="Error determining repository tags: Invalid status code returned when fetching tags list 429 (Too Many Requests)"
I think the error message is misleading since 429 is a valid code (https://tools.ietf.org/html/rfc6585#section-4), so at a minimum the message should be changed.
cc: @runcom