Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker: handle http 429 status codes #703

Merged
merged 1 commit into from
Oct 18, 2019

Conversation

vrothberg
Copy link
Member

@vrothberg vrothberg commented Sep 17, 2019

Consolidate checking the http-status codes to allow for a more uniform
error handling. Also treat code 429 (too many requests) as a known
error instead of an invalid status code.

When hitting 429, perform an exponential back off starting a 2 seconds
for at most 5 iterations.

Signed-off-by: Valentin Rothberg [email protected]

@vrothberg
Copy link
Member Author

Fixes: #618

docker/docker_client.go Outdated Show resolved Hide resolved
Copy link
Member Author

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews, @MartinBasti and @twaugh.

As we're already on it: we're not (yet) looking for the Retry-After but can easily add support for it. The header can be both, an http date or a numerical value for the delay in seconds. Both can be exploited by a malicious server, so we had to set a maximum. Hence the questions:

  • What should the maximum delay in seconds be?
  • Should we attempt exponential back offs for the values from the header as well?

docker/docker_client.go Outdated Show resolved Hide resolved
@twaugh
Copy link
Contributor

twaugh commented Sep 17, 2019

* What should the maximum delay in seconds be?

Not sure. 120? 60?

* Should we attempt exponential back offs for the values from the header as well?

I don't mind. I don't think it's mandatory, but it's possible it will give better behaviour.

@vrothberg vrothberg force-pushed the status-checks branch 2 times, most recently from 994ca9c to abb6f83 Compare September 17, 2019 15:19
Copy link
Member

@rhatdan rhatdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// iterations.
delay = 2
for i := 0; i < 5; i++ {
res, err = c.makeRequestToResolvedURLHelper(ctx, method, url, headers, stream, streamLen, auth, extraScope)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stream can be only consumed once; this fundamentally can’t be done, at least not at this level of abstraction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Looking at the call sites, we should not perform any retry when stream != nil which avoids the issue.

@@ -627,9 +636,11 @@ func (c *dockerClient) getExtensionsSignatures(ctx context.Context, ref dockerRe
return nil, err
}
defer res.Body.Close()
if res.StatusCode != http.StatusOK {
return nil, errors.Wrapf(client.HandleErrorResponse(res), "Error downloading signatures for %s in %s", manifestDigest, ref.ref.Name())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropping client.HandleErrorResponse(res) seems rather undesirable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate why? I was found the output of HandleErrorResponse not very user friendly and hard to brain parse. Is there some behavior or output you'd like to keep?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case, I'd switch all checks to that to have ~consistent error messaging.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It’s the only way to get actual server-produced error messages; dumping the raw HTTP response body would be much worse. Sure, they are often useless, but at least they can be improved.
  • The failures are machine-identifiable and converted to specific Go types. See how many different failures in See vendor/github.com/docker/distribution/registry/api/v2/errors.go just map to http.StatusBadRequest, and isManifestInvalidError.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case, I'd switch all checks to that to have ~consistent error messaging.

Sure, as long as the API documents that this format of error reporting is used. That’s not the case for look-aside signatures, at the very least; I haven’t checked the other cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case, I'd switch all checks to that to have ~consistent error messaging.

That won't work for all responses:

error parsing HTTP 429 response body: invalid character '<' looking for beginning of value: "<html>\r\n<head><title>429 Too Many Requests</title></head>\r\n<body>\r\n<center><h1>429 Too Many Requests</h1></center>\r\n<hr><center>nginx/1.17.3</center>\r\n</body>\r\n</html>\r\n"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quay.io is known to be non-compliant, what else is new?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's news for me at least. I'll try with nginx pointed to another registry...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m sorry, this was too flippant and not helpful. If we are centralizing error handling, adding a centralized workaround for Quay (e.g. “body starts with <html>”) would of course be an option. (Sure, technically it was an option before as well, but sprinkling that workaround all over the package would be too ugly.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this unchanged. Improving and further centralizing error handling sounds like a good idea but is not the intention of this PR. The new function was added to avoid too much boilerplate.

return res, err
}

func (c *dockerClient) makeRequestToResolvedURLHelper(ctx context.Context, method, url string, headers map[string][]string, stream io.Reader, streamLen int64, auth sendAuth, extraScope *authScope) (*http.Response, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helper says nothing, especially without any documentation about its purpose. …WithoutRetries, maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment and renamed it to makeRequestToResolvedURLOnce.

@vrothberg
Copy link
Member Author

Updated. @mtrmac, mind having another look?

I'll tackle the Retry-After handling now.

@vrothberg vrothberg force-pushed the status-checks branch 2 times, most recently from 26d1a7d to c9d6299 Compare September 19, 2019 08:51
@vrothberg
Copy link
Member Author

We're now handling the Retry-After header for both, numerical values and dates.
@twaugh @MartinBasti @mtrmac PTAL.

@twaugh
Copy link
Contributor

twaugh commented Sep 19, 2019

LGTM. Thanks!

@vrothberg
Copy link
Member Author

Rebased on latest master.

@vrothberg vrothberg force-pushed the status-checks branch 2 times, most recently from 90c6f20 to c58f11c Compare October 17, 2019 09:38
@vrothberg
Copy link
Member Author

CI hiccups from Skopeo have been fixed, and we're green again.

@nalind @rhatdan @mrunalp @baude @giuseppe PTAL

@rhatdan
Copy link
Member

rhatdan commented Oct 17, 2019

LGTM

// ErrUnauthorizedForCredentials is returned when the status code returned is 401
ErrUnauthorizedForCredentials = errors.New("unable to retrieve auth token: invalid username/password")
// ErrTooManyRequests is returned when the status code returned is 429
ErrTooManyRequests = errors.New("too many request to registry")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

github.com/pkg/errors.New() records a stack trace that leads back here. I'm guessing that the standard library's errors.New() would work better, perhaps using github.com/pkg/errors.WithStack() to wrap them before returning them in httpResponseToError().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a nice improvement!

docker/docker_client.go Outdated Show resolved Hide resolved
Consolidate checking the http-status codes to allow for a more uniform
error handling.  Also treat code 429 (too many requests) as a known
error instead of an invalid status code.

When hitting 429, perform an exponential back off starting a 2 seconds
for at most 5 iterations.  If the http.Response set the `Retry-Header`
then use the provided value or date to compute the delay until the
next attempt.  Note that the maximum delay is 60 seconds.

Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg
Copy link
Member Author

Updated

@nalind
Copy link
Member

nalind commented Oct 18, 2019

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants