fix: gcb api throttling retry backoff not implemented correctly #6023
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #6023      +/-   ##
==========================================
- Coverage   70.74%   70.73%   -0.01%
==========================================
  Files         462      462
  Lines       17887    17890       +3
==========================================
+ Hits        12654    12655       +1
- Misses       4303     4305       +2
  Partials      930      930
==========================================
```
Continue to review full report at Codecov.
```diff
@@ -151,7 +151,7 @@ watch:
 	logrus.Debugf("current offset %d", offset)
 	backoff := NewStatusBackoff()
 	if waitErr := wait.Poll(backoff.Duration, RetryTimeout, func() (bool, error) {
-		backoff.Step()
+		time.Sleep(backoff.Step())
```
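For context: `wait.Backoff.Step()` only computes the next delay and advances the backoff's internal state; it does not sleep. The original call therefore threw the delay away and `wait.Poll` kept polling at its fixed interval. A minimal sketch of the difference, assuming `k8s.io/apimachinery` and illustrative backoff values (this is not Skaffold's code):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	backoff := wait.Backoff{
		Duration: 100 * time.Millisecond, // initial delay (illustrative, not Skaffold's defaults)
		Factor:   2.0,                    // grow the delay by this factor each step
		Steps:    5,                      // number of steps before the delay stops growing
	}

	for i := 0; i < 5; i++ {
		d := backoff.Step() // returns the next delay and mutates the backoff; it does NOT sleep
		fmt.Printf("attempt %d: backing off for %v\n", i, d)
		time.Sleep(d) // without this sleep the "backoff" has no effect, which is the bug fixed above
	}
}
```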
`Poll` doesn't seem like the right function here: it waits a fixed interval between attempts, and we're not checking the context here either. `wait.ExponentialBackoffWithContext()` looks like the function we should be using?
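A rough sketch of what that could look like; this is not the PR's code. `fetchBuild` is a hypothetical stand-in for the GCB build-status request, the backoff values are illustrative, and the condition-function signature of `wait.ExponentialBackoffWithContext` has varied across apimachinery versions (newer releases take a context-aware condition func):

```go
backoff := wait.Backoff{
	Duration: 15 * time.Second, // illustrative values, not what NewStatusBackoff() uses
	Factor:   1.5,
	Steps:    10,
}
waitErr := wait.ExponentialBackoffWithContext(ctx, backoff, func() (bool, error) {
	cb, err := fetchBuild(ctx, buildID) // hypothetical helper wrapping the GCB status call
	if err != nil {
		return false, nil // treat as retryable (e.g. a 429); real code should inspect the error
	}
	// Done once the build reaches a terminal state.
	return cb.Status == StatusSuccess || cb.Status == StatusFailure, nil
})
```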
Honestly, this whole loop should be refactored into a set of methods. It's hard to trace the logic, and I worry that there's a chance we cut off the logs (do we know we retrieve the remainder of the logs when `cb.Status == StatusSuccess` or `StatusFailure`?). And shouldn't we delete the source archive on failure or any other termination condition?
And you are refactoring it in #6005 (where is my brain :-D)
ya 😅 I thought to just patch this bug in the interim.

We intended to retry GCB status requests with exponential backoff when the requests start getting throttled, but it seems the backoff was never actually applied.

cc @dhodun: this might help with the GCB 429 errors you were seeing with multiple parallel builds.
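For reference, a hedged sketch of how a throttled response from the Cloud Build API could be recognized so retries only back off while being rate limited; the package and helper name are illustrative and this is not part of this PR:

```go
package gcb

import (
	"errors"
	"net/http"

	"google.golang.org/api/googleapi"
)

// isThrottled reports whether err is an HTTP 429 ("too many requests")
// returned by a Google API client, e.g. the Cloud Build status request.
func isThrottled(err error) bool {
	var apiErr *googleapi.Error
	return errors.As(err, &apiErr) && apiErr.Code == http.StatusTooManyRequests
}
```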