
fix: gcb api throttling retry backoff not implemented correctly #6023

Merged
tejal29 merged 1 commit into GoogleContainerTools:master on Jun 17, 2021

Conversation

gsquared94 (Contributor) commented Jun 16, 2021

We intended to retry GCB status requests with exponential backoff when the requests start getting throttled. However, it seems we never actually had the backoff working.

cc @dhodun, this might help with the GCB 429 errors you were experiencing with multiple parallel builds.
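For context, the backoff object here is a k8s.io/apimachinery/pkg/util/wait.Backoff. Its Step() method only computes the next delay and advances the internal state; the caller still has to sleep for the returned duration. A minimal sketch of the intended usage (the field values below are illustrative, not Skaffold's actual NewStatusBackoff settings):

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	// Illustrative values only; Skaffold's NewStatusBackoff defines its own.
	backoff := wait.Backoff{
		Duration: 100 * time.Millisecond, // initial delay
		Factor:   2.0,                    // multiply the delay by this on each Step()
		Steps:    4,                      // how many times the delay may grow
	}

	for i := 0; i < 4; i++ {
		d := backoff.Step() // returns the next delay and advances the backoff state
		fmt.Printf("attempt %d: sleeping %v\n", i+1, d)
		// Without this sleep, calling Step() is effectively a no-op,
		// which is the bug this PR fixes.
		time.Sleep(d)
	}
}
```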

@gsquared94 gsquared94 requested a review from a team as a code owner June 16, 2021 09:39
@google-cla google-cla bot added the cla: yes label Jun 16, 2021

codecov bot commented Jun 16, 2021

Codecov Report

Merging #6023 (10caea0) into master (f76873c) will decrease coverage by 0.01%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master    #6023      +/-   ##
==========================================
- Coverage   70.74%   70.73%   -0.01%     
==========================================
  Files         462      462              
  Lines       17887    17890       +3     
==========================================
+ Hits        12654    12655       +1     
- Misses       4303     4305       +2     
  Partials      930      930              
Impacted Files Coverage Δ
pkg/skaffold/build/gcb/cloud_build.go 0.00% <0.00%> (ø)
pkg/skaffold/docker/image.go 78.13% <0.00%> (-1.40%) ⬇️
pkg/skaffold/docker/parse.go 86.19% <0.00%> (-0.96%) ⬇️
...skaffold/kubernetes/debugging/container_manager.go 51.85% <0.00%> (+0.87%) ⬆️
pkg/skaffold/util/tar.go 56.00% <0.00%> (+5.33%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@gsquared94 gsquared94 added the kokoro:force-run label (forces a kokoro re-run on a PR) Jun 16, 2021
@kokoro-team kokoro-team removed the kokoro:force-run label (forces a kokoro re-run on a PR) Jun 16, 2021
pkg/skaffold/build/gcb/cloud_build.go
@@ -151,7 +151,7 @@ watch:
 	logrus.Debugf("current offset %d", offset)
 	backoff := NewStatusBackoff()
 	if waitErr := wait.Poll(backoff.Duration, RetryTimeout, func() (bool, error) {
-		backoff.Step()
+		time.Sleep(backoff.Step())
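To make the interaction explicit: wait.Poll only ever waits a fixed backoff.Duration between attempts, so the added time.Sleep(backoff.Step()) is what supplies the growing delay once GCB starts throttling. Below is a self-contained sketch of the pattern; newStatusBackoff and retryTimeout are stand-ins with illustrative values, not Skaffold's actual definitions:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// Stand-in for Skaffold's NewStatusBackoff; values are illustrative only.
func newStatusBackoff() *wait.Backoff {
	return &wait.Backoff{
		Duration: 100 * time.Millisecond,
		Factor:   1.5,
		Steps:    10,
	}
}

// Stand-in for Skaffold's RetryTimeout.
const retryTimeout = 10 * time.Second

func main() {
	backoff := newStatusBackoff()
	attempt := 0
	// wait.Poll waits a fixed backoff.Duration between attempts; the
	// time.Sleep(backoff.Step()) inside the condition adds the exponentially
	// growing part of the delay.
	err := wait.Poll(backoff.Duration, retryTimeout, func() (bool, error) {
		time.Sleep(backoff.Step())
		attempt++
		fmt.Println("checking Cloud Build status, attempt", attempt)
		// A real condition would query the GCB API here and return (true, nil)
		// once the build reaches a terminal state.
		return attempt >= 3, nil
	})
	if err != nil {
		fmt.Println("gave up waiting:", err)
	}
}
```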
Member

Poll doesn't seem like the right function here: it does a fixed wait. And we're not checking the context here either.

wait.ExponentialBackoffWithContext() looks like the function we should be using?

Honestly, this whole loop should be refactored into a set of methods. It's hard to trace the logic, and I worry that there's a chance we cut off the logs (do we know we retrieve the remainder of the logs when cb.Status == StatusSuccess or StatusFailure?).

And shouldn't we delete the source archive on failure or any other termination condition?
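For reference, a rough sketch of what the suggested switch might look like, assuming an apimachinery version that provides wait.ExponentialBackoffWithContext with a plain ConditionFunc (the signature has changed across releases) and using illustrative backoff values rather than Skaffold's NewStatusBackoff:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	// The context bounds the whole wait, addressing the "we're not checking
	// the context" concern above.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Illustrative values; a real caller would reuse the status backoff settings.
	backoff := wait.Backoff{
		Duration: time.Second,
		Factor:   2.0,
		Steps:    8,
		Cap:      30 * time.Second,
	}

	attempt := 0
	err := wait.ExponentialBackoffWithContext(ctx, backoff, func() (bool, error) {
		attempt++
		fmt.Println("checking build status, attempt", attempt)
		// Return (false, nil) on a throttled (429) response so the backoff
		// keeps growing, or (true, nil) once the build reaches a terminal state.
		return attempt >= 3, nil
	})
	if err != nil {
		fmt.Println("gave up waiting:", err)
	}
}
```

Unlike the wait.Poll form above, this sleeps the exponentially growing interval itself and aborts promptly when the context is cancelled.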

Member

And you are refactoring it in #6005 (where is my brain :-D)

Contributor Author

ya 😅 I thought to just patch this bug in the interim.

@tejal29 tejal29 merged commit 1e96144 into GoogleContainerTools:master Jun 17, 2021
4 participants