[WIP] use list api to query multiple Cloud Build statuses together #6005
Conversation
Codecov Report
@@ Coverage Diff @@
## master #6005 +/- ##
=========================================
Coverage ? 70.48%
=========================================
Files ? 463
Lines ? 17867
Branches ? 0
=========================================
Hits ? 12594
Misses ? 4340
Partials ? 933
Continue to review full report at Codecov.
pkg/skaffold/build/gcb/status.go
Outdated
		delete(jobs, id)
	}
}
statuses, err := getStatuses(projectID, jobs)
Is there a limit to the number of jobs that we can query? We can skip anything excessive since we should trim jobs as they succeed.
We trim the jobs as they succeed, fail, or are cancelled. The query parameter is freeform; I'm not sure if there is a limit, but I don't anticipate us hitting it.
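To make the concern concrete, here is a rough sketch of how the trimmed jobs map might be turned into a single `list` query; the map's value type, the `getStatuses` signature, and the filter format are assumptions for illustration rather than the PR's actual code. The point is that the filter only names builds still in flight, because finished jobs are deleted from the map first.

```go
package gcb

import (
	"fmt"
	"strings"

	cloudbuild "google.golang.org/api/cloudbuild/v1"
)

// buildFilter ORs together the IDs of the builds still being tracked so a
// single `list` call can return all of their statuses.
func buildFilter(jobs map[string]chan string) string {
	terms := make([]string, 0, len(jobs))
	for id := range jobs {
		terms = append(terms, fmt.Sprintf("build_id=%q", id))
	}
	return strings.Join(terms, " OR ")
}

func getStatuses(svc *cloudbuild.Service, projectID string, jobs map[string]chan string) (map[string]string, error) {
	resp, err := svc.Projects.Builds.List(projectID).Filter(buildFilter(jobs)).Do()
	if err != nil {
		return nil, err
	}
	statuses := make(map[string]string, len(resp.Builds))
	for _, b := range resp.Builds {
		statuses[b.Id] = b.Status // e.g. WORKING, SUCCESS, FAILURE, CANCELLED
	}
	return statuses, nil
}
```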
This code is a bit tough to review, but after running through some scenarios in my head and trying it out, things look good.
pkg/skaffold/build/gcb/types.go
Outdated
// RetryTimeout is the max amount of time to retry getting the status of the build before erroring
RetryTimeout = 3 * time.Minute
// MaxRetryCount is the max number of times we retry a throttled GCB API request
MaxRetryCount = 20
Hard to do the math in my head, but does 20 retries roughly equate to the 3-minute timeout we had before?
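As a rough sanity check (not the PR's actual backoff code; the starting delay and multiplier below are assumed values), 20 retries with a small exponential backoff lands in the same ballpark as the old 3-minute `RetryTimeout`:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		maxRetryCount = 20
		initialDelay  = 1 * time.Second // assumed starting delay
		factor        = 1.2             // assumed backoff multiplier
	)
	total := time.Duration(0)
	delay := initialDelay
	for i := 0; i < maxRetryCount; i++ {
		total += delay
		delay = time.Duration(float64(delay) * factor)
	}
	// With these assumed values this prints roughly 3m7s, close to the old
	// 3-minute RetryTimeout; with different real values the answer changes.
	fmt.Println(maxRetryCount, "retries ≈", total.Round(time.Second))
}
```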
	})
case projectID := <-p.trans:
	go func() {
		p.C <- projectID
If we already blocked on the `projectID := <-p.trans` call in the `case` statement, does this need to be in a `go func()`?
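For what it's worth, a minimal sketch of the trade-off (the poller fields here are assumptions, not the PR's exact struct): the receive from `p.trans` has already completed by the time the case body runs, but the send to `p.C` can still block if nothing is reading from `C` yet, which would stall the whole select loop; the goroutine keeps the loop responsive at the cost of one goroutine per message.

```go
package main

type poller struct {
	C     chan string
	trans chan string
	done  chan struct{} // assumed shutdown channel, for illustration only
}

func (p *poller) loop() {
	for {
		select {
		case projectID := <-p.trans:
			// A direct `p.C <- projectID` here would block this loop until a
			// consumer reads from C; the async send never blocks the loop.
			go func(id string) { p.C <- id }(projectID)
		case <-p.done:
			return
		}
	}
}

func main() {}
```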
	p.remove <- projectID
}

func (r *statusManagerImpl) run() {
This function could use a two- to three-sentence high-level description of what it does; there's a lot going on here :)
Something like:
run() manages the status of all GCB builds started by Skaffold, by using the `list` API to retrieve all jobs from a build ID. for each project ID encountered, a `poller` is set up to repeatedly poll and report status of all jobs, with a backoff mechanism in place to handle throttling from the API server
.... etc.
I'd definitely like to see some tests!
	var buildComplete bool
	delay := time.NewTicker(PollDelay)
	defer delay.Stop()
	buildResult := b.reporter.getStatus(ctx, projectID, remoteID)
WDYT of s/getStatus/trackStatus/ and s/reporter/monitor/?
- `getStatus` implies that we wait until we've gotten the result
- this `reporter` is monitoring for build status, and doesn't itself report.
@@ -144,60 +142,45 @@ func (b *Builder) buildArtifactWithCloudBuild(ctx context.Context, out io.Writer

	var digest string
	offset := int64(0)
	var buildComplete bool
	delay := time.NewTicker(PollDelay)
Suggested change:
-	delay := time.NewTicker(PollDelay)
+	delay := time.NewTicker(PollDelay) // fixed schedule for log polling
func (r *statusManagerImpl) setResult(result result) {
	r.resultMutex.RLock()
	r.results[result.jobID] <- result
Should we delete the result from `r.results`? If so, add a test case to ensure we don't see `.results` grow without end. If not, add a comment.
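If deleting is the way to go, here's a minimal sketch of what that could look like (the struct fields are assumptions based on the snippet above, not the PR's exact layout); note the delete needs the write lock rather than `RLock`:

```go
package main

import "sync"

type jobID struct {
	projectID string
	buildID   string
}

type result struct {
	jobID  jobID
	status string
}

type statusManagerImpl struct {
	resultMutex sync.RWMutex
	results     map[jobID]chan result
}

func (r *statusManagerImpl) setResult(res result) {
	r.resultMutex.Lock()
	ch, ok := r.results[res.jobID]
	if ok {
		delete(r.results, res.jobID) // drop the entry so the map doesn't grow unbounded
	}
	r.resultMutex.Unlock()
	if ok {
		ch <- res // deliver outside the lock so a slow receiver doesn't block other writers
	}
}

func main() {}
```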
	var buildComplete bool
	delay := time.NewTicker(PollDelay)
	defer delay.Stop()
	buildResult := b.reporter.getStatus(ctx, projectID, remoteID)
watch:
Loop labels can be hard to reason about, especially when used with both `goto` and `break`, and this loop relies on the `buildResult` channel not being closed (which seems unusual?).
Can we instead track the number of logging bytes last received, and turn the `for {}` into `for !buildComplete || lastLog > 0 {}`?
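A rough sketch of that restructured loop; everything here (the `buildStatus` type, the `copyLogs` helper, the field names) is a stand-in invented for illustration, not the PR's actual API. The `watch:` label and `goto` disappear: the loop runs while the build is incomplete or the last log read returned bytes, so trailing output is still drained after the build finishes.

```go
package main

import (
	"context"
	"io"
	"time"
)

type buildStatus struct{ done bool }

// copyLogs is a hypothetical helper that copies newly available log bytes
// starting at offset and reports how many were written.
func copyLogs(ctx context.Context, out io.Writer, offset int64) (int64, error) {
	return 0, nil // placeholder
}

func watchBuild(ctx context.Context, out io.Writer, buildResult <-chan buildStatus, pollDelay time.Duration) error {
	delay := time.NewTicker(pollDelay)
	defer delay.Stop()

	var buildComplete bool
	var offset int64
	lastLog := int64(1) // non-zero so the first iteration attempts a read
	for !buildComplete || lastLog > 0 {
		select {
		case res := <-buildResult:
			buildComplete = res.done
		case <-delay.C:
			n, err := copyLogs(ctx, out, offset)
			if err != nil {
				return err
			}
			offset += n
			lastLog = n
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

func main() {}
```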
for {
	select {
	case projectID := <-p.remove:
		if b, found := p.timers[projectID]; found {
`b` is a non-obvious name. `t`?
		delete(p.timers, projectID)
		delete(p.backoffs, projectID)
	case projectID := <-p.reset:
		if b, found := p.timers[projectID]; found {
s/b/t/
	return reporter
}

// poller sends the `projectID` on channel `C` with an exponentially increasing period for each project.
WDYT about breaking `poller` out to a new file? It should have a `stop()` method.
Though it feels like you could simplify this and avoid the goroutine and internal channels with a `sync.Map`.
(I'm not sure that such a "simplification" would actually reduce the complexity though.)
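For comparison, a very loose sketch of the `sync.Map` shape (the field names and the backoff representation are invented here, not taken from the PR): the per-project state becomes a shared map mutated directly instead of a goroutine owning a timers map fed by remove/reset channels. The caveat above still applies, since read-modify-write sequences on the map need their own care.

```go
package main

import (
	"sync"
	"time"
)

type poller struct {
	backoffs sync.Map // projectID -> time.Duration of the next poll delay
}

func (p *poller) reset(projectID string) {
	p.backoffs.Store(projectID, time.Second)
}

func (p *poller) remove(projectID string) {
	p.backoffs.Delete(projectID)
}

// next returns the current delay for a project and doubles it for the next
// poll. Note the Load/Store pair is not atomic, which is exactly the kind of
// subtlety that makes the "simplification" debatable.
func (p *poller) next(projectID string) time.Duration {
	v, _ := p.backoffs.LoadOrStore(projectID, time.Second)
	d := v.(time.Duration)
	p.backoffs.Store(projectID, 2*d)
	return d
}

func main() {}
```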
// jobID represents a single build job
type jobID struct {
	projectID string
	buildID   string
}
WDYT of creating types for `buildID` and `projectID` for use in the signatures, to get some type checking into place:

type projectID string
type buildID string

I think the poller would benefit.
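A tiny illustration of what the defined types buy (the `getStatus` signature here is made up for the example): with plain strings, swapping the project and build IDs compiles silently, while the defined types turn the mix-up into a compile error.

```go
package main

type projectID string
type buildID string

// getStatus is a hypothetical signature; any function taking both IDs works.
func getStatus(p projectID, b buildID) string { return "" }

func main() {
	p := projectID("my-project")
	b := buildID("1a2b3c")
	_ = getStatus(p, b)
	// _ = getStatus(b, p) // compile error: cannot use b (buildID) as projectID
}
```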
// maintain single instance of the GCB client per skaffold process
client     *cloudbuild.Service
clientOnce sync.Once
Singletons are bad: they're a global dependency and they break modularity. We should strive to avoid them (look at the kubeContext/kubeConfig mess). At the very least, have this instance be managed by the `statusManagerImpl`.
You should be able to write tests for the `statusManagerImpl` with different backoff values (nobody wants a test to wait a minute just to verify a timeout, for example).
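A rough sketch of the shape being suggested (not the PR's code; the constructor and field names here are assumptions): the manager owns its client and takes the retry knobs as fields, so a test can inject small values, or a fake service, instead of depending on a package-level singleton and package-level constants.

```go
package gcb

import (
	"context"
	"time"

	cloudbuild "google.golang.org/api/cloudbuild/v1"
)

type statusManagerImpl struct {
	client        *cloudbuild.Service
	retryTimeout  time.Duration
	maxRetryCount int
}

// newStatusManager wires the client and backoff configuration in explicitly
// instead of reaching for a process-wide singleton.
func newStatusManager(ctx context.Context, retryTimeout time.Duration, maxRetryCount int) (*statusManagerImpl, error) {
	svc, err := cloudbuild.NewService(ctx)
	if err != nil {
		return nil, err
	}
	return &statusManagerImpl{
		client:        svc,
		retryTimeout:  retryTimeout,
		maxRetryCount: maxRetryCount,
	}, nil
}
```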
Will make updates, add tests, and reopen.
Fixes: #5888
Description
TODO: Add description, add tests