Fix flakes with rpc integration test #1860
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1860 +/- ##
==========================================
+ Coverage 50.16% 51.45% +1.29%
==========================================
Files 168 171 +3
Lines 7356 7690 +334
==========================================
+ Hits 3690 3957 +267
- Misses 3310 3357 +47
- Partials 356 376 +20
Continue to review full report at Codecov.
integration/rpc_test.go (Outdated)
@@ -61,7 +62,7 @@ func TestEventLogRPC(t *testing.T) {
 	if err != nil {
 		t.Logf("unable to establish skaffold grpc connection: retrying...")
 		attempts++
-		if attempts == retries {
+		if attempts == connectionRetries {
Why do we need to retry here? Is it because the --rpc-port we ended up picking is not free, because some other test or CI job grabbed it?
This would happen if skaffold hadn't started up the RPC server yet. It's pretty unlikely that we would ever actually need to retry here; I had this in the original client I wrote, so I figured I would throw it in here for robustness.
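For context, here's a minimal sketch of the kind of connection retry loop being discussed. The helper name, address, and retry count are illustrative assumptions, not the exact test code:

```go
package integration

import (
	"context"
	"testing"
	"time"

	"google.golang.org/grpc"
)

// dialSkaffoldRPC is a hypothetical helper showing the retry pattern: the
// skaffold RPC server may not be listening yet when the test starts, so we
// attempt the connection a small, fixed number of times before giving up.
func dialSkaffoldRPC(t *testing.T, addr string, connectionRetries int) *grpc.ClientConn {
	var (
		conn *grpc.ClientConn
		err  error
	)
	for attempts := 0; attempts < connectionRetries; attempts++ {
		// WithBlock makes Dial actually wait for the connection, so a
		// failure here really means the server wasn't reachable yet.
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		conn, err = grpc.DialContext(ctx, addr, grpc.WithInsecure(), grpc.WithBlock())
		cancel()
		if err == nil {
			return conn
		}
		t.Logf("unable to establish skaffold grpc connection: retrying...")
		time.Sleep(time.Second)
	}
	t.Fatalf("could not connect to skaffold after %d attempts: %v", connectionRetries, err)
	return nil
}
```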
The number of connection retries is 20, which scares me. If there is a real issue, it would be masked. I am OK with a smaller number of retries here, like 2.
Another idea: we should log and collect metrics when retries happen. That can help us determine which skaffold code we should stabilize.
I created one here #1880
HAHA yeah, that does not need to be 20... I think I meant to set it to 2 originally. I'll change it here, and thanks for opening that issue.
Added some comments.
Can you dedupe the 4 (maybe more?) retry loops into a method like Retry(callback func(), waitTime time.Duration)?
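A sketch of what such a shared helper could look like; the exact signature is illustrative (the suggestion above omits an attempt count, which the helper would likely need):

```go
package integration

import (
	"fmt"
	"time"
)

// Retry is a sketch of the deduplicated helper suggested above: it calls
// callback up to attempts times, sleeping waitTime between failures, and
// returns the last error if every attempt fails.
func Retry(attempts int, waitTime time.Duration, callback func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = callback(); err == nil {
			return nil
		}
		time.Sleep(waitTime)
	}
	return fmt.Errorf("all %d attempts failed: %w", attempts, err)
}
```

Each of the existing retry loops could then shrink to a single Retry call wrapping its connection or state check.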
This adds retries to the integration tests that retrieve the internal state via RPC and HTTP, so that if skaffold doesn't get into the expected state on the first run we can give it a little time to process.
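For illustration, a hedged sketch of what the HTTP side of that could look like; the endpoint, polling interval, and check function are placeholders, not the actual test code:

```go
package integration

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// waitForState is a hypothetical example of polling skaffold's HTTP state
// endpoint until it reports the expected condition, instead of failing the
// test on the first mismatch.
func waitForState(url string, attempts int, check func(map[string]interface{}) bool) error {
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err == nil {
			var state map[string]interface{}
			decodeErr := json.NewDecoder(resp.Body).Decode(&state)
			resp.Body.Close()
			if decodeErr == nil && check(state) {
				return nil
			}
		}
		// Give skaffold a little time to process before checking again.
		time.Sleep(time.Second)
	}
	return fmt.Errorf("state did not reach expected condition after %d attempts", attempts)
}
```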