Fix flakes with rpc integration test #1860
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1860 +/- ##
==========================================
+ Coverage 50.16% 51.45% +1.29%
==========================================
Files 168 171 +3
Lines 7356 7690 +334
==========================================
+ Hits 3690 3957 +267
- Misses 3310 3357 +47
- Partials 356 376 +20
Continue to review full report at Codecov.
integration/rpc_test.go (Outdated)
@@ -61,7 +62,7 @@ func TestEventLogRPC(t *testing.T) {
 	if err != nil {
 		t.Logf("unable to establish skaffold grpc connection: retrying...")
 		attempts++
-		if attempts == retries {
+		if attempts == connectionRetries {
Why do we need to retry here? Is it because the --rpc-port we ended up picking is not free, because some other test or CI job grabbed it?
This would happen if skaffold hadn't started up the RPC server yet. It's pretty unlikely that we would ever actually need to retry here; I had this in the original client I wrote, so I figured I would throw it in here for robustness.
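For context, here's a minimal sketch of the kind of connection retry loop being discussed. The helper name, address, and retry count are illustrative assumptions, not the exact test code:

```go
package integration

import (
	"context"
	"testing"
	"time"

	"google.golang.org/grpc"
)

// dialSkaffoldRPC is a hypothetical helper showing the retry pattern: the
// skaffold RPC server may not be listening yet when the test starts, so we
// attempt the connection a small, fixed number of times before giving up.
func dialSkaffoldRPC(t *testing.T, addr string, connectionRetries int) *grpc.ClientConn {
	var (
		conn *grpc.ClientConn
		err  error
	)
	for attempts := 0; attempts < connectionRetries; attempts++ {
		// WithBlock makes Dial actually wait for the connection, so a
		// failure here really means the server wasn't reachable yet.
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		conn, err = grpc.DialContext(ctx, addr, grpc.WithInsecure(), grpc.WithBlock())
		cancel()
		if err == nil {
			return conn
		}
		t.Logf("unable to establish skaffold grpc connection: retrying...")
		time.Sleep(time.Second)
	}
	t.Fatalf("could not connect to skaffold after %d attempts: %v", connectionRetries, err)
	return nil
}
```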
The number of connection retries is 20, which scares me. If there is a real issue, it would be masked. I am OK with a smaller number of retries here, like 2.
Another idea: we should log and collect metrics when retries happen. That can help us determine which skaffold code we should stabilize.
I created one here #1880
HAHA yeah, that does not need to be 20... I think I meant to set it to 2 originally. I'll change it here, and thanks for opening that issue.
Added some comments.
Can you dedupe the 4 (maybe more?) retry loops into a method like Retry(callback func(), waitTime time.Duration)?
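A sketch of what such a shared helper could look like; the exact signature is illustrative (the suggestion above omits an attempt count, which the helper would likely need):

```go
package integration

import (
	"fmt"
	"time"
)

// Retry is a sketch of the deduplicated helper suggested above: it calls
// callback up to attempts times, sleeping waitTime between failures, and
// returns the last error if every attempt fails.
func Retry(attempts int, waitTime time.Duration, callback func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = callback(); err == nil {
			return nil
		}
		time.Sleep(waitTime)
	}
	return fmt.Errorf("all %d attempts failed: %w", attempts, err)
}
```

Each of the existing retry loops could then shrink to a single Retry call wrapping its connection or state check.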
This adds retries to the integration tests that retrieve the internal state via RPC and HTTP, so that if skaffold doesn't get into the expected state on the first run we can give it a little time to process.
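For illustration, a hedged sketch of what the HTTP side of that could look like; the endpoint, polling interval, and check function are placeholders, not the actual test code:

```go
package integration

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// waitForState is a hypothetical example of polling skaffold's HTTP state
// endpoint until it reports the expected condition, instead of failing the
// test on the first mismatch.
func waitForState(url string, attempts int, check func(map[string]interface{}) bool) error {
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err == nil {
			var state map[string]interface{}
			decodeErr := json.NewDecoder(resp.Body).Decode(&state)
			resp.Body.Close()
			if decodeErr == nil && check(state) {
				return nil
			}
		}
		// Give skaffold a little time to process before checking again.
		time.Sleep(time.Second)
	}
	return fmt.Errorf("state did not reach expected condition after %d attempts", attempts)
}
```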