-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix unstable test TestReplication_FederationStates() #7588
Fix unstable test TestReplication_FederationStates() #7588
Conversation
ff4b7e6
to
ee2c412
Compare
ee2c412
to
9771dfa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM
@@ -121,8 +121,12 @@ func TestReplication_FederationStates(t *testing.T) { | |||
require.NoError(t, s1.RPC("FederationState.Apply", &arg, &out)) | |||
} | |||
|
|||
nTimesWithDelay := func() *retry.Counter { | |||
return &retry.Counter{Count: 30, Wait: 750 * time.Millisecond} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had a look at the different options for Retryer
. I think in general Timer
would be preferable because it gives you a hard limit on how long you want to wait. Counter
could have the test run much longer if the operation being retried is also quite slow.
For my own understanding, is there a strong reason to use Counter
here instead of Timer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dnephin The default retry creates errors very easily, on my machine, on master:
while /usr/local/bin/go test -timeout 30s github.com/hashicorp/consul/agent/consul -run '^(TestReplication_FederationStates)$'; do go clean -testcache; done
ok github.com/hashicorp/consul/agent/consul 4.903s
ok github.com/hashicorp/consul/agent/consul 5.115s
ok github.com/hashicorp/consul/agent/consul 5.154s
ok github.com/hashicorp/consul/agent/consul 5.058s
ok github.com/hashicorp/consul/agent/consul 5.149s
[INFO] freeport: blockSize 1500 too big for system limit 256. Adjusting...
[INFO] freeport: detected ephemeral port range of [49152, 65535]
--- FAIL: TestReplication_FederationStates (11.13s)
writer.go:29: 2020-04-08T00:34:52.531+0200 [INFO] TestReplication_FederationStates-node-1.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:55a53765-656e-f83e-68e4-adb4dbfa637c Address:127.0.0.1:12531}]"
=> fails at 6th run with master 74bd138
While I tested it on my machine, it was failing less - but I was mistaken, I retried today, and it is failing as much - even with 60s timeout.
There is probably a real bug, sometimes, states are not properly replicated, this patch does not work, do you want me to create an issue (because the root cause is probably far more complex) ?
This patch does not solve anything |
This test fails quite often in CI, be less strict about convergence duration
Example of failure: https://circleci.com/gh/hashicorp/consul/153469#tests/containers/1