Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unstable test TestReplication_FederationStates() #7588

Conversation

pierresouchay
Copy link
Contributor

@pierresouchay pierresouchay commented Apr 3, 2020

This test fails quite often in CI, be less strict about convergence duration

Example of failure: https://circleci.com/gh/hashicorp/consul/153469#tests/containers/1

@pierresouchay pierresouchay force-pushed the fix_unstable_test_TestReplication_FederationStates branch 3 times, most recently from ff4b7e6 to ee2c412 Compare April 3, 2020 12:41
@pierresouchay pierresouchay force-pushed the fix_unstable_test_TestReplication_FederationStates branch from ee2c412 to 9771dfa Compare April 3, 2020 13:01
Copy link
Contributor

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

@@ -121,8 +121,12 @@ func TestReplication_FederationStates(t *testing.T) {
require.NoError(t, s1.RPC("FederationState.Apply", &arg, &out))
}

nTimesWithDelay := func() *retry.Counter {
return &retry.Counter{Count: 30, Wait: 750 * time.Millisecond}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had a look at the different options for Retryer. I think in general Timer would be preferable because it gives you a hard limit on how long you want to wait. Counter could have the test run much longer if the operation being retried is also quite slow.

For my own understanding, is there a strong reason to use Counter here instead of Timer ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnephin The default retry creates errors very easily, on my machine, on master:

while /usr/local/bin/go test -timeout 30s github.com/hashicorp/consul/agent/consul -run '^(TestReplication_FederationStates)$'; do go clean -testcache; done
ok  	github.com/hashicorp/consul/agent/consul	4.903s
ok  	github.com/hashicorp/consul/agent/consul	5.115s
ok  	github.com/hashicorp/consul/agent/consul	5.154s
ok  	github.com/hashicorp/consul/agent/consul	5.058s
ok  	github.com/hashicorp/consul/agent/consul	5.149s
[INFO] freeport: blockSize 1500 too big for system limit 256. Adjusting...
[INFO] freeport: detected ephemeral port range of [49152, 65535]
--- FAIL: TestReplication_FederationStates (11.13s)
    writer.go:29: 2020-04-08T00:34:52.531+0200 [INFO]  TestReplication_FederationStates-node-1.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:55a53765-656e-f83e-68e4-adb4dbfa637c Address:127.0.0.1:12531}]"

=> fails at 6th run with master 74bd138

While I tested it on my machine, it was failing less - but I was mistaken, I retried today, and it is failing as much - even with 60s timeout.

There is probably a real bug, sometimes, states are not properly replicated, this patch does not work, do you want me to create an issue (because the root cause is probably far more complex) ?

@pierresouchay
Copy link
Contributor Author

This patch does not solve anything

@pierresouchay
Copy link
Contributor Author

@dnephin I think I fixed the real issue here: #7612

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants