Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: deflake test that joins a server with non-voting servers to form quorum #12130

Merged
merged 1 commit into from
Feb 25, 2022

Conversation

shoenig
Copy link
Member

@shoenig shoenig commented Feb 24, 2022

This PR

  • upgrades the serf library
  • has the test start the join process using the un-joined server first
  • disables schedulers on the servers
  • uses the WaitForLeader and wantPeers helpers

Not sure which, if any of these actually improves the flakiness of this test.

Closes #12115 ... hopefully 🐈

…rm qourum

This PR
 - upgrades the serf library
 - has the test start the join process using the un-joined server first
 - disables schedulers on the servers
 - uses the WaitForLeader and wantPeers helpers

Not sure which, if any of these actually improves the flakiness of this test.
@shoenig shoenig force-pushed the flakey-serf-non-voter branch from 213fc39 to bd03d25 Compare February 24, 2022 23:03
@vercel vercel bot temporarily deployed to Preview – nomad February 24, 2022 23:03 Inactive
@shoenig shoenig marked this pull request as ready for review February 24, 2022 23:34
@shoenig shoenig requested review from schmichael and tgross February 24, 2022 23:34
@shoenig shoenig changed the title tests: deflake test that joins a server with non-voting servers to form qourum tests: deflake test that joins a server with non-voting servers to form quorum Feb 25, 2022
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

The flake has been happening so frequently that do we maybe want to just re-run CI a few times on this PR to see if it happens again? I was running into it maybe 50/50 so we could build confidence in the fix quickly. (Or because it seems like a good change overall, we could also just go ahead and merge and see if it helps.)

@shoenig
Copy link
Member Author

shoenig commented Feb 25, 2022

@shoenig shoenig merged commit 5683fdf into main Feb 25, 2022
@shoenig shoenig deleted the flakey-serf-non-voter branch February 25, 2022 15:12
lgfa29 added a commit that referenced this pull request Aug 30, 2022
PR #12130 refactored the test to use the `wantPeers` helper, but this
function only returns the number of voting peers, which in this test
should be equal to 2.

I think the tests were passing back them because of a bug in Raft
(hashicorp/raft#483) where a non-voting server
was able to transition to candidate state.

One possible evidence of this is that a successful test run would have
the following log line:

```
[email protected]/raft.go:1058: nomad.raft: updating configuration: command=AddVoter server-id=127.0.0.1:9101 server-addr=127.0.0.1:9101 servers="[{Suffrage:Voter ID:127.0.0.1:9107 Address:127.0.0.1:9107} {Suffrage:Voter ID:127.0.0.1:9105 Address:127.0.0.1:9105} {Suffrage:Voter ID:127.0.0.1:9103 Address:127.0.0.1:9103} {Suffrage:Voter ID:127.0.0.1:9101 Address:127.0.0.1:9101}]"
```

This commit reverts the test logic to check for peer count, regardless
of voting status.
lgfa29 added a commit that referenced this pull request Aug 30, 2022
PR #12130 refactored the test to use the `wantPeers` helper, but this
function only returns the number of voting peers, which in this test
should be equal to 2.

I think the tests were passing back them because of a bug in Raft
(hashicorp/raft#483) where a non-voting server
was able to transition to candidate state.

One possible evidence of this is that a successful test run would have
the following log line:

```
[email protected]/raft.go:1058: nomad.raft: updating configuration: command=AddVoter server-id=127.0.0.1:9101 server-addr=127.0.0.1:9101 servers="[{Suffrage:Voter ID:127.0.0.1:9107 Address:127.0.0.1:9107} {Suffrage:Voter ID:127.0.0.1:9105 Address:127.0.0.1:9105} {Suffrage:Voter ID:127.0.0.1:9103 Address:127.0.0.1:9103} {Suffrage:Voter ID:127.0.0.1:9101 Address:127.0.0.1:9101}]"
```

This commit reverts the test logic to check for peer count, regardless
of voting status.
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make TestNomad_BootstrapExpect_NonVoter stop failing
2 participants