-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion: reworking membership changes #117
Discussion: reworking membership changes #117
Conversation
The AppendEntries request handler used to temporarily truncate entries from the store that had indexes overlapping with the request, then it'd go back and add all the entries in the request. This opened a window of vulnerability during which if the server crashed, it could have deleted some log entries that it shouldn't have. For this to happen, the follower would have to receive an AppendEntries request with entries that it already had. In principle, that could be due to a duplicate request (depends on transport behavior) or to a leader with a nextIndex set too low (probably isn't supposed to happen with the back-up-one-entry algorithm). The new code truncates only conflicting entries from the log store, then appends only the new entries. This also fixes a related issue where the last log index and term are set to the last entry in the request, even if the last entry in the request is not the last entry in the log (suppose the request contained an earlier subsequence of the existing log).
This is meant for discussion and is not meant to be merged.
mentioning #84 ("Cleanup Meta Ticket") |
|
||
## Data-centric view | ||
|
||
We propose to re-define a *configuration* as a set of servers, where each server includes an address (as it does today) and a mode that is either: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good opportunity to give each server something like a GUID that's independent of its address, so that we can eventually support address changes without changing a server's identity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to spec out how that would work? Specifically:
- Who would generate the GUID and when?
- Where would it be stored?
- How would it be reflected in the old AddPeer/RemovePeer or upcoming AddVoter/AddNonvoter/DemoteVoter/RemovePeer API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll spec this out - I think I've got a plan that'll work for this.
@ongardie added some initial thoughts as inline comments. I've got some cycles this week and can work on this with you guys as well if we can figure out some ways to break things up. |
@slackpad thanks for the comments. I replied to a few above.
As I mentioned, @superfell is out this week. I was going to wait for him to return (he hasn't read this doc, and he might be sitting on some useful changes). We may have to do pair programming or something extreme like that, because I really don't see how to split up the work (pun intended). If you've got space cycles this week, I think we'd be better off applying them to more parallelizable tasks. For example, how about attacking some more of #84 (AFAIK, multi-row fetching or follower replication are open)? |
Yep - I'll take some of those smaller tasks this week while we work through the details here. |
We propose the following operations for clients to manipulate the cluster configuration: | ||
- AddVoter: server becomes staging unless voter, | ||
- AddNonvoter: server becomes nonvoter unless staging or voter, | ||
- DemoteVoter: server becomes nonvoter unless absent, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To have more symmetry with AddVoter
how about just calling AddNonvoter
in order to demote? Either that or we should have a PromoteNonvoter
API vs. adding it again.
Is there a time line for these enhancements? |
@fd work is underway now, so hopefully very soon! |
…membership Merged in: - issue-84-integration to catch up with the latest changes, - ongardie/truncation (PR hashicorp#113) to avoid a conflict, and - 2c24fd1 (from Simon Fell, splits out configuration change channel) because that's useful here.
This is the main patch for hashicorp#117. It still has several outstanding problems that we'll work through in later commits, but it's good enough to discuss.
Added a really, really big diff (687db02) with the bulk of the changes. @superfell gets credit for some of it (especially fixing tests), but I squashed his stuff into the one commit. @slackpad, @sean-: please review. We should aim to come up with a list of outstanding problems. Some of them are going to be easy, others less so. |
@ongardie thanks for this! I'm returning from some travel on Monday so I should be able to make a deep dive here on Tuesday. |
return nil, fmt.Errorf("failed to get list of peers: %v", err) | ||
} | ||
peers = ExcludePeer(peers, localAddr) | ||
localGUID := localAddr // TODO: where should we get this from? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial thought was that this would be provided as a new field in conf
, and if that's empty then we'd use localAddr
as a default and emit a deprecation warning that we'll stop defaulting it at some point in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's what I was leaning towards.
@@ -488,7 +607,7 @@ func (r *Raft) Stats() map[string]string { | |||
"fsm_pending": toString(uint64(len(r.fsmCommitCh))), | |||
"last_snapshot_index": toString(lastSnapIndex), | |||
"last_snapshot_term": toString(lastSnapTerm), | |||
"num_peers": toString(uint64(len(r.peers))), | |||
"num_servers": toString(uint64(len(r.configurations.latest.Servers))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere we've taken to including the list of servers along with the number of peers. e.g. https://github.com/hashicorp/nomad/blob/c6b7bb66f309b7fc3ef325b158115aa046dcbf30/nomad/server.go#L905-L906
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A list of servers doesn't have all the information in this new world. What is this function used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends up getting displayed to operators when they run consul info
on a server, so the results here can be pretty free-form. This is used for debugging and bug reporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A %+v
on the configuration might be the best thing here.
This uncovered an issue with: r.leaderState.commitment.match(r.localID, lastIndex) where localAddress was being used incorrectly instead.
Went ahead and merged this so we can submit smaller-scoped changes to close this out. Make a list of outstanding items on #84. |
@slackpad, I wasn't expecting you to merge that! I think there's still some important stuff left to do. For example, we should audit whether this is going to work for rolling upgrades. |
I think not. For example, new code will create LogConfiguration entries. What will old code do with them? |
Oh, I see the list on #84 now, fine. |
@ongardie I just merged it to the integration branch, not master! |
This is meant for discussion and is not meant to be merged. See the text in
membership.md
.