-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non voter with higher term should not cause cluster instability #525
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1363,6 +1363,7 @@ func (r *Raft) processHeartbeat(rpc RPC) { | |
func (r *Raft) appendEntries(rpc RPC, a *AppendEntriesRequest) { | ||
defer metrics.MeasureSince([]string{"raft", "rpc", "appendEntries"}, time.Now()) | ||
// Setup a response | ||
noRetryBackoffIfErr := true | ||
resp := &AppendEntriesResponse{ | ||
RPCHeader: r.getRPCHeader(), | ||
Term: r.getCurrentTerm(), | ||
|
@@ -1375,7 +1376,23 @@ func (r *Raft) appendEntries(rpc RPC, a *AppendEntriesRequest) { | |
rpc.Respond(resp, rpcErr) | ||
}() | ||
|
||
// Ignore an older term | ||
// this use case would happen when a node was a voter and is added back to the cluster as non voter, | ||
// it term could be higher then the cluster, but because it's a non voter that term need to be discarded | ||
// in favor of the cluster current term to keep the cluster stable, as an election don't need to be started | ||
// by a node which don't have voting rights. | ||
currentTerm := r.getCurrentTerm() | ||
hasVote := hasVote(r.getLatestConfiguration(), r.localID) | ||
if a.Term < currentTerm && !hasVote && a.PrevLogTerm >= r.lastLogTerm { | ||
r.logger.Warn("older term sent to non-voter", "PrevLogTerm", a.PrevLogTerm, | ||
"lastLogTerm", r.lastLogTerm, | ||
" Term", a.Term, "currentTerm", currentTerm) | ||
r.setState(Follower) | ||
r.setCurrentTerm(a.Term) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any other state we need to set here 🤔 I don't think so because this is the same as the code below for normal happy-path replication where we switch to a new leader's term after an election when we get the first But i know we do persist the identity of the candidate we last voted for, so I wonder if the happy path doesn't update anything because it assumes that we set the identity during election? Even then that can't be right in the case of a follower coming up and learning about a new leader that it didn't vote for so again I guess you did the right thing here but I'm slightly intrigued about how and where we update the state about who our current leader 🤔. Do you know @dhiaayachi ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it's done right after in here |
||
resp.Term = a.Term | ||
noRetryBackoffIfErr = false | ||
} | ||
|
||
// if a node is a voter, Ignore an older term | ||
if a.Term < r.getCurrentTerm() { | ||
return | ||
} | ||
|
@@ -1410,7 +1427,7 @@ func (r *Raft) appendEntries(rpc RPC, a *AppendEntriesRequest) { | |
"previous-index", a.PrevLogEntry, | ||
"last-index", lastIdx, | ||
"error", err) | ||
resp.NoRetryBackoff = true | ||
resp.NoRetryBackoff = noRetryBackoffIfErr | ||
return | ||
} | ||
prevLogTerm = prevLog.Term | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -233,7 +233,7 @@ START: | |
appendStats(string(peer.ID), start, float32(len(req.Entries))) | ||
|
||
// Check for a newer term, stop running | ||
if resp.Term > req.Term { | ||
if resp.Term > req.Term && s.peer.Suffrage != Nonvoter { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems like the real fix to prevent disruption to the leader and I think is safe and correct. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes actually that was the first change I made but this won't help getting the non voter effectively join the cluster as it will continue rejecting the replication forever. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why this instead of Suffrage == Voter? |
||
r.handleStaleTerm(s) | ||
return true | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a comment here is useful but perhaps a link to this PR or the issue would be good because it's super hard to express all the nuance of the issue and why this is correct in a few lines?