Potential fixes for flapping leadership #6951
Changes from all commits
@@ -57,11 +57,26 @@ var defaultSchedulerConfig = &structs.SchedulerConfiguration{
 // as the leader in the Raft cluster. There is some work the leader is
 // expected to do, so we must react to changes
 func (s *Server) monitorLeadership() {
+	// We use the notify channel we configured Raft with, NOT Raft's
+	// leaderCh, which is only notified best-effort. Doing this ensures
+	// that we get all notifications in order, which is required for
+	// cleanup and to ensure we never run multiple leader loops.
+	leaderCh := s.leaderCh
+
 	var weAreLeaderCh chan struct{}
 	var leaderLoop sync.WaitGroup
 	for {
 		select {
-		case isLeader := <-s.leaderCh:
+		case isLeader := <-leaderCh:
+			// suppress leadership flapping
+			isLeader, suppressed := suppressLeadershipFlaps(isLeader, leaderCh)
+
+			// if gained and lost leadership immediately, move on without emitting error
+			if suppressed && !isLeader && weAreLeaderCh == nil {
+				s.logger.Info("cluster leadership acquired but lost immediately")
+				continue
+			}
+
 			switch {
 			case isLeader:
 				if weAreLeaderCh != nil {
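The new comment above distinguishes Raft's best-effort leaderCh from the notify channel that setupRaft configures (see the second file's hunk below). A minimal, self-contained sketch of the difference, assuming (as the comment says) that the best-effort path may drop a transition when its one-slot buffer is full, while the buffered notify channel keeps every transition in order:

package main

import "fmt"

func main() {
	bestEffort := make(chan bool, 1) // like Raft's LeaderCh: the send is skipped if the buffer is full
	reliable := make(chan bool, 10)  // like the NotifyCh set up in setupRaft

	// A rapid gained/lost/gained sequence while the consumer is busy.
	for _, isLeader := range []bool{true, false, true} {
		select {
		case bestEffort <- isLeader:
		default: // consumer hasn't caught up; this transition is silently lost
		}
		reliable <- isLeader // preserved, in order, up to the buffer size
	}

	fmt.Println("best-effort transitions kept:", len(bestEffort)) // 1
	fmt.Println("reliable transitions kept:  ", len(reliable))    // 3
}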
@@ -96,6 +111,35 @@ func (s *Server) monitorLeadership() {
 	}
 }
 
+// suppressLeadershipFlaps suppresses cases where we gained but lost leadership immediately.
+// Protect against the case where leadership transitions multiple times while the server
+// updates internal leadership-related structures.
+//
+// This uses a conservative approach - mainly avoid establishing leadership if the server already lost it.
+//
+// Params:
+//   isLeader: the last value dequeued from the channel
+//   ch:       the leadership channel
+// Returns:
+//   leader:     the last buffered leadership state
+//   suppressed: true if the method dequeued elements from the channel
+func suppressLeadershipFlaps(isLeader bool, ch <-chan bool) (leader, suppressed bool) {
Reviewer comment: Effectively what we're doing here is that if we get a notification that we're now the leader, we drain the notification channel to make sure that there hasn't been more than one notification since we handled the last one (in the outer loop of monitorLeadership). While this solves for a very tight loop of leadership flapping, I think we can still get into a flapping state if the leadership transitions are happening at a rate which is slower than the outer loop of monitorLeadership. (Moving the
+	if !isLeader {
+		return isLeader, false
+	}
+
+	leader = isLeader
+	for {
+		select {
+		case v := <-ch:
+			leader = v
+			suppressed = true
+		default:
+			return leader, suppressed
+		}
+	}
+}
+
 // leaderLoop runs as long as we are the leader to run various
 // maintenance activities
 func (s *Server) leaderLoop(stopCh chan struct{}) {
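A sketch of how the drain behavior above could be exercised in isolation; the test name and cases are illustrative and not part of this PR (it assumes the usual testing import in a _test.go file of the same package):

// Illustrative only: pre-load the buffered channel to simulate transitions
// that arrived while monitorLeadership was busy handling the previous one.
func TestSuppressLeadershipFlaps_Sketch(t *testing.T) {
	cases := []struct {
		name           string
		first          bool   // the value monitorLeadership already dequeued
		buffered       []bool // values still waiting in the channel
		wantLeader     bool
		wantSuppressed bool
	}{
		{"steady leader", true, nil, true, false},
		{"gained then lost", true, []bool{false}, false, true},
		{"gained, lost, regained", true, []bool{false, true}, true, true},
		{"not leader: channel left untouched", false, []bool{true}, false, false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			ch := make(chan bool, 10)
			for _, v := range tc.buffered {
				ch <- v
			}
			leader, suppressed := suppressLeadershipFlaps(tc.first, ch)
			if leader != tc.wantLeader || suppressed != tc.wantSuppressed {
				t.Fatalf("got (%v, %v), want (%v, %v)",
					leader, suppressed, tc.wantLeader, tc.wantSuppressed)
			}
		})
	}
}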
@@ -148,16 +192,18 @@ RECONCILE:
 	// updates
 	reconcileCh = s.reconcileCh
 
-WAIT:
 	// Poll the stop channel to give it priority so we don't waste time
 	// trying to perform the other operations if we have been asked to shut
 	// down.
 	select {
 	case <-stopCh:
 		return
+	case <-s.shutdownCh:
+		return
 	default:
 	}
 
+WAIT:
 	// Wait until leadership is lost
 	for {
 		select {
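For context on the poll-then-wait shape this hunk adjusts: when several cases of a select are ready at the same time, Go chooses among them pseudo-randomly, so the separate non-blocking poll is what actually gives the stop and shutdown channels priority. A self-contained sketch of that pattern (not Nomad code):

package main

import "fmt"

func main() {
	stopCh := make(chan struct{})
	workCh := make(chan string, 1)
	close(stopCh)         // a stop has already been requested...
	workCh <- "reconcile" // ...but work is also ready

	// Priority poll: if stop is pending, notice it before doing anything else.
	select {
	case <-stopCh:
		fmt.Println("stopping before attempting more work")
		return
	default:
	}

	// Without the poll above, this select could just as easily pick workCh,
	// since both cases are ready.
	select {
	case <-stopCh:
		fmt.Println("stopping")
	case w := <-workCh:
		fmt.Println("working on", w)
	}
}

The hunk also appears to move the WAIT label below that poll, so any later goto WAIT in leaderLoop would re-enter the wait loop directly instead of re-running the poll.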
@@ -1234,8 +1234,8 @@ func (s *Server) setupRaft() error {
 		}
 	}
 
-	// Setup the leader channel
-	leaderCh := make(chan bool, 1)
+	// Set up a channel for reliable leader notifications.
+	leaderCh := make(chan bool, 10)
Reviewer comment: :-) Hm, in many cases this will be enough, so why did you extend this value? You already add suppressLeadershipFlaps.
 	s.config.RaftConfig.NotifyCh = leaderCh
 	s.leaderCh = leaderCh
 
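A hedged sketch of the wiring this hunk changes, using hashicorp/raft's public Config.NotifyCh field (the helper function name below is made up for illustration). If memory serves, the library's send on NotifyCh blocks until it is received or Raft shuts down, which is why the channel is buffered and why monitorLeadership above has to keep draining it; the reviewers below question whether any fixed buffer size is sufficient on its own.

import "github.com/hashicorp/raft"

// configureLeaderNotifications is illustrative, not a Nomad function: it shows
// the shape of the change in this hunk - a buffered channel handed to Raft via
// Config.NotifyCh and also kept by the server for monitorLeadership to consume.
func configureLeaderNotifications(conf *raft.Config) <-chan bool {
	leaderCh := make(chan bool, 10) // buffer size raised from 1 to 10 in this PR
	conf.NotifyCh = leaderCh
	return leaderCh
}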
Reviewer comment: Does flapping indicate something an operator should be aware of, such as excessive CPU load or network issues? If so, we might want to document that in at least a code comment, as I'm worried this log message may flummox the first users who hit it. If it does hint at some degenerate state, users who see it might already be stressed and confused, so if there's any more hints or context we can offer, we should.
Reviewer comment: To add to this, I've often found as an operator that the info log level is indecisive. If it's something I need to pay attention to and take action on, it should probably be at warn. Otherwise it's telling me "hey, this is something you want to know about" without telling me what to do with that information. (My rule of thumb has often been that info is for information at startup like "I'm listening on port 1234 and running with plugins X and Y", and then not used after that, but I don't think we have firm guidelines anywhere on Nomad.)
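If the maintainers agree with that, the suggestion amounts to a one-line change to the new log call in the first hunk (a sketch only; the message wording is illustrative, not agreed upon):

// Current line in this PR:
s.logger.Info("cluster leadership acquired but lost immediately")

// Suggested severity, with a hint at what the operator might look into:
s.logger.Warn("cluster leadership acquired but lost immediately; this may indicate an overloaded server or unstable networking between servers")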
Reviewer comment: Hm, does this eliminate leaderCh overflow? I don't think so. For example, it doesn't remove the situation where leadership is lost, and you just increase the leaderCh buffer size, which IMHO is not enough in all cases.
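One direction these two concerns point toward, purely as an illustration and not something this PR implements, is settling on time rather than on buffer occupancy: keep absorbing transitions until the channel has been quiet for a short period, then act on the final state. The function below is a sketch of that idea (the name and the settlePeriod parameter are made up):

import "time"

// settleLeadership is a sketch, not part of this PR: after a transition
// arrives, keep draining further transitions until none has been seen for
// settlePeriod, then report the final state. Because the channel is drained
// continuously while flapping, it also cannot overflow during that window.
func settleLeadership(first bool, ch <-chan bool, settlePeriod time.Duration) bool {
	state := first
	timer := time.NewTimer(settlePeriod)
	defer timer.Stop()
	for {
		select {
		case v := <-ch:
			state = v
			// Restart the quiet-period timer, draining a pending fire if needed.
			if !timer.Stop() {
				<-timer.C
			}
			timer.Reset(settlePeriod)
		case <-timer.C:
			return state
		}
	}
}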