Bug Fix: Avoid Dgraph cluster getting stuck in infinite leader election #3391
Conversation
…p. Capture the latency of individual components in Raft.Ready better.
conn/node.go, line 253 at r1 (raw file):
This happens in glog.V(2) but the ticker in ReportRaftComms only zeroes out heartbeats in glog.V(3). Is that OK?
conn/raft_server.go, line 232 at r1 (raw file):
Same question here too about incrementing in V(2) and glogging/zero-ing in V(3).
Reviewable status: 0 of 5 files reviewed, 2 unresolved discussions (waiting on @danielmai and @manishrjain)
conn/node.go, line 253 at r1 (raw file):
Previously, danielmai (Daniel Mai) wrote…
This happens in glog.V(2) but the ticker in ReportRaftComms only zeroes out heartbeats in glog.V(3). Is that OK?
Yeah, that's alright. Even if uint64 overflows (??) and goes back to zero, it's not a big deal.
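(For the record, integer overflow in Go is well-defined wraparound rather than an error, so a counter that is rarely zeroed simply rolls over. Note the diff declares the field as int64 even though the comment says uint64. A minimal illustration:)

package main

import "fmt"

func main() {
	var c uint64 = ^uint64(0) // max uint64
	c++                       // well-defined wraparound, no panic
	fmt.Println(c)            // prints 0
}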
Reviewed 5 of 5 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @danielmai and @manishrjain)
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @manishrjain)
conn/node.go, line 253 at r1 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Yeah, that's alright. Even if uint64 overflows (??) and goes back to zero, it's not a big deal.
Sounds good.
…on (#3391) Dgraph Alphas were calculating snapshots and checkpoints in the main Raft loop, which depending upon disk speed caused Ticks to not be done for seconds. This caused the followers to assume that the leader is unavailable, triggering an election. Because the checkpoint and snapshot calculation happens every 30s, the election was happening every 30s as well. This PR moves both of those outside the main loop into their own goroutine (colocated with the code which shuts down Raft node). Tested successfully with a live cluster which was exhibiting these symptoms. This PR also tracks how many heartbeats have come in and gone out from each node and prints them out under V(3). Useful for debugging. The PR improves upon and uses x.Timer to track Raft.Ready components' latencies and report them in both Alphas and Zeros. This fixes the incorrect statement we were making about disk latency being the primary cause of Raft.Ready being slow. Changes: * Report Heartbeat comms * Add logs around heartbeats. * Move snapshot and checkpoint calculation outside of the main Raft loop. Capture the latency of individual components in Raft.Ready better. * Add timer to Zero as well. Fix a bug: Use a for loop when going over slow ticker. * Move num pending txns to V(2). * Move the checkpointing code outside of the Run func.
@@ -72,6 +72,9 @@ type Node struct {
	// The stages are proposed -> committed (accepted by cluster) ->
	// applied (to PL) -> synced (to BadgerDB).
	Applied y.WaterMark

	heartbeatsOut int64
atomic package issues here? This is 64-bit. https://golang.org/pkg/sync/atomic/#pkg-note-BUG
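(For context: the linked note says that on 32-bit platforms such as x86-32 and ARM, 64-bit words passed to sync/atomic functions must be 64-bit aligned, which the compiler only guarantees for the first word in a struct or allocated array. A hedged sketch of the concern, assuming the counter is updated via sync/atomic; the heartbeatsIn field and the place-counters-first workaround are illustrative, not necessarily what the PR does:)

package main

import "sync/atomic"

// Keeping atomically-accessed int64 counters at the top of the struct is the
// common way to satisfy the alignment requirement from
// https://golang.org/pkg/sync/atomic/#pkg-note-BUG on 32-bit platforms.
type Node struct {
	heartbeatsOut int64 // updated via atomic.AddInt64
	heartbeatsIn  int64 // illustrative companion counter
	// ... other fields ...
}

func (n *Node) recordHeartbeatOut() {
	// Panics on 386/ARM if &n.heartbeatsOut is not 64-bit aligned.
	atomic.AddInt64(&n.heartbeatsOut, 1)
}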
@@ -155,6 +158,20 @@ func NewNode(rc *pb.RaftContext, store *raftwal.DiskStorage) *Node {
	return n
}

func (n *Node) ReportRaftComms() {
	if !glog.V(3) {
Shouldn't this work for level > 3 too?
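(Worth noting: glog.V(3) returns true whenever the verbosity is 3 or higher, so the guard already covers level > 3. A minimal sketch of the reporting loop under discussion, assuming the counters are zeroed by atomically swapping them out each interval; the one-second cadence and log format are assumptions:)

package main

import (
	"sync/atomic"
	"time"

	"github.com/golang/glog"
)

type Node struct {
	heartbeatsOut int64
	heartbeatsIn  int64
}

// ReportRaftComms (sketch): periodically print and reset the heartbeat
// counters. glog.V(3) is true for -v=3 and above, so higher levels work too.
func (n *Node) ReportRaftComms() {
	if !glog.V(3) {
		return
	}
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for range ticker.C {
		// Swap reads and zeroes in one atomic step, so each line reports
		// exactly one interval's worth of heartbeats.
		out := atomic.SwapInt64(&n.heartbeatsOut, 0)
		in := atomic.SwapInt64(&n.heartbeatsIn, 0)
		glog.Infof("RaftComm: [out: %d, in: %d]", out, in)
	}
}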
@@ -964,14 +983,13 @@ func (n *node) Run() {
	span.End()
	ostats.RecordWithTags(context.Background(),
		[]tag.Mutator{tag.Upsert(x.KeyMethod, "alpha.RunLoop")},
-		x.LatencyMs.M(x.SinceMs(start)))
+		x.LatencyMs.M(float64(timer.Total())/1e6))
could divide by time.Millisecond instead here
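(Assuming timer.Total() returns a time.Duration, i.e. nanoseconds, the suggestion just replaces the 1e6 magic number with the named constant; the two forms are equivalent:)

// 1e6 nanoseconds per millisecond, spelled with the named constant:
ms := float64(timer.Total()) / float64(time.Millisecond)
x.LatencyMs.M(ms)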
func (t *Timer) All() []time.Duration {
	return t.records
}

func (t *Timer) String() string {
	sort.Slice(t.records, func(i, j int) bool {
Is sorting helpful in viewing the results?
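(For context, a sketch of what a sorted String could look like, assuming the intent is to surface the slowest Raft.Ready components first; the descending order and output format are guesses, not necessarily what the PR implements:)

package main

import (
	"sort"
	"strings"
	"time"
)

type Timer struct {
	records []time.Duration
}

// String prints the recorded durations longest-first, so the slowest
// component stands out at the start of the log line.
func (t *Timer) String() string {
	sort.Slice(t.records, func(i, j int) bool {
		return t.records[i] > t.records[j] // descending: slowest first
	})
	parts := make([]string, 0, len(t.records))
	for _, r := range t.records {
		parts = append(parts, r.Round(time.Millisecond).String())
	}
	return strings.Join(parts, " ")
}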