Merge #104212
104212: kvserver: don't report unreachable followers when quiesced r=erikgrinaker a=erikgrinaker

Previously, `Replica.tick()` could mark a Raft follower as unreachable even if the replica was quiesced, transitioning it to `StateProbe`. This in turn can prevent lease transfers, which require the target to be up-to-date. However, with the range quiesced, the follower wouldn't transition back to healthy until the leader unquiesced for some reason.

We normally don't quiesce with a follower that isn't caught up. Applying internal state transitions while the range is quiesced is problematic, since the range won't have a chance to react to them.

This patch instead defers marking the follower as unreachable until the next tick after the range unquiesces, such that the leader can detect the follower's recovery. Since ranges aren't allowed to quiesce when they have outstanding ready events or log entries, we're likely to detect the unreachable follower and mark it as such on the next tick before quiescing anyway.
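The deferral described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the actual `kvserver` code: `replica`, `fakeRaftGroup`, and `addUnreachable` are hypothetical names invented for the example, and the real `Replica.tick()` also holds `raftMu` and `mu`, which are omitted here. The sorting step exists only to make the toy output deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fakeRaftGroup is a hypothetical stand-in for the Raft group. It only
// records which followers were reported unreachable.
type fakeRaftGroup struct {
	reported []uint64
}

func (g *fakeRaftGroup) ReportUnreachable(id uint64) {
	g.reported = append(g.reported, id)
}

// replica is a toy model of a range replica, not the real kvserver.Replica.
type replica struct {
	quiescent bool
	raft      fakeRaftGroup

	unreachablesMu struct {
		sync.Mutex
		remotes map[uint64]struct{}
	}
}

// addUnreachable queues a follower to be reported on a later tick.
func (r *replica) addUnreachable(id uint64) {
	r.unreachablesMu.Lock()
	defer r.unreachablesMu.Unlock()
	if r.unreachablesMu.remotes == nil {
		r.unreachablesMu.remotes = map[uint64]struct{}{}
	}
	r.unreachablesMu.remotes[id] = struct{}{}
}

// tick returns early while quiesced and leaves the queued followers in
// place, so they are only reported once the replica is unquiesced.
func (r *replica) tick() {
	if r.quiescent {
		return
	}

	// Drain the queue under its own mutex, then report outside of it.
	r.unreachablesMu.Lock()
	remotes := r.unreachablesMu.remotes
	r.unreachablesMu.remotes = nil
	r.unreachablesMu.Unlock()

	// Sort IDs so the reporting order is deterministic in this example.
	ids := make([]uint64, 0, len(remotes))
	for id := range remotes {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	for _, id := range ids {
		r.raft.ReportUnreachable(id)
	}
}

func main() {
	r := &replica{quiescent: true}
	r.addUnreachable(2)

	r.tick()
	fmt.Println(len(r.raft.reported)) // prints 0: nothing reported while quiesced

	r.quiescent = false
	r.tick()
	fmt.Println(r.raft.reported) // prints [2]: reported on the first unquiesced tick
}
```

In this model the queued follower survives any number of quiesced ticks and is handed to the Raft group only once ticking resumes, which is the ordering the patch aims for.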

Resolves #103828.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
craig[bot] and erikgrinaker committed Jun 6, 2023
2 parents 8993ad2 + 013f7d7 commit 4b8bd6d
Showing 1 changed file with 8 additions and 9 deletions.
17 changes: 8 additions & 9 deletions pkg/kv/kvserver/replica_raft.go
@@ -1200,11 +1200,6 @@ func maybeFatalOnRaftReadyErr(ctx context.Context, err error) (removed bool) {
 func (r *Replica) tick(
 	ctx context.Context, livenessMap livenesspb.IsLiveMap, ioThresholdMap *ioThresholdMap,
 ) (bool, error) {
-	r.unreachablesMu.Lock()
-	remotes := r.unreachablesMu.remotes
-	r.unreachablesMu.remotes = nil
-	r.unreachablesMu.Unlock()
-
 	r.raftMu.Lock()
 	defer r.raftMu.Unlock()
 	r.mu.Lock()
@@ -1215,14 +1210,18 @@ func (r *Replica) tick(
 		return false, nil
 	}

-	for remoteReplica := range remotes {
-		r.mu.internalRaftGroup.ReportUnreachable(uint64(remoteReplica))
-	}
-
 	if r.mu.quiescent {
 		return false, nil
 	}

+	r.unreachablesMu.Lock()
+	remotes := r.unreachablesMu.remotes
+	r.unreachablesMu.remotes = nil
+	r.unreachablesMu.Unlock()
+	for remoteReplica := range remotes {
+		r.mu.internalRaftGroup.ReportUnreachable(uint64(remoteReplica))
+	}
+
 	r.updatePausedFollowersLocked(ctx, ioThresholdMap)

 	leaseStatus := r.leaseStatusAtRLocked(ctx, r.store.Clock().NowAsClockTimestamp())
