Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: clean up replica unquiescence #105041

Merged
merged 8 commits into from
Jun 20, 2023

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Jun 16, 2023

kvserver: remove stale maybeQuiesce TODO

The leader won't quiesce if followers aren't caught up.

kvserver: remove maybeUnquiesceAndWakeLeaderLocked

This patch merges maybeUnquiesceAndWakeLeaderLocked() into maybeUnquiesceWithOptionsLocked(), using a separate parameter to wake the leader. Care is taken to make this purely mechanical, with no logical changes at all.

kvserver: remove maybeUnquiesceWithOptionsLocked

This patch merges maybeUnquiesceWithOptionsLocked() into maybeUnquiesceLocked(), requiring callers to always specify options. The subtlety around unquiescence, and few call sites, makes it beneficial to be explicit.

This is a purely mechanical change, with no logical changes. A couple of tests have been changed to now wake the leader when unquiescing, but this has no bearing on the tests.

kvserver: set last update times on leader unquiesce

Previously, Replica.lastUpdateTimes was updated whenever a replica unquiesced without attempting to wake the leader. However, this had two flaws: a leader could fail to call it if it hit a code path where it did attempt to wake the leader (even if it was leader itself), e.g. by returning true from withRaftGroup(), and it could also be called on a follower where it would have no effect.

This patch instead updates it when unquiescing the leader, regardless of unquiesce options.

kvserver: only wake leader when unquiescing a follower

Previously, any replica would wake the leader when unquiescing, if requested by the caller. However, this could cause the leader to propose an empty command to wake itself, which commonly happens in handleRaftReady() via withRaftGroupLocked(). This appears unnecessary, and likely causes a large amount of Raft proposals with ranges that frequently (un)quiesce.

This patch instead only attempts to wake the leader from followers.

kvserver: wake leader before campaigning when unquiescing

We should wake the leader before campaigning when unquiescing, since we won't send the proposal in the candidate state and thus won't give the leader a chance to assert leadership if we're wrong about it being dead.

kvserver: omit unnecessary campaign checks on leader unquiescence

This patch changes a couple of maybeUnquiesce() call sites to not attempt to campaign when we know the replica must already be the leader.

Epic: none
Release note: None

@erikgrinaker erikgrinaker self-assigned this Jun 16, 2023
@erikgrinaker erikgrinaker changed the title Raft unquiesce cleanup kvserver: clean up replica unquiescence Jun 16, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Jun 16, 2023

@tbg @pavelkalinnikov Would like some scrutiny of the "only wake leader when unquiescing a follower" commit here. It seems problematic and unnecessary that we submit a proposal when the leader wakes itself, but it's possible that we've come to rely implicitly on this logic.

I expect some test fallout from this as well. Appears to pass. Maybe this is fine.

@erikgrinaker erikgrinaker requested review from tbg and pav-kv June 16, 2023 12:40
@erikgrinaker erikgrinaker force-pushed the raft-unquiesce-cleanup branch from 3df693c to df4b6bc Compare June 16, 2023 13:17
@erikgrinaker erikgrinaker marked this pull request as ready for review June 16, 2023 13:18
@erikgrinaker erikgrinaker requested a review from a team June 16, 2023 13:18
@erikgrinaker erikgrinaker requested a review from a team as a code owner June 16, 2023 13:18
@erikgrinaker erikgrinaker force-pushed the raft-unquiesce-cleanup branch 2 times, most recently from 0ea89a1 to 6be9e48 Compare June 16, 2023 14:24
@erikgrinaker
Copy link
Contributor Author

This resulted in a substantial improvement when running kv0 on 20k ranges with COCKROACH_QUIESCE_AFTER_TICKS=0 where we aggressively quiesce ranges (the default in 23.1). The Raft proposal rate was cut almost by half, and the Raft message rate dropped by 30%:

Screenshot 2023-06-16 at 16 50 59

Throughtput improved by 6% (sample size of 1 though):

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0        1840221        10223.4     18.8     15.2     50.3     71.3    520.1  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0        1953950        10855.3     17.7     13.6     52.4     71.3    469.8  write

I'm not sure that this is safe yet, since I saw a fair bit of election timeouts after cold starting the cluster. Having a closer look.

@erikgrinaker
Copy link
Contributor Author

erikgrinaker commented Jun 16, 2023

I'm not sure that this is safe yet, since I saw a fair bit of election timeouts after cold starting the cluster.

Saw the same problem with master, so it isn't this PR, but we may have broken something recently. Digging in.

@erikgrinaker
Copy link
Contributor Author

we may have broken something recently.

#105052

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks.

The leader won't quiesce if followers aren't caught up.

Epic: none
Release note: None
This patch merges `maybeUnquiesceAndWakeLeaderLocked()` into
`maybeUnquiesceWithOptionsLocked()`, using a separate parameter to wake
the leader. Care is taken to make this purely mechanical, with no
logical changes at all.

Epic: none
Release note: None
This patch merges `maybeUnquiesceWithOptionsLocked()` into
`maybeUnquiesceLocked()`, requiring callers to always specify options.
The subtlety around unquiescence, and few call sites, makes it
beneficial to be explicit.

This is a purely mechanical change, with no logical changes. A couple of
tests have been changed to now wake the leader when unquiescing, but
this has no bearing on the tests.

Epic: none
Release note: None
Previously, `Replica.lastUpdateTimes` was updated whenever a replica
unquiesced without attempting to wake the leader. However, this had two
flaws: a leader could fail to call it if it hit a code path where it did
attempt to wake the leader (even if it was leader itself), e.g. by
returning `true` from `withRaftGroup()`, and it could also be called on
a follower where it would have no effect.

This patch instead updates it when unquiescing the leader, regardless of
unquiesce options.

Epic: none
Release note: None
Previously, any replica would wake the leader when unquiescing, if
requested by the caller. However, this could cause the leader to propose
an empty command to wake itself, which commonly happens in
`handleRaftReady()` via `withRaftGroupLocked()`. This appears
unnecessary, and likely causes a large amount of Raft proposals with
ranges that frequently (un)quiesce.

This patch instead only attempts to wake the leader from followers.

Epic: none
Release note: None
We should wake the leader before campaigning when unquiescing, since we
won't send the proposal in the candidate state and thus won't give the
leader a chance to assert leadership if we're wrong about it being dead.

Epic: none
Release note: None
This patch changes a couple of `maybeUnquiesce()` call sites to not
attempt to campaign when we know the replica must already be the leader.

Epic: none
Release note: None
@erikgrinaker erikgrinaker force-pushed the raft-unquiesce-cleanup branch from 24459ca to 50d7264 Compare June 20, 2023 09:48
if wakeLeader {
// Propose an empty command which will wake the leader.
if log.V(3) {
log.Infof(ctx, "waking %d leader", r.RangeID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "r%d" instead of "%d". It's easier to search by "rNNN" in logs when investigating, this avoids false positives.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the convention of the existing log messages here, didn't particularly feel like updating a bunch of log messages. But I can do a pass if there aren't that many to update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tacked on a commit.

@@ -495,15 +495,7 @@ func (r *Replica) GetQueueLastProcessed(ctx context.Context, queue string) (hlc.
}

func (r *Replica) MaybeUnquiesce() bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics of MaybeUnquiesce changed. Were there any callers who wanted the old behaviour (wakeLeader=mayCampaign=false)?

Copy link
Contributor Author

@erikgrinaker erikgrinaker Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests don't care -- explained in the commit message:

A couple of tests have been changed to now wake the leader when unquiescing, but this has no bearing on the tests.

In case it wasn't clear, this is a test-only file, helpers_test.go.


if wakeLeader {
} else if st.RaftState == raft.StateFollower && wakeLeader {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not in StateCandidate? Waking the leader doesn't make sense because the candidate "doesn't know" the leader?

Copy link
Contributor Author

@erikgrinaker erikgrinaker Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, a candidate has no leader, and won't send proposals.

@@ -705,7 +705,7 @@ func (s *Store) nodeIsLiveCallback(l livenesspb.Liveness) {
lagging := r.mu.laggingFollowersOnQuiesce
r.mu.RUnlock()
if quiescent && lagging.MemberStale(l) {
r.maybeUnquiesce(false /* wakeLeader */, true /* mayCampaign */)
r.maybeUnquiesce(false /* wakeLeader */, false /* mayCampaign */) // already leader
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we know we're already leader here?

Copy link
Contributor Author

@erikgrinaker erikgrinaker Jun 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because laggingFollowersOnQuiesce is only non-nil on the leader. Conceptually, only the leader keeps track of lagging followers, because only leaders have followers.

@erikgrinaker
Copy link
Contributor Author

TFTRs!

bors r+

@craig
Copy link
Contributor

craig bot commented Jun 20, 2023

Build succeeded:

@craig craig bot merged commit a416398 into cockroachdb:master Jun 20, 2023
@erikgrinaker erikgrinaker deleted the raft-unquiesce-cleanup branch November 14, 2023 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants