Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: gracefully handle races during learner removals #79405

Conversation

aayushshah15
Copy link
Contributor

@aayushshah15 aayushshah15 commented Apr 5, 2022

Previously, at the end of a replication change, if the ChangeReplicas request
found that the (demoted) learner replica it was trying to remove from the range
had already been removed (presumably because it raced with the mergeQueue, the
StoreRebalancer, or something else), it would error out. This was
unfortunate, because, for all practical purposes, the replication change had
succeeded
.

We can now gracefully handle this instead by no-oping if we detect that the
replica we were trying to remove has already been removed.

Relates to #79118

Release note: None

Jira issue: CRDB-14809

@aayushshah15 aayushshah15 requested a review from a team as a code owner April 5, 2022 03:45
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@aayushshah15 aayushshah15 force-pushed the 20220404_doNotErrorOnDemotedLearnerRemovalRace branch 2 times, most recently from 4d1607f to 2559fcc Compare April 5, 2022 04:24
@aayushshah15
Copy link
Contributor Author

cc @tbg, feel free to ignore this, but I wanted to add you in case you see something obviously wrong with doing what we're doing in this patch.

@tbg
Copy link
Member

tbg commented Apr 5, 2022

Looks good! Not formally signing off because I have late night brain mush but I did leave some suggestions and can give a real review when the next turnaround comes.

@aayushshah15 aayushshah15 force-pushed the 20220404_doNotErrorOnDemotedLearnerRemovalRace branch 3 times, most recently from 394efdf to 1efaf60 Compare April 6, 2022 05:36
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

I did leave some suggestions

Hm, I'm not seeing these suggestions.

Reviewed 1 of 2 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @tbg)


pkg/kv/kvserver/replica_command.go, line 2068 at r2 (raw file):

					return nil, errors.AssertionFailedf("cannot transition from %s to VOTER_OUTGOING", prevTyp)
				} else {
					rDesc, _, _ = updatedDesc.SetReplicaType(chg.target.NodeID, chg.target.StoreID, roachpb.VOTER_OUTGOING)

Is this case still valid? Or does the check in execReplicationChangesForVoters that switches from a internalChangeTypeRemoveLearner to a internalChangeTypeDemoteVoterToLearner make it dead code?


pkg/kv/kvserver/replica_command.go, line 2210 at r2 (raw file):

	learnerAlreadyRemoved := true
	check := func(kvDesc *roachpb.RangeDescriptor) bool {

nit: could we name this return arg?


pkg/kv/kvserver/replica_command.go, line 2230 at r2 (raw file):

				for _, repl := range kvDesc.Replicas().Descriptors() {
					if repl.StoreID == chgs[0].target.StoreID {
						learnerAlreadyRemoved = false

nit: break

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry hadn't hit the submit button

@@ -2202,6 +2232,13 @@ func execChangeReplicasTxn(
returnDesc = desc
return nil
}
if chgs.isSingleLearnerRemoval() && learnerAlreadyRemoved {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol on line 2218 above.

@@ -2169,19 +2182,36 @@ func execChangeReplicasTxn(

descKey := keys.RangeDescriptorKey(referenceDesc.StartKey)

learnerAlreadyRemoved := true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit iffy tbh. Relying on a closure that is invoked a few call stacks down and its ordering here is fairly intransparent. I spent around ten minutes trying to think of an elegant way to refactor these checks, but came up short. But I do feel sort of strongly that we should avoid the opaque mutation to learnerAlreadyRemoved. Perhaps you can pull out a closure that we invoke in both places? Something like

isIdempotentLearnerRemoval := func(actual *roachpb.RangeDescriptor, storeID roachpb.StoreID) bool {
  // ...
}

You can invoke that in check, and again when short-circuiting the txn.

But maybe we can do one better, and return a skip bool from check and pass it through from conditionalGetDescValueFromDB?

desc, dbDescValue, skip, err := conditionalGetDescValueFromDB(
ctx, txn, referenceDesc.StartKey, false /* forUpdate */, check)
if err != nil { ... }
if skip {
  // The new descriptor already reflects what we needed to get done.
  return nil
}

If this fits all existing uses, that would be a nice solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the second option you suggested, how do you feel about it?

@aayushshah15 aayushshah15 force-pushed the 20220404_doNotErrorOnDemotedLearnerRemovalRace branch from 1efaf60 to 03d2b33 Compare April 7, 2022 16:04
Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/replica_command.go, line 2068 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this case still valid? Or does the check in execReplicationChangesForVoters that switches from a internalChangeTypeRemoveLearner to a internalChangeTypeDemoteVoterToLearner make it dead code?

Since VOTER_OUTGOINGs are not a thing anymore, and neither are non-atomic replication changes, this area as a whole could use a proper cleanup. I considered adding a clean-up commit at the end, but there are a few test flakes I'd need to iron out. Do you mind if we do this cleanup separately? I'd also like to backport this, so it's probably better to keep this patch small.


pkg/kv/kvserver/replica_command.go, line 2210 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: could we name this return arg?

Done.


pkg/kv/kvserver/replica_command.go, line 2230 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: break

Done.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @tbg)


pkg/kv/kvserver/replica_command.go, line 2068 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Since VOTER_OUTGOINGs are not a thing anymore, and neither are non-atomic replication changes, this area as a whole could use a proper cleanup. I considered adding a clean-up commit at the end, but there are a few test flakes I'd need to iron out. Do you mind if we do this cleanup separately? I'd also like to backport this, so it's probably better to keep this patch small.

Yes, let's absolutely keep this separate.


pkg/kv/kvserver/replica_command.go, line 2710 at r3 (raw file):

// The supplied `check` method verifies whether the descriptor fetched from kv
// matches expectations and returns a "skip" hint that the caller can use to
// optionally elide work that doesn't need to be done.

We should mention here that if a function returns skip = true, matched should also be true. Maybe even assert that below.

Or we could change this contract to require the opposite (skip = true => matches = false). It feels somewhat strange that we're pretending something matched when it did not. If we could express the condition "did not match, but that's ok", we'd be saying exactly what we mean.

@tbg might have thoughts.


pkg/kv/kvserver/replica_command.go, line 2747 at r3 (raw file):

	existingDescKV, err := get(ctx, descKey)
	if err != nil {
		return nil, nil, skip, errors.Wrap(err, "fetching current range descriptor value")

nit: return false directly so that readers don't need to check whether skip has already been set. Same thing a few lines down.

@aayushshah15 aayushshah15 force-pushed the 20220404_doNotErrorOnDemotedLearnerRemovalRace branch from 03d2b33 to 44052fd Compare April 7, 2022 16:50
Previously, at the end of a replication change, if the `ChangeReplicas` request
found that the (demoted) learner replica it was trying to remove from the range
had already been removed (presumably because it raced with the mergeQueue, the
`StoreRebalancer`, or something else), it would error out. This was
unfortunate, because, for all practical purposes, the replication change _had
succeeded_.

We can now gracefully handle this instead by no-oping if we detect that the
replica we were trying to remove has already been removed.

Release note: None
@aayushshah15 aayushshah15 force-pushed the 20220404_doNotErrorOnDemotedLearnerRemovalRace branch from 44052fd to 8b34184 Compare April 9, 2022 19:16
Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/replica_command.go, line 2710 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We should mention here that if a function returns skip = true, matched should also be true. Maybe even assert that below.

Or we could change this contract to require the opposite (skip = true => matches = false). It feels somewhat strange that we're pretending something matched when it did not. If we could express the condition "did not match, but that's ok", we'd be saying exactly what we mean.

@tbg might have thoughts.

I went with the second option you proposed. How do you feel about it? Did you want me to also assert that when skip==true that matched must be false?


pkg/kv/kvserver/replica_command.go, line 2747 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: return false directly so that readers don't need to check whether skip has already been set. Same thing a few lines down.

Done.

@tbg tbg requested a review from nvanbenschoten April 11, 2022 08:15
Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to me, letting Nathan put the 👔 on it

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @nvanbenschoten)


pkg/kv/kvserver/replica_command.go, line 2710 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

I went with the second option you proposed. How do you feel about it? Did you want me to also assert that when skip==true that matched must be false?

Looks good to me, thanks for chewing through this.

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r1, 4 of 5 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @nvanbenschoten)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)

@aayushshah15
Copy link
Contributor Author

TFTRs

bors r+

@aayushshah15
Copy link
Contributor Author

I think we could reasonably wait to backport this until the first or second patch release since the race this PR fixes isn't nearly as likely as #79379.

@craig
Copy link
Contributor

craig bot commented Apr 12, 2022

Build succeeded:

@craig craig bot merged commit 0130133 into cockroachdb:master Apr 12, 2022
@aayushshah15 aayushshah15 deleted the 20220404_doNotErrorOnDemotedLearnerRemovalRace branch April 12, 2022 17:13
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Apr 20, 2022
…locateRange`

Previously, after cockroachdb#79405,
`maybeLeaveAtomicChangeReplicasAndRemoveLearners()` could return a range
descriptor that contained learner replicas. This was because of the idempotency
introduced in cockroachdb#79405, which relaxed how/when we failed replication changes due
to descriptor mismatch. The hazard was that, while we would ensure that the
learner we were trying to remove was indeed gone, we wouldn't do enough to
ensure that there weren't new learners that were added to the range due to
other concurrent `AdminRelocateRange` calls.

This was a violation of its contract and could sometimes fail
`AdminRelocateRange` requests.

This commit makes it such that when `maybeLeaveAtomicChangeReplicasAndRemoveLearners`
 returns a non-error response, it guarantees that _all_ the learner replicas
have been removed according to the returned range descriptor.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Apr 21, 2022
This commit is an alternative to cockroachdb#80205.

In cockroachdb#79405, we relaxed the check
that `AdminChangeReplicas` performs comparing the expected range descriptor
state against the range descriptor in KV when we were performing single learner
removals. This was because these learner removals are performed at the end of a
replication change (i.e. after its already completed) and thus, could be made
idempotent.

The above change messed with a subtle part of the contract of
`maybeLeaveAtomicChangeReplicasAndRemoveLearners()` since now it was no longer
guaranteed to always successfully return with a range desc that did not contain
learners (due to intervening replication changes that were now no longer
detected because of the relaxation mentioned above).

This commit fixes this regression by making
`maybeLeaveAtomicChangeReplicasAndRemoveLearners` check at the end if we've
raced with another replication change. If so, we return an error instead of
incorrectly returning a successful response with a descriptor that could still
have learners or be in a joint config.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants