-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: allow voter demoting to get a lease, in case there's an incoming voter #83686
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting this fix for #83687 out so quickly! This looks good.
It will be interesting to write the test that reproduces what we saw in that issue and demonstrates that this resolves the bug. Also, we'll want to update TestCheckCanReceiveLease
.
c705905
to
723a8a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @shralex)
pkg/roachpb/metadata_replicas.go
line 548 at r1 (raw file):
if !ok { return errReplicaNotFound } else if !(repDesc.IsVoterNewConfig() || (repDesc.IsAnyVoter() && replDescs.
We will need to plumb a cluster setting check in here, like we have in Replica.propose
. Clusters that don't contain all 22.1 nodes shouldn't allow a voter demoting to receive the lease.
pkg/roachpb/metadata_replicas.go
line 549 at r1 (raw file):
return errReplicaNotFound } else if !(repDesc.IsVoterNewConfig() || (repDesc.IsAnyVoter() && replDescs. containsVoterIncoming())) {
Do you have ideas about unifying this logic and the logic in Replica.propose
? Doing so would 1) ensure that they're coherent, and 2) give us a good place to justify the subtle condition and link it back to the config change protocol.
Is doing so as simple as calling
if !CheckCanReceiveLease(r.ReplicaID(), p.command.ReplicatedEvalResult.State.Desc.Replica) { ... }
in Replica.propose
?
pkg/kv/kvserver/client_lease_test.go
line 278 at r1 (raw file):
to VOTER_OUTGOING
Do you mean VOTER_DEMOTING_LEARNER
? I didn't think we still used VOTER_OUTGOING
anywhere.
pkg/kv/kvserver/client_lease_test.go
line 280 at r1 (raw file):
// command to move n3 to VOTER_OUTGOING has been evaluated, we'll send // the request to transfer the lease to n3. The hope is that it will // get past the sanity above latch acquisition prior to change replicas
"sanity check"
"prior to the change replicas command committing"
pkg/kv/kvserver/client_lease_test.go
line 282 at r1 (raw file):
// get past the sanity above latch acquisition prior to change replicas // command committing. var scratchRangeID atomic.Value
nit: consider an atomic.Int64
instead for the convenient zero value.
pkg/kv/kvserver/client_lease_test.go
line 300 at r1 (raw file):
knobs.Store.(*kvserver.StoreTestingKnobs).TestingProposalFilter = func(args kvserverbase.ProposalFilterArgs) *roachpb.Error { blockIfShould(args) return nil
Were you still planning to add a second test? This one is good, but it's not quite the same scenario as what we saw in #83687. In that failure mode, the proposal for a lease transfer failed, which required a lease request to make it through CheckCanReceiveLease
. So when writing a test to demonstrate that, we'll want the TestingProposalFilter
to return an error for the LeaseTransfer performed by the config change, not just block. We'll then want to make sure that the lease is re-acquired and the config change is eventually completed.
pkg/kv/kvserver/client_lease_test.go
line 319 at r1 (raw file):
// // - Send an AdminChangeReplicasRequest to remove n3 and add n4 // - Block the step that moves n3 to VOTER_OUTGOING on changeReplicasChan
Same question about VOTER_OUTGOING
.
pkg/kv/kvserver/batcheval/cmd_lease_test.go
line 278 at r1 (raw file):
{leaseholderType: roachpb.VOTER_INCOMING, anotherReplicaType: none, eligible: true}, // A VOTER_OUTGOING should only be able to get the lease if there's a VOTER_INCOMING
nit: punctuation.
723a8a8
to
e5b9ffa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @nvanbenschoten, and @shralex)
pkg/roachpb/metadata_replicas.go
line 548 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We will need to plumb a cluster setting check in here, like we have in
Replica.propose
. Clusters that don't contain all 22.1 nodes shouldn't allow a voter demoting to receive the lease.
Done
pkg/roachpb/metadata_replicas.go
line 549 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do you have ideas about unifying this logic and the logic in
Replica.propose
? Doing so would 1) ensure that they're coherent, and 2) give us a good place to justify the subtle condition and link it back to the config change protocol.Is doing so as simple as calling
if !CheckCanReceiveLease(r.ReplicaID(), p.command.ReplicatedEvalResult.State.Desc.Replica) { ... }in
Replica.propose
?
Done
pkg/kv/kvserver/client_lease_test.go
line 278 at r1 (raw file):
VOTER_DEMOTING_LEARNER
Done
pkg/kv/kvserver/client_lease_test.go
line 280 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"sanity check"
"prior to the change replicas command committing"
Done
Code quote:
get past the sanity above latch acquisition prior to change replicas
pkg/kv/kvserver/client_lease_test.go
line 319 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same question about
VOTER_OUTGOING
.
Done
pkg/kv/kvserver/batcheval/cmd_lease_test.go
line 278 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: punctuation.
Done
Code quote:
A VOTER_OUTGOING should only be able to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 11 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @shralex)
pkg/kv/kvclient/kvcoord/replica_slice.go
line 84 at r2 (raw file):
} canReceiveLease := func(rDesc roachpb.ReplicaDescriptor) bool { if err := roachpb.CheckCanReceiveLease(rDesc, desc.Replicas(), false); err != nil {
If we can't plumb a cluster setting in here, then I think we should pass true /* leaseHolderRemovalAllowed */
here so that we optimistically assume that a VOTER_DEMOTING can be a leaseholder. This function is called with filter = OnlyPotentialLeaseholders
to determine the maximal set of replicas that a request may need to be routed to in order to find the leaseholder.
pkg/kv/kvserver/replica_proposal_buf.go
line 15 at r2 (raw file):
import ( "context" "github.com/cockroachdb/cockroach/pkg/clusterversion"
Have you configured crlfmt
: https://cockroachlabs.atlassian.net/wiki/spaces/ENG/pages/154206209/Goland+Tips+and+Tricks#Enable-crlfmt-Watcher?
crlfmt
would move this line down to the block below.
pkg/kv/kvserver/replica_raft.go
line 379 at r2 (raw file):
// kind of voter in the next new config (potentially VOTER_DEMOTING). proposedDesc := p.command.ReplicatedEvalResult.State.Desc if err := roachpb.CheckCanReceiveLease(lhDescriptor, proposedDesc.Replicas(),
nit: this indentation looks foreign to this codebase and I think it might also be against our style guide. Consider:
if err := roachpb.CheckCanReceiveLease(
lhDescriptor, proposedDesc.Replicas(), lhRemovalAllowed,
); err != nil {
...
}
Or pull proposedDesc.Replicas()
into a variable and use a single line.
pkg/kv/kvserver/replica_raft.go
line 379 at r2 (raw file):
// kind of voter in the next new config (potentially VOTER_DEMOTING). proposedDesc := p.command.ReplicatedEvalResult.State.Desc if err := roachpb.CheckCanReceiveLease(lhDescriptor, proposedDesc.Replicas(),
We should move some of this explanation to CheckCanReceiveLease
, but then explain why we're calling that function in the first place.
pkg/kv/kvserver/batcheval/cmd_lease_request.go
line 15 at r2 (raw file):
import ( "context" "github.com/cockroachdb/cockroach/pkg/clusterversion"
Same thing here.
pkg/kv/kvserver/batcheval/cmd_lease_request.go
line 75 at r2 (raw file):
// If this check is removed at some point, the filtering of learners on the // sending side would have to be removed as well. if err := roachpb.CheckCanReceiveLease(args.Lease.Replica, cArgs.EvalCtx.Desc().Replicas(),
nit: same point about indentation for lines that look like this (at least here and in cmd_lease_transfer.go
)
pkg/roachpb/metadata_replicas.go
line 549 at r1 (raw file):
See further info in replica_raft Propose().
Now that we're calling this method from replica.Propose
, I think we should move the explanation here.
pkg/kv/kvserver/batcheval/cmd_lease_test.go
line 314 at r2 (raw file):
rngDesc.InternalReplicas = append(rngDesc.InternalReplicas, anotherDesc) } err := roachpb.CheckCanReceiveLease(rngDesc.InternalReplicas[0], rngDesc.Replicas(), true)
Should we extend this test to exercise the true and false case for leaseHolderRemovalAllowed
?
e5b9ffa
to
ce195e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @shralex)
pkg/kv/kvclient/kvcoord/replica_slice.go
line 84 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we can't plumb a cluster setting in here, then I think we should pass
true /* leaseHolderRemovalAllowed */
here so that we optimistically assume that a VOTER_DEMOTING can be a leaseholder. This function is called withfilter = OnlyPotentialLeaseholders
to determine the maximal set of replicas that a request may need to be routed to in order to find the leaseholder.
With the /* leaseHolderRemovalAllowed */
comment. That's in the CRDB style guide for literal value params.
pkg/kv/kvserver/replica_raft.go
line 366 at r3 (raw file):
// we move to a joint config where v1 (VOTER_DEMOTING_LEARNER) transfer the // lease to v4 (VOTER_INCOMING) directly. // See also https://github.com/cockroachdb/cockroach/issues/67740.
I don't want to hold up this PR any further, but I think we should improve this comment to explain why we don't allow VOTER_DEMOTING to hold the lease if there is not a VOTER_INCOMING. Do you mind opening an issue?
pkg/kv/kvserver/client_lease_test.go
line 300 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Were you still planning to add a second test? This one is good, but it's not quite the same scenario as what we saw in #83687. In that failure mode, the proposal for a lease transfer failed, which required a lease request to make it through
CheckCanReceiveLease
. So when writing a test to demonstrate that, we'll want theTestingProposalFilter
to return an error for the LeaseTransfer performed by the config change, not just block. We'll then want to make sure that the lease is re-acquired and the config change is eventually completed.
Were you still planning to add this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @nvanbenschoten, and @shralex)
pkg/kv/kvclient/kvcoord/replica_slice.go
line 84 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we can't plumb a cluster setting in here, then I think we should pass
true /* leaseHolderRemovalAllowed */
here so that we optimistically assume that a VOTER_DEMOTING can be a leaseholder. This function is called withfilter = OnlyPotentialLeaseholders
to determine the maximal set of replicas that a request may need to be routed to in order to find the leaseholder.
Done
pkg/kv/kvserver/replica_proposal_buf.go
line 15 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Have you configured
crlfmt
: https://cockroachlabs.atlassian.net/wiki/spaces/ENG/pages/154206209/Goland+Tips+and+Tricks#Enable-crlfmt-Watcher?
crlfmt
would move this line down to the block below.
Thanks! I did this multiple times, but it keeps disappearing, perhaps when GoLand updates...
pkg/kv/kvserver/replica_raft.go
line 379 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: this indentation looks foreign to this codebase and I think it might also be against our style guide. Consider:
if err := roachpb.CheckCanReceiveLease( lhDescriptor, proposedDesc.Replicas(), lhRemovalAllowed, ); err != nil { ... }Or pull
proposedDesc.Replicas()
into a variable and use a single line.
Done
pkg/kv/kvserver/replica_raft.go
line 379 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We should move some of this explanation to
CheckCanReceiveLease
, but then explain why we're calling that function in the first place.
Done
pkg/kv/kvserver/batcheval/cmd_lease_request.go
line 15 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same thing here.
Done
pkg/kv/kvserver/batcheval/cmd_lease_request.go
line 75 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: same point about indentation for lines that look like this (at least here and in
cmd_lease_transfer.go
)
Done
Code quote:
clusterversion.EnableLeaseHolderRemoval)
pkg/roachpb/metadata_replicas.go
line 549 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
See further info in replica_raft Propose().
Now that we're calling this method from
replica.Propose
, I think we should move the explanation here.
Done
pkg/kv/kvserver/client_lease_test.go
line 300 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Were you still planning to add this test?
Yep
pkg/kv/kvserver/batcheval/cmd_lease_test.go
line 314 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we extend this test to exercise the true and false case for
leaseHolderRemovalAllowed
?
Done
ce195e7
to
eb1aa71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 11 files at r2, 5 of 7 files at r3, 2 of 3 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @shralex)
-- commits
line 7 at r4:
Might be worth adding some words about the PROSCRIBED
status which prevents other nodes from being able to get the lease.
-- commits
line 10 at r4:
nit: this can be removed now. Might also be worth linking the issue this PR resolves in the commit message itself.
pkg/kv/kvserver/replica_proposal_buf.go
line 154 at r4 (raw file):
withGroupLocked(func(proposerRaft) error) error registerProposalLocked(*ProposalData) leaderStatusRLocked(ctx context.Context, raftGroup proposerRaft) rangeLeaderInfo
testProposer
in replica_proposal_buf_test.go seems in need of an update.
pkg/kv/kvserver/replica_raft.go
line 387 at r4 (raw file):
// leaseholder to be a VOTER_DEMOTING as long as there is a VOTER_INCOMING. // Otherwise, the leaseholder must be a full voter in the target config. // When the leaseholder is being removed, this check won't allow exiting
nit: this comment (the stuff after "When...") feels a bit out of place. Might be worth expanding a bit more or dropping it entirely.
pkg/roachpb/metadata_replicas.go
line 548 at r1 (raw file):
Previously, shralex wrote…
Done
I'm not very familiar with this stuff, so I might be missing something, but is it possible for us to get into a joint configuration which includes a VOTER_DEMOTING
and VOTER_INCOMING
if all nodes aren't running 22.1 (and have been finalized to the appropriate version)?
pkg/roachpb/metadata_replicas.go
line 551 at r4 (raw file):
// An error is also returned is the replica is not part of `replDescs`. func CheckCanReceiveLease( wouldbeLeaseholder ReplicaDescriptor, replDescs ReplicaSet, leaseHolderRemovalAllowed bool,
Could we add a comment that says leaseHolderRemovalAllowed
is intended to check if the cluster version is EnableLeaseHolderRemoval
or higher? It might also be worth adding a TODO to clean this up in 22.2.
pkg/roachpb/metadata_replicas.go
line 558 at r4 (raw file):
} if !(repDesc.IsVoterNewConfig() || (leaseHolderRemovalAllowed && repDesc.IsAnyVoter() && replDescs.containsVoterIncoming())) {
Would this conditional be clearer if instead of repDesc.IsAnyVoter
, we instead called into a different function (say) repDesc.IsAnyDemotingVoter
? That would help avoid the redundancy from the previous call and make the joint config inference here a bit clearer.
pkg/kv/kvserver/client_lease_test.go
line 270 at r4 (raw file):
// requests for nodes which are already in the VOTER_DEMOTING_LEARNER state succeeds // when there is a VOTER_INCOMING node. func TestTransferLeaseToVoterDemotinggWithIncoming(t *testing.T) {
nit: "Demotingg"
pkg/kv/kvserver/batcheval/cmd_lease_test.go
line 278 at r1 (raw file):
Previously, shralex wrote…
Done
Do we need to be accounting for these VOTER_OUTGOING
cases (in tests and in code) given we no longer use it?
eb1aa71
to
55dfa3e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @nvanbenschoten, and @shralex)
pkg/kv/kvserver/replica_proposal_buf.go
line 154 at r4 (raw file):
replica_proposal_buf_test.go
Thanks
pkg/kv/kvserver/replica_raft.go
line 366 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't want to hold up this PR any further, but I think we should improve this comment to explain why we don't allow VOTER_DEMOTING to hold the lease if there is not a VOTER_INCOMING. Do you mind opening an issue?
Done
Code quote:
See also https://github.com/cockroachdb/cockroach/issues/67740.
pkg/kv/kvserver/replica_raft.go
line 387 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: this comment (the stuff after "When...") feels a bit out of place. Might be worth expanding a bit more or dropping it entirely.
I can understand why its confusing - it refers to a check made inside the function. I tried to clarify.
pkg/roachpb/metadata_replicas.go
line 548 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
I'm not very familiar with this stuff, so I might be missing something, but is it possible for us to get into a joint configuration which includes a
VOTER_DEMOTING
andVOTER_INCOMING
if all nodes aren't running 22.1 (and have been finalized to the appropriate version)?
It is still possible in previous releases, but only from 22.1 the VOTER_DEMOTING could be the leaseholder
pkg/roachpb/metadata_replicas.go
line 551 at r4 (raw file):
leaseHolderRemovalAllowed is intended to check if the cluster version is EnableLeaseHolderRemoval or higher? It might also be worth adding a TODO to clean this up in 22.2.
Done
pkg/roachpb/metadata_replicas.go
line 558 at r4 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Would this conditional be clearer if instead of
repDesc.IsAnyVoter
, we instead called into a different function (say)repDesc.IsAnyDemotingVoter
? That would help avoid the redundancy from the previous call and make the joint config inference here a bit clearer.
I changed it to hopefully be more readable.
pkg/kv/kvserver/client_lease_test.go
line 282 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider an
atomic.Int64
instead for the convenient zero value.
Done
pkg/kv/kvserver/client_lease_test.go
line 300 at r1 (raw file):
Previously, shralex wrote…
Yep
Added
pkg/kv/kvserver/client_lease_test.go
line 270 at r4 (raw file):
Demotingg
Thanks
pkg/kv/kvserver/batcheval/cmd_lease_test.go
line 278 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Do we need to be accounting for these
VOTER_OUTGOING
cases (in tests and in code) given we no longer use it?
Discussed with Nathan offline, yes for now, given that we do assume this everywhere else for now.
55dfa3e
to
dd22457
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think everything is here, so we'll just need to polish this off before merging.
Reviewed 1 of 3 files at r4, 4 of 4 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @shralex)
-- commits
line 2 at r6:
You updated the PR description but not the commit message. Let's copy that updated text over here.
pkg/kv/kvserver/batcheval/cmd_lease_request.go
line 75 at r2 (raw file):
Previously, shralex wrote…
Done
nit: this line did not get updated.
pkg/kv/kvserver/client_lease_test.go
line 371 at r6 (raw file):
// https://github.com/cockroachdb/cockroach/issues/83687 // and makes sure that if lease transfer fails during a joint configuration // the previous leaseholder will successfully re-aquire the lease.
Could you add a sentence or two about the flow of the test to this comment? What does it do and why does this prove that the bug is fixed. The explanation should mention the revoked lease that needs to be reacquired.
pkg/kv/kvserver/client_lease_test.go
line 378 at r6 (raw file):
knobs := base.TestingKnobs{Store: &kvserver.StoreTestingKnobs{}} // Add a testing knob to allow us to fall a lease transfer command
s/fall a lease transfer command while it is/fail all lease transfer commands on this range when they are/
pkg/kv/kvserver/client_lease_test.go
line 416 at r6 (raw file):
require.Regexp(t, "Injecting lease transfer failure", err) desc, err = tc.LookupRange(scratchStartKey)
Add a comment about the state that we are in here.
pkg/kv/kvserver/client_lease_test.go
line 420 at r6 (raw file):
require.True(t, desc.Replicas().InAtomicReplicationChange()) // Further lease transfers should succeed.
, allowing the atomic replication change to complete.
dd22457
to
496e2a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani and @nvanbenschoten)
Previously, arulajmani (Arul Ajmani) wrote…
Might be worth adding some words about the
PROSCRIBED
status which prevents other nodes from being able to get the lease.
Done
Previously, arulajmani (Arul Ajmani) wrote…
nit: this can be removed now. Might also be worth linking the issue this PR resolves in the commit message itself.
Done
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
You updated the PR description but not the commit message. Let's copy that updated text over here.
Done
pkg/kv/kvserver/batcheval/cmd_lease_request.go
line 75 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: this line did not get updated.
Done
pkg/kv/kvserver/client_lease_test.go
line 371 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could you add a sentence or two about the flow of the test to this comment? What does it do and why does this prove that the bug is fixed. The explanation should mention the revoked lease that needs to be reacquired.
Done
Code quote:
base.TestingKnobs{Store: &kvserver.StoreTestingKnobs{}}
pkg/kv/kvserver/client_lease_test.go
line 378 at r6 (raw file):
fail all lease transfer commands on this range when they are
Done
pkg/kv/kvserver/client_lease_test.go
line 416 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Add a comment about the state that we are in here.
Done
pkg/kv/kvserver/client_lease_test.go
line 420 at r6 (raw file):
, allowing the atomic replication change to complete.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r7, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @shralex)
pkg/kv/kvserver/client_lease_test.go
line 379 at r7 (raw file):
// is attempted to move the lease from n1 to n4 before exiting the joint config, // but that fails, causing us to remain in the joint configuration with the original // leaseholder believing the lease was given away, but everyone else thinking it's still
"believing the lease was given away"
This isn't quite correct. The original leaseholder knows that the lease was not given away, but it has already revoked its lease so it needs to re-acquire a new version of the lease.
Consider just saying "the original leaseholder having revoked its lease, but everyone else ..."
pkg/kv/kvserver/client_lease_test.go
line 385 at r7 (raw file):
// VOTER_DEMOTING_LEARNER (n1) replica to get the lease if there's also a VOTER_INCOMING // which is the case here (n4). // - n1 transfers the lease away and the range leaves the joint configuration
nit: punctuation.
pkg/kv/kvserver/client_lease_test.go
line 430 at r7 (raw file):
require.Regexp(t, "Injecting lease transfer failure", err) // We're now in a joint configuration, n1 thinks the lease was given away
Same point about "thinks the lease was given away".
…oming voter Fixes cockroachdb#83687. It is possible that we've entered a joint config where the leaseholder is being replaced with an incoming voter. We try to transfer the lease away, but this fails. In this case, we need the demoted voter to re-aquire the lease, as it might be an epoch based lease, that doesn't expire. Otherwise we might be stuck without anyone else getting the lease (the original leaseholder will be in the PROSCRIBED state, repeatedly trying to re-aquire the lease, but will fail since its a VOTER_DEMOTING_LEARNER). Release note (bug fix): Fixes a critical bug (cockroachdb#83687) introduced in 22.1.0 where failure to transfer a lease in the joint config may result in range unavailability. The fix allows the original leaseholder to re-aquire the lease so that lease transfer can be retried.
496e2a0
to
139dc42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)
bors r+ |
Build succeeded: |
Fixes #83687.
It is possible that we've entered a joint config where the leaseholder is being
replaced with an incoming voter. We try to transfer the lease away, but this fails.
In this case, we need the demoted voter to re-aquire the lease, as it might be an epoch
based lease, that doesn't expire. Otherwise we might be stuck without anyone else getting
the lease (the original leaseholder will be in the PROSCRIBED state, repeatedly trying to
re-aquire the lease, but will fail since its a VOTER_DEMOTING_LEARNER).
Release note (bug fix): Fixes a critical bug (#83687) introduced in 22.1.0 where
failure to transfer a lease in the joint config may result in range unavailability. The fix
allows the original leaseholder to re-aquire the lease so that lease transfer can be retried.