Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: lease transferred to uninitialized follower #61604

Closed
tbg opened this issue Mar 8, 2021 · 4 comments · Fixed by #69696
Closed

kvserver: lease transferred to uninitialized follower #61604

tbg opened this issue Mar 8, 2021 · 4 comments · Fixed by #69696
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

@tbg
Copy link
Member

tbg commented Mar 8, 2021

Describe the problem

In #61541 (comment), we saw that a range with an uninitialized follower transferred the lease to that uninitialized follower, leading to the range becoming unavailable until the follower caught up. This indicates a hole in our safety mechanisms.

To Reproduce

Unclear, but the test in #61541 definitely achieved it.

Expected behavior
A clear and concise description of what you expected to happen.

Additional data / screenshots

Environment:

https://github.com/cockroachdb/cockroach/commits/b703e663da8ededaee2e28fc39a24e3880ae54cf

Additional context

It looks like aggressive changes in range size can lead to temporary (but substantial) unavailability.

gz#9425

@tbg tbg added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Mar 8, 2021
@tbg
Copy link
Member Author

tbg commented Mar 8, 2021

@nvanbenschoten are you aware of anything having changed around lease transfers? We need to know whether to classify #61541 as a release blocker.

@nvanbenschoten
Copy link
Member

A decent amount has changed around lease transfers, but I don't think it would have affected this.

As @aayushshah15 pointed out in #61541:

we don’t seem to do anything to prevent lease transfers to VOTER_FULL replicas that are waiting for a snapshot

One thing to point out is that we recently made splits/largerange/size=32GiB,nodes=6 more aggressive. We bumped the range size up from 10GB to 32GB and bumped the cluster size up from 3 nodes to 6 nodes. Aayush is aware of this and is looking to determine whether either of those changes is the cause of the newer failures. We'll want to make the test less aggressive again if we determine that there's nothing immediately actionable in how we respond to this much splitting/rebalancing.

@aayushshah15
Copy link
Contributor

@andreimatei and I were quibbling about this and it seems like a worthwhile safeguard we can add on a leader is just to reject TransferLeaseRequests to followers that its progress-tracker knows are in StateSnapshot (though I'm not sure at what "level" we want to reject such a thing, maybe during the command evaluation?).

@aayushshah15
Copy link
Contributor

Looks like we had a much more aggressive version of this sort of safeguard that we removed entirely in this patch: #42379

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team
Projects
None yet
4 participants