kvserver: lease transferred to follower waiting for split #79385
Comments
Probably the store rebalancer; I don't think it logs individual transfers
This is related to #81561. I don't think such a lease transfer would be allowed if our protection against lease transfers to replicas that may need a snapshot was airtight. The replica here would be classified as potentially needing a snapshot because it has not yet performed its split and so it could not be in |
Describe the problem
In an experiment, a ten-node cluster was run with IO overload on n3 and a 2 TB bank import. This caused many of the splits involving a follower on n3 to get into a state where the n3 replica was "uninitialized" (because the split trigger would be wildly delayed).
We saw evidence that the lease would sometimes get transferred to the n3 replica, creating an outage. SSTs would then get stuck in NotLeaseHolderError loops and bounce around for hours.
The full internal thread is here
To Reproduce
Set up a 10-node AWS cluster via roachprod according to the steps in https://cockroachlabs.slack.com/archives/C0KB9Q03D/p1649015732041819 and deploy the following unit to n3:
Ran this on
71e32a6
Expected behavior
We don't transfer leases to replicas that have never been initialized.
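As a rough illustration of that expectation, here is a minimal sketch in Go. It assumes the leaseholder has some way of knowing whether each candidate replica has been initialized; the `replicaInfo` type and the `eligibleLeaseTargets` helper are hypothetical names made up for this example and are not CockroachDB APIs.

```go
package main

// replicaInfo is a hypothetical, simplified view of a lease transfer
// candidate. It is NOT a CockroachDB type.
type replicaInfo struct {
	NodeID int
	// Initialized is assumed to be false until the replica has applied
	// the split trigger (or received a snapshot).
	Initialized bool
}

// eligibleLeaseTargets is a hypothetical helper that drops every
// candidate that has never been initialized, so the lease can only
// move to a replica that is able to serve requests.
func eligibleLeaseTargets(candidates []replicaInfo) []replicaInfo {
	var eligible []replicaInfo
	for _, c := range candidates {
		if c.Initialized {
			eligible = append(eligible, c)
		}
	}
	return eligible
}
```

With candidates on n1 (initialized) and n3 (still waiting for the split trigger), only n1 would remain as a possible target. How the leaseholder would actually learn the follower's state is left open here.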
Environment:
master on roachprod AWS
Jira issue: CRDB-14780
Epic CRDB-16160