kv: dist sender can get into a bad loop if replicas are rebalanced #15543
Comments
@spencerkimball This looks exactly like what you're concerned about.
There is an unusual amount of rebalancing activity for It is unfortunate that the log messages do not include the node ID that reported the
`RangeNotFoundError` was previously being treated the same as a node or store being temporarily or permanently unavailable. We now treat it the same as `RangeKeyMismatchError`, exiting immediately with the assumption that the `RangeDescriptor` used to collate a slice of replicas as distributed send targets is stale and must be re-queried. Further, we now deduce that the same is true in the event that we get a `NotLeaseHolderError` that implicates a replica which is not present in the replicas slice. Fixes #15543
In the case where the dist sender gets an RPC "black holed" to a replica which is then replaced, the dist sender will never get an updated set of replicas which includes the newly rebalanced replica, and will retry forever on `NotLeaseHolderError`s.