kvserver: merging of ranges with non-voters can be extremely disruptive to foreground traffic #63199

aayushshah15 · 2021-04-07T03:56:21Z

Non-voting replicas are not upreplicated through a synchronous snapshot the way learners are. We queue up new non-voters into the raft snapshot queue and rely on it to initialize newly added / rebalanced non-voters.

A range merge will try to collocate replica sets for the LHS and RHS, which may require rebalancing one or more non-voting replicas from the right hand side range to stores that have replicas for the left hand side range. An AdminMerge will then send a SubsumeRequest to the RHS and wait until all replicas have caught up to the LAI of the subsume.

At this point, non-voting replicas that were just rebalanced for the sake of the merge are very likely to still be waiting for, or be in the process of receiving, their initial snapshot from the raft snapshot queue. Once the snapshot queue starts sending these newly rebalanced non-voters their initial snapshots, each snapshot will take roughly 64 seconds per replica, assuming default settings for range size and snapshot rate.

So, the AdminMerge will hit its 5-second timeout on this wait and the merge will fail. This will prompt the merge queue to log an error and try again after about 10 minutes. Now, notice that we're waiting for those 5 seconds after we've sent the subsume request. This means that the RHS will not serve any traffic for those 5 seconds.

The expected high-level consequence of all this interaction is that all foreground traffic on a range that has non-voting replicas could experience seemingly random 5 second blips, where all requests to such a range are blocked for those 5 seconds.


blathers-crl · 2021-04-07T03:56:38Z

Hi @aayushshah15, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

nvanbenschoten · 2021-04-07T05:35:01Z

Thanks for writing this issue up. I agree with the GA-blocker label. Were you able to construct this scenario in a unit test like you were discussing today?

aayushshah15 · 2021-04-07T11:57:19Z

> Were you able to construct this scenario in a unit test like you were discussing today?

Yup, our theory here checks out. In fact, there's a much simpler way to simulate the failure -- just add a 5 second sleep in this testing knob. This made range-merges deterministically fail for all non-collocated pairs of LHS/RHS that have non-voters. As expected, range-merges for ranges that don't have non-voters are not affected by this delay.

Resolves cockroachdb#63199

Before this commit, we relied on the raft snapshot queue to asynchronously perform the initial upreplication of non-voting replicas. This meant that by the time `AdminChangeReplicas` (and consequently, `AdminRelocateRange`) returned to its client, non-voters were not guaranteed to have been initialized. This was a deliberate decision and was, thus far, believed to be copacetic.

However, this decision subtly made range merges (of ranges that have any number of non-voters) extremely unlikely to succeed, while causing severe disruption to foreground traffic on the right hand side of a merge. This was because the `mergeQueue` will first call `AdminRelocateRange` on the right hand side range in order to collocate its replicas with the replicas of the left hand side range. If the `mergeQueue` happened to relocate any non-voting replicas, they were likely to still be waiting for their initial snapshot by the time the `AdminMerge` attempted to subsume the RHS. Essentially, this meant that we were subsuming the RHS of a merge while some of its replicas weren't even initialized. This would cause the merge to fail and, in the interim, block all traffic over the RHS range for a 5 second window.

This commit fixes the unfortunate sequence of events described above by making the behavior of `AdminChangeReplicas` more symmetric for voting and non-voting replicas. Now, if `AdminChangeReplicas` successfully returns, its client can safely assume that all new replicas have at least been upreplicated via an initial snapshot.

Release note: None
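The change in guarantee can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual implementation: `replica`, `snapshotQueue`, `addNonVoterAsync`, and `addNonVoterSync` are hypothetical names invented for the example.

```go
package main

import "fmt"

// replica is a hypothetical stand-in for a newly added non-voting replica.
type replica struct {
	initialized bool
}

// snapshotQueue is a hypothetical asynchronous queue that initializes
// replicas only when process is eventually called.
type snapshotQueue struct {
	pending []*replica
}

func (q *snapshotQueue) enqueue(r *replica) { q.pending = append(q.pending, r) }

func (q *snapshotQueue) process() {
	for _, r := range q.pending {
		r.initialized = true
	}
	q.pending = nil
}

// addNonVoterAsync models the old behavior: the replica is handed to the
// queue and the call returns before any snapshot has been applied.
func addNonVoterAsync(q *snapshotQueue) *replica {
	r := &replica{}
	q.enqueue(r)
	return r
}

// addNonVoterSync models the new behavior: the initial snapshot is applied
// before the call returns, as is already the case for voters.
func addNonVoterSync() *replica {
	r := &replica{}
	r.initialized = true
	return r
}

func main() {
	q := &snapshotQueue{}
	before := addNonVoterAsync(q)
	after := addNonVoterSync()
	fmt.Printf("async: initialized on return = %t\n", before.initialized)
	fmt.Printf("sync:  initialized on return = %t\n", after.initialized)
}
```

In the asynchronous model a caller such as the merge queue can proceed to the subsume while `before` is still uninitialized; in the synchronous model that window does not exist once the call has returned.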

62827: sql: disallow dropping enum values used in default/computed columns r=ajwerner a=the-ericwang35

Previously, users could drop enum values being used in default expressions or computed columns, since we did not perform any checks beforehand. This meant that default expressions and computed columns could become corrupted after making such a drop. This patch addresses this by walking default expressions and computed columns when an enum member is dropped, and disallows the drop if it finds any usages.

Partially fixes #59807 (this PR addresses the issue in default/computed columns, and #62736 addresses the issue in views).

Release note: None

63215: kvserver: perform initial upreplication of non-voters synchronously r=aayushshah15 a=aayushshah15

Resolves #63199

Before this commit, we relied on the raft snapshot queue to asynchronously perform the initial upreplication of non-voting replicas. This meant that by the time `AdminChangeReplicas` (and consequently, `AdminRelocateRange`) returned to its client, non-voters were not guaranteed to have been initialized. This was a deliberate decision and was, thus far, believed to be copacetic.

However, this decision subtly made range merges (of ranges that have any number of non-voters) extremely unlikely to succeed, while causing severe disruption to foreground traffic on the right hand side of a merge. This was because the `mergeQueue` will first call `AdminRelocateRange` on the right hand side range in order to collocate its replicas with the replicas of the left hand side range. If the `mergeQueue` happened to relocate any non-voting replicas, they were likely to still be waiting for their initial snapshot by the time the `AdminMerge` attempted to subsume the RHS. Essentially, this meant that we were subsuming the RHS of a merge while some of its replicas weren't even initialized. This would cause the merge to fail and, in the interim, block all traffic over the RHS range for a 5 second window.

This commit fixes the unfortunate sequence of events described above by making the behavior of `AdminChangeReplicas` more symmetric for voting and non-voting replicas. Now, if `AdminChangeReplicas` successfully returns, its client can safely assume that all new replicas have at least been upreplicated via an initial snapshot.

Release note: None

63388: bazel: bump timeout of `ring_test` r=rickystewart a=rickystewart

This has timed out in CI.

Release note: None

Co-authored-by: Eric Wang <[email protected]>
Co-authored-by: Aayush Shah <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>

aayushshah15 added the C-bug label (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) Apr 7, 2021

aayushshah15 self-assigned this Apr 7, 2021

aayushshah15 added the GA-blocker label Apr 7, 2021

aayushshah15 added the branch-release-21.1 label Apr 7, 2021

aayushshah15 mentioned this issue Apr 7, 2021

kvserver: perform initial upreplication of non-voters synchronously #63215

Merged

aayushshah15 mentioned this issue Apr 9, 2021

release-21.1: kvserver: perform initial upreplication of non-voters synchronously #63396

Merged

craig bot closed this as completed in c52b01f Apr 9, 2021
