forked from cockroachdb/cockroach
Commit
In tpccbench/nodes=9/cpu=4/multi-region, I consistently see the RESTORE fail a split because the retry loop in executeAdminCommandWithDescriptor interleaves with replicate queue activity until the retries are exhausted. This is likely because, given the geo-replicated nature of the cluster, the many steps involved in replicating the range take much longer than we're used to; in the logs I can see the split attempt and an interleaving replication change taking turns until the split gives up.

The solution is to just retry the split forever as long as we're only seeing errors we know should eventually resolve.

The retry limit was introduced to fix cockroachdb#23310 but there is lots of FUD around it. It's very likely that the problem encountered there was really fixed in cockroachdb#23762, which added a lease check between the split attempts.

Touches cockroachdb#41028.

Release justification: fixes roachtest failure, is low risk

Release note: None
Showing 2 changed files with 7 additions and 76 deletions.