storage: check leaseholder status in AdminSplit retry loop #23762

nvanbenschoten · 2018-03-12T21:59:57Z

This change adds a call to redirectOnOrAcquireLease to the AdminSplit
retry loop. This ensures that each attempt to perform a split is done
by a leaseholder. Without this check, it was possible that a stale
follower or a replica that had already been removed from the range
could end up in an infinite retry loop. This was possible because
these replicas could have stale knowledge about the state of the
range descriptor. The effect of this was that the replicas' updates
to the local range descriptor key could indefinitely hit
ConditionFailedErrors, which are retried.

The change also adds a regression test that creates this
"replica removed while splitting" scenario. Without the leaseholder
check introduced here, the test would fail.

Release note: None

cockroach-teamcity · 2018-03-12T22:00:05Z

This change is

benesch · 2018-03-12T22:15:45Z

Now that the root cause of this infinite retry loop is fixed, we should consider reverting #23310. This loop should not retry indefinitely. If it does, exiting will just mask a more serious issue.

Well, the intent was to return the persistent error up into the log file or jobs table where it would be visible to the user. Granted, I botched that 🙈, but I'm still nervous about retry loops that never bail out. Is this concern misplaced? This edge case went unnoticed for a while; I'm worried there might be another.

bdarnell · 2018-03-12T22:43:20Z

I think the limit on the retry loop should stay.

Reviewed 3 of 3 files at r1, 2 of 2 files at r2.
Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.

pkg/storage/replica_command.go, line 716 at r2 (raw file):

	for retryable := retry.StartWithCtx(ctx, retryOpts); retryable.Next(); {
		// Admin commands always require the range lease to begin (see
		// executeAdminBatch), but we may have lost it while in this retry loop.

There are also entry points to AdminSplit that don't go through executeAdminBatch, IIRC.

pkg/storage/store.go, line 669 at r2 (raw file):

	EvalKnobs batcheval.TestingKnobs

	// TestingRequestFilter is called before evaluating each command.

Be more specific, since there's already EvalKnobs.TestingEvalFilter which runs at evaluation time. (the comment on ReplicaRequestFilter about when this runs belongs here, not on the function typedef)

Comments from Reviewable

nvanbenschoten · 2018-03-13T14:29:10Z

TFTRs! Added a new commit (now the second) to address #23310 (comment).

Review status: 2 of 5 files reviewed at latest revision, 2 unresolved discussions.

pkg/storage/replica_command.go, line 716 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

There are also entry points to AdminSplit that don't go through executeAdminBatch, IIRC.

It doesn't look like there are anymore. Everything else goes through the Replica by sending an AdminSplitRequest.

You might be thinking about the splitQueue, which calls adminSplitWithDescriptor directly. Because this bypasses AdminSplit, it is not subject to the retry loop. This looks like it was an intentional decision made in #10728 (see #10728 (comment)).

pkg/storage/store.go, line 669 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Be more specific, since there's already EvalKnobs.TestingEvalFilter which runs at evaluation time. (the comment on ReplicaRequestFilter about when this runs belongs here, not on the function typedef)

Expanded this comment here but left the other because all of these function typedefs seem to have similar descriptions.

Comments from Reviewable

This change adds another testing filter that can be used to influence the error returned from a request before it is evaluated. Notably, the filter is run before the request is added to the CommandQueue, so blocking in the filter will not block interfering requests. Release note: None

This loop used to return a `nil` error once the retry loop was exhausted, which made it look like the split had succeeded. It now returns an error. Release note: None

Fixes cockroachdb#23673. This change adds a call to `redirectOnOrAcquireLease` to the AdminSplit retry loop. This ensures that each attempt to perform a split is done by a leaseholder. Without this check, it was possible that a stale follower or a replica that had already been removed from the range could end up in an infinite retry loop. This was possible because these replicas could have stale knowledge about the state of the range descriptor. The effect of this was that the replicas' updates to the local range descriptor key could indefinitely hit `ConditionFailedErrors`, which are retried. The change also adds a regression test that creates this "replica removed while splitting" scenario. Without the leaseholder check introduced here, the test would fail. Now that the root cause of this infinite retry loop is fixed, we should consider reverting cockroachdb#23310. This loop should not retry indefinitely. If it does, exiting will just mask the issue. Release note: None

In tpccbench/nodes=9/cpu=4/multi-region, I consistently see the RESTORE fail a split because the retry loop in executeAdminCommandWithDescriptor interleaves with replicate queue activity until the retries are exhausted. This is likely since due to the geo-replicated nature of the cluster, the many steps involved in replicating the range take much longer than we're used to; in the logs I can see the split attempt and an interleaving replication change taking turns until the split gives up. The solution is to just retry the split forever as long as we're only seeing errors we know should eventually resolve. The retry limit was introduced to fix cockroachdb#23310 but there is lots of FUD around it. It's very likely that the problem encountered there was really fixed in cockroachdb#23762, which added a lease check between the split attempts. Touches cockroachdb#41028. Release justification: fixes roachtest failure, is low risk Release note: None

nvanbenschoten requested review from bdarnell, benesch, petermattis and a team March 12, 2018 21:59

nvanbenschoten force-pushed the nvanbenschoten/splitWithLease branch from 7dda735 to 3caaeca Compare March 13, 2018 14:28

nvanbenschoten force-pushed the nvanbenschoten/splitWithLease branch from 3caaeca to a48edc7 Compare March 13, 2018 14:30

nvanbenschoten added 3 commits March 13, 2018 11:52

storage: return an error from split retry loop after MaxRetries

e1955e8

This loop used to return a `nil` error once the retry loop was exhausted, which made it look like the split had succeeded. It now returns an error. Release note: None

nvanbenschoten force-pushed the nvanbenschoten/splitWithLease branch from a48edc7 to 85475fe Compare March 13, 2018 15:52

nvanbenschoten merged commit ae35649 into cockroachdb:master Mar 13, 2018

nvanbenschoten deleted the nvanbenschoten/splitWithLease branch March 13, 2018 17:25

nvanbenschoten mentioned this pull request Mar 13, 2018

cherrypick-2.0: storage: check leaseholder status in AdminSplit retry loop #23794

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: check leaseholder status in AdminSplit retry loop #23762

storage: check leaseholder status in AdminSplit retry loop #23762

nvanbenschoten commented Mar 12, 2018 •

edited

Loading

cockroach-teamcity commented Mar 12, 2018

benesch commented Mar 12, 2018 •

edited

Loading

bdarnell commented Mar 12, 2018

nvanbenschoten commented Mar 13, 2018

storage: check leaseholder status in AdminSplit retry loop #23762

storage: check leaseholder status in AdminSplit retry loop #23762

Conversation

nvanbenschoten commented Mar 12, 2018 • edited Loading

cockroach-teamcity commented Mar 12, 2018

benesch commented Mar 12, 2018 • edited Loading

bdarnell commented Mar 12, 2018

nvanbenschoten commented Mar 13, 2018

nvanbenschoten commented Mar 12, 2018 •

edited

Loading

benesch commented Mar 12, 2018 •

edited

Loading