
storage: retry AdminSplit on seeing roachpb.ConflictUpdatingRangeDesc #10728

Merged
merged 1 commit on Nov 21, 2016

Conversation

vivekmenezes
Contributor

@vivekmenezes vivekmenezes commented Nov 16, 2016

This change is Reviewable

@vivekmenezes vivekmenezes changed the title storage: comments about ConditionFailedError in AdminSplit storage: retry AdminSplit on seeing roachpb.ConflictUpdatingRangeDesc Nov 17, 2016
@bdarnell
Contributor

Reviewed 10 of 10 files at r1.
Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


pkg/kv/dist_sender.go, line 936 at r1 (raw file):

          reply, pErr = ds.divideAndSendBatchToRanges(ctx, ba, intersected, isFirst)
          return response{reply: reply, pErr: pErr}
      case *roachpb.ConflictUpdatingRangeDesc:

This is not the right place to retry this error. The whole reason we pass the descriptor into AdminSplit is because if the descriptor changes out from under us, we may no longer want to do the split (someone else may have already split the range down to size).

Move the retry logic up to the higher level: in splitQueue, if there is a conflict, we should MaybeAdd(repl) to go back to the beginning and decide whether or not we want to retry, and if so where the new split point should be. In user-directed splits like the SQL SPLIT AT command, we probably do want to retry on conflicting updates until the split succeeds.
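A rough sketch of that queue-level retry (hypothetical: the MaybeAdd signature, the store/clock access, and the Error-suffixed name of the new error type are approximations, not code from this PR):

// Sketch: on a conflicting descriptor update, re-enqueue the replica so the
// split queue re-evaluates whether a split is still needed and, if so, where.
if _, pErr := repl.AdminSplit(ctx, args, desc); pErr != nil {
	if _, ok := pErr.GetDetail().(*roachpb.ConflictUpdatingRangeDescError); ok {
		// The descriptor changed out from under us; someone else may have
		// already split the range down to size. Start over from the top.
		sq.MaybeAdd(repl, sq.store.Clock().Now())
		return nil
	}
	return pErr.GoError()
}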


pkg/roachpb/errors.proto, line 215 at r1 (raw file):

// when it conflicts with some other process that updates the
// range descriptors.
message ConflictUpdatingRangeDesc {

Our other error types all end in Error.


pkg/storage/replica_command.go, line 2447 at r1 (raw file):

  }); err != nil {
      if err == errMsgConflictUpdatingRangeDesc {
          return reply, roachpb.NewError(&roachpb.ConflictUpdatingRangeDesc{})

roachpb.ConflictUpdatingRangeDesc implements Error; why can't we just use it directly instead of errMsgConflictUpdatingRangeDesc?
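i.e., roughly this shape (a sketch of the suggestion; errMsgConflictUpdatingRangeDesc is the r1 sentinel being replaced):

// Sketch: have the transaction closure return the structured error itself and
// detect it by type, rather than comparing against a sentinel error value.
}); err != nil {
	if conflictErr, ok := err.(*roachpb.ConflictUpdatingRangeDesc); ok {
		return reply, roachpb.NewError(conflictErr)
	}
	return reply, roachpb.NewError(err)
}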


Comments from Reviewable

@vivekmenezes
Contributor Author

Review status: 4 of 11 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.


pkg/kv/dist_sender.go, line 936 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This is not the right place to retry this error. The whole reason we pass the descriptor into AdminSplit is because if the descriptor changes out from under us, we may no longer want to do the split (someone else may have already split the range down to size).

Move the retry logic up to the higher level: in splitQueue, if there is a conflict, we should MaybeAdd(repl) to go back to the beginning and decide whether or not we want to retry, and if so where the new split point should be. In user-directed splits like the SQL SPLIT AT command, we probably do want to retry on conflicting updates until the split succeeds.

I've updated the comments in SplitQueue explaining the rationale for using AdminSplit() and an AdminSplitRequest{} for the two cases where a split is requested. I believe the AdminSplit() use case in SplitQueue can benefit from retrying the split on seeing a conflict. If it does see a conflict with an AdminSplit on the same key, it will see a different error, `range is already split at key`, which gets returned.

pkg/roachpb/errors.proto, line 215 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Our other error types all end in Error.

Done.

pkg/storage/replica_command.go, line 2447 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

roachpb.ConflictUpdatingRangeDesc implements Error; why can't we just use it directly instead of errMsgConflictUpdatingRangeDesc?

Done.

Comments from Reviewable

@tamird
Contributor

tamird commented Nov 17, 2016

Reviewed 3 of 10 files at r1, 7 of 7 files at r2.
Review status: all files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


pkg/kv/dist_sender.go, line 936 at r1 (raw file):

Previously, vivekmenezes wrote…

I've updated the comments in SplitQueue explaining the rationale for using AdminSplit() and an AdminSplitRequest{} for the two cases where a split is requested. I believe the AdminSplit() use case in SplitQueue can benefit from retrying the split on seeing a conflict. If it does see a conflict with an AdminSplit on the same key, it will see a different error, range is already split at key, which gets returned.

+1 to @bdarnell's comment that retrying at the dist sender is the wrong level. I didn't quite understand your reply to that.

Retrying at the client will also allow you to avoid the extra structured error you added here.


pkg/storage/client_split_test.go, line 304 at r2 (raw file):

          // split key is outside of the bounds for the range.
          expected := strings.Join([]string{
              "conflict updating range descriptor",

why a literal string? you should probably match the structured error?


pkg/storage/replica_command.go, line 2389 at r2 (raw file):

              // left range descriptor is read outside of the transaction.
              if _, ok := err.(*roachpb.ConditionFailedError); ok {
                  return &roachpb.ConflictUpdatingRangeDescError{}

@bdarnell suggested retrying above this level, but I think it might be OK to do it here, too.


pkg/storage/replica_command.go, line 2442 at r2 (raw file):

      return nil
  }); err != nil {
      if pErr, ok := err.(*roachpb.ConflictUpdatingRangeDescError); ok {

pErr is what we name things of type roachpb.Error, which this is not.


pkg/storage/split_queue.go, line 106 at r2 (raw file):

      for _, splitKey := range splitKeys {
          // Use AdminSplit() here because the zone config stipulates
          // splitting at a specific key independent of which replica it

s/replica/range/ unless I'm misunderstanding.


pkg/storage/split_queue.go, line 108 at r2 (raw file):

          // splitting at a specific key independent of which replica it
          // lies on (the specific replica desc passed in can change before
          // the command is applied). If a range was already split at this

s/a/any/


pkg/storage/split_queue.go, line 126 at r2 (raw file):

      log.Infof(ctx, "splitting size=%d max=%d", size, zone.RangeMaxBytes)
      // Do not use the AdminSplit() command because we are attempting to
      // split a specific replica at a key. If this replica has already been

I don't understand how these comments address the FIXME that previously occupied this space. ISTM that they do the exact same thing (since the replica is the lease holder, the AdminSplit above, via dist sender + local call optimization, will result in exactly the same call to the replica).


Comments from Reviewable

@bdarnell
Contributor

Review status: 9 of 11 files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.


pkg/storage/split_queue.go, line 126 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I don't understand how these comments address the FIXME that previously occupied this space. ISTM that they do the exact same thing (since the replica is the lease holder, the AdminSplit above, via dist sender + local call optimization, will result in exactly the same call to the replica).

But this calls directly to the replica while the preceding one goes through the DistSender. This gives them different semantics WRT retries on conflict, although that is subtle and indirect and doesn't make me feel any better about doing that retry in DistSender instead of either at the various callers of AdminSplit or internal to AdminSplit itself.

This should really be calling Replica.AdminSplit() directly (instead of going through Replica.Send), or we should add a RangeDescriptor to AdminSplitRequest to unify the two interfaces, so it can pass in the descriptor we had when we made the decision to split. When there is a conflict, control must be returned here so we can see if the split is still necessary.


pkg/storage/split_queue.go, line 125 at r3 (raw file):

  if float64(size)/float64(zone.RangeMaxBytes) > 1 {
      log.Infof(ctx, "splitting size=%d max=%d", size, zone.RangeMaxBytes)
      // We cannot use the AdminSplit() command here because it can only be

What distinction are you trying to make here? You're sending an AdminSplitRequest which will be processed by Replica.AdminSplit(). What is "the AdminSplit() command"? (we normally use "command" to refer to raft commands, which AdminSplit is not)


Comments from Reviewable

@vivekmenezes
Contributor Author

Review status: 8 of 11 files reviewed at latest revision, 10 unresolved discussions.


pkg/kv/dist_sender.go, line 936 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

+1 to @bdarnell's comment that retrying at the dist sender is the wrong level. I didn't quite understand your reply to that.

Retrying at the client will also allow you to avoid the extra structured error you added here.

I believe there is a concern that under certain circumstances a retry might be unwarranted, in particular in SplitQueue, but there is a reason why AdminSplit() is used for splits due to zone config, and AdminSplitRequest is used for splits due to range size.

pkg/storage/replica_command.go, line 2442 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

pErr is what we name things of type roachpb.Error, which this is not.

fixed

pkg/storage/split_queue.go, line 106 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

s/replica/range/ unless I'm misunderstanding.

Done.

pkg/storage/split_queue.go, line 108 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

s/a/any/

Done.

pkg/storage/split_queue.go, line 126 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

But this calls directly to the replica while the preceding one goes through the DistSender. This gives them different semantics WRT retries on conflict, although that is subtle and indirect and doesn't make me feel any better about doing that retry in DistSender instead of either at the various callers of AdminSplit or internal to AdminSplit itself.

This should really be calling Replica.AdminSplit() directly (instead of going through Replica.Send), or we should add a RangeDescriptor to AdminSplitRequest to unify the two interfaces, so it can pass in the descriptor we had when we made the decision to split. When there is a conflict, control must be returned here so we can see if the split is still necessary.

I've reworded the comments and also call Replica.AdminSplit() directly here. I do not catch the `roachpb.ConflictUpdatingRangeDescError` here because it's not really worth retrying that here. If a conflict really occurred we can either assume that the right thing happened through another split, or we'll retry again at another time when the replica gets queued again.

pkg/storage/split_queue.go, line 125 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What distinction are you trying to make here? You're sending an AdminSplitRequest which will be processed by Replica.AdminSplit(). What is "the AdminSplit() command"? (we normally use "command" to refer to raft commands, which AdminSplit is not)

I've replaced AdminSplit() with db.AdminSplit(), saying that we can't use db.AdminSplit() here.

Comments from Reviewable

@tamird
Contributor

tamird commented Nov 18, 2016

Reviewed 1 of 10 files at r1, 1 of 2 files at r3, 2 of 2 files at r4.
Review status: all files reviewed at latest revision, 7 unresolved discussions, some commit checks pending.


pkg/kv/dist_sender.go, line 936 at r1 (raw file):

Previously, vivekmenezes wrote…

I believe there is a concern that under certain circumstances a retry might be unwarranted, in particular in SplitQueue, but there is a reason why AdminSplit() is used for splits due to zone config, and AdminSplitRequest is used for splits due to range size.

What reason is that? I've opened a separate PR to disentangle that discussion from this PR (which is about retrying the split): #10787.

pkg/storage/split_queue.go, line 125 at r3 (raw file):

Previously, vivekmenezes wrote…

I've replaced AdminSplit() with db.AdminSplit(), saying that we can't use db.AdminSplit() here.

I don't see any reason this can't call `r.AdminSplit()` in both cases. Since this is not really what this PR is about, I've extracted this change into a separate PR: #10787.

Comments from Reviewable

@andreimatei
Contributor

Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


pkg/storage/split_queue.go, line 105 at r4 (raw file):

      log.Infof(ctx, "splitting at keys %v", splitKeys)
      for _, splitKey := range splitKeys {
          // Use db.AdminSplit() here because the zone config stipulates

sorry, one more cook... I think this should also call r.AdminSplit(), like the case below. I don't think the theoretical extra chances of success justify this confusing code. The queues are async processors after all, so as long as the right thing eventually happens, that's good enough.
The comment also doesn't help, because it doesn't contrast and explain the difference between db.AdminSplit and r.AdminSplit (what Ben explained below - the role of the DistSender). In fact, for me, the comment as it is worded now suggests that db.AdminSplit is more likely to return an error.
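For reference, the two call paths being contrasted look roughly like this (a sketch; the exact signatures are approximated):

// Routed through the DB and DistSender: addressed by key, so it is applied to
// whichever replica currently holds the lease when the request arrives.
err := db.AdminSplit(ctx, splitKey)

// Called directly on this replica, pinned to the descriptor the queue observed
// when it decided to split.
_, pErr := r.AdminSplit(ctx, roachpb.AdminSplitRequest{SplitKey: splitKey}, r.Desc())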


pkg/storage/split_queue.go, line 132 at r4 (raw file):

      // pass it into AdminSplit().
      //
      // Note: another split can occur between the time the range size is

explain what happens then in this comment


Comments from Reviewable

@vivekmenezes
Contributor Author

Ready for another review.


Review status: 9 of 10 files reviewed at latest revision, 9 unresolved discussions.


pkg/kv/dist_sender.go, line 936 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

What reason is that? I've opened a separate PR to disentangle that discussion from this PR (which is about retrying the split): #10787.

I've rebased on top of that PR

pkg/storage/client_split_test.go, line 304 at r2 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

why a literal string? you should probably match the structured error?

Done.

pkg/storage/split_queue.go, line 125 at r3 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

I don't see any reason this can't call r.AdminSplit() in both cases. Since this is not really what this PR is about, I've extracted this change into a separate PR: #10787.

thanks

pkg/storage/split_queue.go, line 105 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

sorry, one more cook... I think this should also call r.AdminSplit(), like the case below. I don't think the theoretical extra chances of success justify this confusing code. The queues are async processors after all, so as long as the right thing eventually happens, that's good enough.
The comment also doesn't help, because it doesn't contrast and explain the difference between db.AdminSplit and r.AdminSplit (what Ben explained below - the role of the DistSender). In fact, for me, the comment as it is worded now suggests that db.AdminSplit is more likely to return an error.

that was just merged in: #10787


pkg/storage/split_queue.go, line 132 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

explain what happens then in this comment

NA

Comments from Reviewable

@bdarnell
Contributor

I still object to doing this at the DistSender level. How about retrying in Replica.AdminSplit if a key was specified in the request? (if no key was specified, we can return the error to the caller). In this case we can even get rid of the new proto error since the retry can be handled locally.
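Concretely, the proposal is roughly this (a sketch; adminSplitWithDescriptor is a hypothetical helper standing in for the existing split body):

// Sketch: retry the descriptor CPut conflict only when the caller supplied an
// explicit split key; otherwise return the error so the caller can re-decide.
keyProvided := len(args.SplitKey) != 0
for {
	reply, err := r.adminSplitWithDescriptor(ctx, args, r.Desc())
	if err == nil {
		return reply, nil
	}
	if _, ok := err.(*roachpb.ConditionFailedError); !ok || !keyProvided {
		return reply, roachpb.NewError(err)
	}
	// The descriptor changed underneath us; loop and retry with a fresh r.Desc().
}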


Review status: 9 of 10 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.


Comments from Reviewable

@vivekmenezes
Contributor Author

Ah ha, it occurred to me that when we retry the request at the replica, if the replica no longer holds the key, it will get retried at the dist sender level. So yes, doing this at the replica level will work.

@vivekmenezes
Contributor Author

I've implemented the retry at the AdminSplit command level. Ready for another review. Thanks


Review status: 3 of 5 files reviewed at latest revision, 6 unresolved discussions, some commit checks pending.


Comments from Reviewable

@tamird
Contributor

tamird commented Nov 21, 2016

Reviewed 2 of 2 files at r5, 7 of 7 files at r6.
Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


pkg/storage/client_split_test.go, line 304 at r6 (raw file):

			// split key is outside of the bounds for the range.
			expected := strings.Join([]string{
				(&roachpb.ConditionFailedError{}).String(),

is it actually possible for this error to leak out of the store? if so, why did the error handling in sql/split.go and storage.replica_queue_test.go go away?


pkg/storage/replica_command.go, line 2437 at r6 (raw file):

			// key was provided to the AdminSplit command.
			if _, ok := err.(*roachpb.ConditionFailedError); ok && keyProvided {
				desc = r.Desc()

this seems pretty sketchy - why even take a descriptor argument if we're going to just grab it off the replica?


Comments from Reviewable

@bdarnell
Contributor

:lgtm:


Reviewed 1 of 2 files at r4, 7 of 7 files at r6.
Review status: all files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


pkg/storage/client_split_test.go, line 304 at r6 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

is it actually possible for this error to leak out of the store? if so, why did the error handling in sql/split.go and storage.replica_queue_test.go go away?

It should only be able to leak out if no split key is given, which is not the case in this test.

pkg/storage/replica_command.go, line 2302 at r6 (raw file):

	keyProvided := len(args.SplitKey) != 0

	for retryable := retry.StartWithCtx(ctx, retry.Options{}); retryable.Next(); {

Use base.DefaultRetryOptions(), not retry.Options{}.


pkg/storage/replica_command.go, line 2437 at r6 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this seems pretty sketchy - why even take a descriptor argument if we're going to just grab it off the replica?

The two use cases for this method are diverging: There's the split-by-size case, which passes a descriptor and wants a single attempt with that value as the CPut expected value, and the split-at-key case which passes a key and wants multiple attempts (each using the current r.Desc() as its expected value).

It might clean things up to leave AdminSplit as it was and introduce a new AdminSplitAtKey method, which would not take a descriptor but would contain the retry loop and calls to r.Desc() on each iteration. Or it might not. This is fine with me too but probably needs a comment about why we're resetting the descriptor.
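A sketch of that two-method shape (hypothetical; signatures and retry details are approximated, not the code that was ultimately committed):

// AdminSplitAtKey wraps AdminSplit (left as a single attempt that takes the
// caller's descriptor) with a retry loop that re-reads r.Desc() on each
// iteration; this is the split-at-key case, e.g. SQL SPLIT AT.
func (r *Replica) AdminSplitAtKey(
	ctx context.Context, args roachpb.AdminSplitRequest,
) (roachpb.AdminSplitResponse, *roachpb.Error) {
	for retryable := retry.StartWithCtx(ctx, base.DefaultRetryOptions()); retryable.Next(); {
		reply, pErr := r.AdminSplit(ctx, args, r.Desc())
		if pErr == nil {
			return reply, nil
		}
		if _, ok := pErr.GetDetail().(*roachpb.ConditionFailedError); !ok {
			// Anything other than a descriptor conflict goes back to the caller.
			return reply, pErr
		}
		// The descriptor changed under us; retry with the updated r.Desc().
	}
	// Falling out of the loop means the context was canceled or timed out.
	return roachpb.AdminSplitResponse{}, roachpb.NewError(ctx.Err())
}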


Comments from Reviewable

@tamird
Contributor

tamird commented Nov 21, 2016

Review status: all files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.


pkg/storage/client_split_test.go, line 304 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

It should only be able to leak out if no split key is given, which is not the case in this test.

Right, so this should be removed.

pkg/storage/replica_command.go, line 2437 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

The two use cases for this method are diverging: There's the split-by-size case, which passes a descriptor and wants a single attempt with that value as the CPut expected value, and the split-at-key case which passes a key and wants multiple attempts (each using the current r.Desc() as its expected value).

It might clean things up to leave AdminSplit as it was and introduce a new AdminSplitAtKey method, which would not take a descriptor but would contain the retry loop and calls to `r.Desc()` on each iteration. Or it might not. This is fine with me too but probably needs a comment about why we're resetting the descriptor.

Splitting the two methods SGTM.

Comments from Reviewable

@vivekmenezes
Contributor Author

Review status: 3 of 7 files reviewed at latest revision, 9 unresolved discussions.


pkg/storage/client_split_test.go, line 304 at r6 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Right, so this should be removed.

Done.

pkg/storage/replica_command.go, line 2302 at r6 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Use `base.DefaultRetryOptions()`, not `retry.Options{}`.

Done.

pkg/storage/replica_command.go, line 2437 at r6 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Splitting the two methods SGTM.

Done.

Comments from Reviewable

@tamird
Contributor

tamird commented Nov 21, 2016

:lgtm:


Reviewed 9 of 9 files at r7.
Review status: all files reviewed at latest revision, 11 unresolved discussions, some commit checks pending.


pkg/storage/replica_command.go, line 2281 at r7 (raw file):

	ctx context.Context, args roachpb.AdminSplitRequest,
) (roachpb.AdminSplitResponse, *roachpb.Error) {
	var reply roachpb.AdminSplitResponse

nit: remove this and use literal roachpb.AdminSplitResponse{} at the error returns.


pkg/storage/replica_command.go, line 2294 at r7 (raw file):

		}
	}
	return reply, nil

Uh, this doesn't seem right. You probably want to return reply, roachpb.NewError(ctx.Err()) here.


pkg/storage/replica_command.go, line 2453 at r7 (raw file):

		// ConditionFailedError in the error detail so that the command can be
		// retried.
		if failure, ok := err.(*roachpb.ConditionFailedError); ok {

nit: there is no need for the failure binding at all:

if _, ok := err.(*roachpb.ConditionFailedError); ok {
  return reply, roachpb.NewError(err)
}

Comments from Reviewable

This allows an internal error to be retried instead of needing to be retried on the outside.
@vivekmenezes
Contributor Author

Review status: 7 of 8 files reviewed at latest revision, 11 unresolved discussions.


pkg/storage/replica_command.go, line 2281 at r7 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: remove this and use literal `roachpb.AdminSplitResponse{}` at the error returns.

Done.

pkg/storage/replica_command.go, line 2294 at r7 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Uh, this doesn't seem right. You probably want to return `reply, roachpb.NewError(ctx.Err())` here.

Done.

pkg/storage/replica_command.go, line 2453 at r7 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

nit: there is no need for the `failure` binding at all:

if _, ok := err.(*roachpb.ConditionFailedError); ok {
  return reply, roachpb.NewError(err)
}

Done.

Comments from Reviewable

@tamird
Contributor

tamird commented Nov 21, 2016

Reviewed 1 of 1 files at r8.
Review status: all files reviewed at latest revision, 8 unresolved discussions.


Comments from Reviewable
