storage: retry AdminSplit on seeing roachpb.ConflictUpdatingRangeDesc #10728

Reviewed 10 of 10 files at r1.

pkg/kv/dist_sender.go, line 936 at r1 (raw file):
This is not the right place to retry this error. The whole reason we pass the descriptor into AdminSplit is because if the descriptor changes out from under us, we may no longer want to do the split (someone else may have already split the range down to size). Move the retry logic up to the higher level: in splitQueue, if there is a conflict, we should MaybeAdd(repl) to go back to the beginning and decide whether or not we want to retry, and if so where the new split point should be. In user-directed splits like the SQL

pkg/roachpb/errors.proto, line 215 at r1 (raw file):
Our other error types all end in

pkg/storage/replica_command.go, line 2447 at r1 (raw file):
roachpb.ConflictUpdatingRangeDesc implements Error; why can't we just use it directly instead of

Comments from Reviewable
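
The queue-level handling suggested in the first comment (on a conflict, re-add the replica and decide again from scratch instead of retrying the split in place) can be sketched in plain Go. Everything here is hypothetical: `splitQueue`, `replica`, `maybeAdd`, `trySplit`, and the local `ConflictUpdatingRangeDesc` type are simplified stand-ins rather than CockroachDB's real types, and a descriptor "generation" counter is assumed in place of comparing full descriptors.

```go
package main

import (
	"errors"
	"fmt"
)

// ConflictUpdatingRangeDesc is a hypothetical stand-in for the roachpb error of
// the same name: the descriptor changed between reading it and writing it.
type ConflictUpdatingRangeDesc struct{}

func (*ConflictUpdatingRangeDesc) Error() string { return "conflict updating range descriptor" }

// replica is a hypothetical, heavily simplified range: a size plus a
// descriptor generation (an assumption) that is bumped on every split.
type replica struct {
	sizeBytes  int64
	generation int
}

// splitQueue is a hypothetical queue that splits ranges larger than maxBytes.
type splitQueue struct {
	maxBytes int64
	pending  []*replica
}

// maybeAdd decides from scratch whether repl still needs a split.
func (q *splitQueue) maybeAdd(repl *replica) {
	if repl.sizeBytes > q.maxBytes {
		q.pending = append(q.pending, repl)
	}
}

// trySplit makes a single split attempt that only succeeds if the
// descriptor generation still matches the one the caller observed.
func trySplit(repl *replica, observedGen int) error {
	if repl.generation != observedGen {
		return &ConflictUpdatingRangeDesc{}
	}
	repl.generation++
	repl.sizeBytes /= 2
	return nil
}

// process makes one split attempt. On a conflict it does not retry in place;
// it hands the replica back to maybeAdd, which re-decides whether to split.
func (q *splitQueue) process(repl *replica, observedGen int) error {
	err := trySplit(repl, observedGen)
	var conflict *ConflictUpdatingRangeDesc
	if errors.As(err, &conflict) {
		q.maybeAdd(repl)
		return nil
	}
	return err
}

func main() {
	q := &splitQueue{maxBytes: 64}
	r := &replica{sizeBytes: 100}
	q.maybeAdd(r)
	repl := q.pending[0]
	q.pending = q.pending[1:]
	observed := repl.generation

	// Someone else splits the range before the queue gets to it.
	repl.generation++
	repl.sizeBytes = 50

	err := q.process(repl, observed)
	fmt.Println(err, len(q.pending)) // <nil> 0: the range is already small enough
}
```

The point of this shape is that the queue, not the split attempt, owns the decision to retry: if a concurrent split has already shrunk the range, `maybeAdd` simply declines to requeue it.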

Review status: 4 of 11 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.
pkg/kv/dist_sender.go, line 936 at r1 (raw file):

Reviewed 3 of 10 files at r1, 7 of 7 files at r2.
pkg/kv/dist_sender.go, line 936 at r1 (raw file):

Review status: 9 of 11 files reviewed at latest revision, 10 unresolved discussions, all commit checks successful.
pkg/storage/split_queue.go, line 126 at r2 (raw file):

Review status: 8 of 11 files reviewed at latest revision, 10 unresolved discussions.
pkg/kv/dist_sender.go, line 936 at r1 (raw file):

Reviewed 1 of 10 files at r1, 1 of 2 files at r3, 2 of 2 files at r4.
pkg/kv/dist_sender.go, line 936 at r1 (raw file):

Review status: all files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.

pkg/storage/split_queue.go, line 105 at r4 (raw file):
sorry, one more cook... I think this should also call

pkg/storage/split_queue.go, line 132 at r4 (raw file):
explain what happens then in this comment

Ready for another review.

Review status: 9 of 10 files reviewed at latest revision, 9 unresolved discussions.
pkg/kv/dist_sender.go, line 936 at r1 (raw file):

I still object to doing this at the DistSender level. How about retrying in

Review status: 9 of 10 files reviewed at latest revision, 6 unresolved discussions, all commit checks successful.

Aha, it occurred to me that when we retry the request at the replica, if

On Sat, Nov 19, 2016, 3:23 AM Ben Darnell wrote:

I've implemented the retry at the AdminSplit command level. Ready for another review. Thanks.

Review status: 3 of 5 files reviewed at latest revision, 6 unresolved discussions, some commit checks pending.

Reviewed 2 of 2 files at r5, 7 of 7 files at r6.

pkg/storage/client_split_test.go, line 304 at r6 (raw file):
is it actually possible for this error to leak out of the store? if so, why did the error handling in sql/split.go and storage.replica_queue_test.go go away?

pkg/storage/replica_command.go, line 2437 at r6 (raw file):
this seems pretty sketchy - why even take a descriptor argument if we're going to just grab it off the replica?

Reviewed 1 of 2 files at r4, 7 of 7 files at r6.

pkg/storage/client_split_test.go, line 304 at r6 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
> is it actually possible for this error to leak out of the store? if so, why did the error handling in sql/split.go and storage.replica_queue_test.go go away?

pkg/storage/replica_command.go, line 2302 at r6 (raw file):
Use

pkg/storage/replica_command.go, line 2437 at r6 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
> this seems pretty sketchy - why even take a descriptor argument if we're going to just grab it off the replica?

It might clean things up to leave AdminSplit as it was and introduce a new AdminSplitAtKey method, which would not take a descriptor but would contain the retry loop and calls to

Review status: all files reviewed at latest revision, 9 unresolved discussions, all commit checks successful.

pkg/storage/client_split_test.go, line 304 at r6 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
> It should only be able to leak out if no split key is given, which is not the case in this test.

pkg/storage/replica_command.go, line 2437 at r6 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
> The two use cases for this method are diverging: There's the split-by-size case, which passes a descriptor and wants a single attempt with that value as the CPut expected value, and the split-at-key case which passes a key and wants multiple attempts (each using the current r.Desc() as its expected value).
>
> It might clean things up to leave AdminSplit as it was and introduce a new AdminSplitAtKey method, which would not take a descriptor but would contain the retry loop and calls to `r.Desc()` on each iteration. Or it might not.

This is fine with me too but probably needs a comment about why we're resetting the descriptor.
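
The two diverging use cases can be illustrated with a minimal sketch, assuming a descriptor that compares by value and a mutex-guarded conditional put standing in for the real CPut. `RangeDescriptor`, `Replica`, `cput`, `adminSplitWithDescriptor`, and `ConditionFailedError` here are hypothetical simplifications; only the overall shape (a single-attempt split-by-size path, plus an `AdminSplitAtKey` that takes no descriptor and re-reads `r.Desc()` on each iteration) comes from the discussion.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// RangeDescriptor is a hypothetical, simplified descriptor; Generation is
// bumped on every split so stale copies compare unequal.
type RangeDescriptor struct {
	StartKey, EndKey string
	Generation       int
}

// ConditionFailedError is a hypothetical stand-in for the error a
// conditional put returns when the stored value != the expected value.
type ConditionFailedError struct{ Actual RangeDescriptor }

func (e *ConditionFailedError) Error() string {
	return fmt.Sprintf("unexpected descriptor: %+v", e.Actual)
}

// Replica is a hypothetical replica that owns its current descriptor.
type Replica struct {
	mu   sync.Mutex
	desc RangeDescriptor
}

// Desc returns a copy of the current descriptor.
func (r *Replica) Desc() RangeDescriptor {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.desc
}

// cput writes newDesc only if the stored descriptor still equals expected.
func (r *Replica) cput(expected, newDesc RangeDescriptor) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.desc != expected {
		return &ConditionFailedError{Actual: r.desc}
	}
	r.desc = newDesc
	return nil
}

// adminSplitWithDescriptor is the split-by-size path: a single attempt that
// uses the caller's descriptor as the expected value. A conflict goes back to
// the caller, which may no longer want to split at all.
func (r *Replica) adminSplitWithDescriptor(splitKey string, desc RangeDescriptor) error {
	if splitKey <= desc.StartKey || splitKey >= desc.EndKey {
		return fmt.Errorf("split key %q outside [%q, %q)", splitKey, desc.StartKey, desc.EndKey)
	}
	// Keep only the left-hand side; the new right-hand range is omitted here.
	left := RangeDescriptor{StartKey: desc.StartKey, EndKey: splitKey, Generation: desc.Generation + 1}
	return r.cput(desc, left)
}

// AdminSplitAtKey is the split-at-key path: no descriptor argument, and each
// attempt re-reads r.Desc(), retrying only on CPut conflicts.
func (r *Replica) AdminSplitAtKey(splitKey string, maxAttempts int) error {
	for attempt := 1; ; attempt++ {
		err := r.adminSplitWithDescriptor(splitKey, r.Desc())
		if _, ok := err.(*ConditionFailedError); !ok || attempt >= maxAttempts {
			return err // success, a non-retryable error, or attempts exhausted
		}
	}
}

func main() {
	r := &Replica{desc: RangeDescriptor{StartKey: "a", EndKey: "z"}}
	stale := r.Desc()

	// A concurrent split at "m" changes the descriptor.
	_ = r.adminSplitWithDescriptor("m", stale)

	var cfe *ConditionFailedError
	fmt.Println(errors.As(r.adminSplitWithDescriptor("f", stale), &cfe)) // true
	fmt.Println(r.AdminSplitAtKey("f", 3), r.Desc())                     // <nil> {a f 2}
}
```

Keeping the retry loop out of the descriptor-taking path preserves the reason that path accepts a descriptor at all: a conflict tells its caller that its view of the range is stale.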

Review status: 3 of 7 files reviewed at latest revision, 9 unresolved discussions.

pkg/storage/client_split_test.go, line 304 at r6 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
> Right, so this should be removed.

pkg/storage/replica_command.go, line 2302 at r6 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
> Use `base.DefaultRetryOptions()`, not `retry.Options{}`.

pkg/storage/replica_command.go, line 2437 at r6 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
> Splitting the two methods SGTM.

Reviewed 9 of 9 files at r7.

pkg/storage/replica_command.go, line 2281 at r7 (raw file):
nit: remove this and use literal

pkg/storage/replica_command.go, line 2294 at r7 (raw file):
Uh, this doesn't seem right. You probably want to return

pkg/storage/replica_command.go, line 2453 at r7 (raw file):
nit: there is no need for the

This allows the error to be retried internally instead of requiring the caller to retry it.

Review status: 7 of 8 files reviewed at latest revision, 11 unresolved discussions.

pkg/storage/replica_command.go, line 2281 at r7 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
> nit: remove this and use literal `roachpb.AdminSplitResponse{}` at the error returns.

pkg/storage/replica_command.go, line 2294 at r7 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
> Uh, this doesn't seem right. You probably want to return `reply, roachpb.NewError(ctx.Err())` here.

pkg/storage/replica_command.go, line 2453 at r7 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
> nit: there is no need for the `failure` binding at all:
> ```
> if _, ok := err.(*roachpb.ConditionFailedError); ok {
> 	return reply, roachpb.NewError(err)
> }
> ```

Reviewed 1 of 1 files at r8.