Push Txns & Resolve Intents on Conflicts #72
Conversation
Automatically push transactions on write intent failures and resolve conflicting write intents. This is done at the level of Store.ExecuteCmd on command failure. Write/write conflicts are pushed and resolved but must still return the write intent error to the client so the client can retry with a new client command ID, ensuring idempotence. This way, the original request (the one which hit the conflict) will always return a write intent error via the response cache, while follow-on requests with a new client command ID can succeed, and their retries will succeed as well. Read/write conflicts are automatically retried instead of returning the write intent error, because they don't have the same idempotency issues with the client command ID.

Reorg'd a bit of code between store and range to move range-specific checks into range (addition of Range.AddCmd method). Also added the ability to submit commands to Raft without always waiting for completion; this is used to "fire and forget" resolve intent commands. Making sure we clean up entries pending in the "read queue" is slightly tricky in Range.addReadWriteCmd.

Moved writeIntentError and writeTooOldError from mvcc.go to the proto/ subdirectory as they're communicated across RPCs from server to clients. Added the conflicting Key and a Resolved flag to WriteIntentError, which allows the client to decide whether to back off (if the resolve failed) or retry immediately.
@bdarnell @tschottdorf @cockroachdb/developers
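For orientation, the flow described above might look roughly like the sketch below. This is a minimal, self-contained illustration, not the actual implementation; the types and helpers such as `pushTxnAndResolve` and `execute` are hypothetical stand-ins.

```go
package main

import "fmt"

// Simplified stand-in for the proto WriteIntentError described above; the
// struct and its fields here are illustrative, not the actual definition.
type WriteIntentError struct {
	Key      string
	Resolved bool // did the automatic resolve succeed?
}

func (e *WriteIntentError) Error() string {
	return fmt.Sprintf("write intent at %q (resolved=%t)", e.Key, e.Resolved)
}

// pushTxnAndResolve is a placeholder for pushing the conflicting transaction
// and resolving its intent; it reports whether the resolve succeeded.
func pushTxnAndResolve(key string) bool { return true }

// execute sketches the store-level behavior: run the command and, on a write
// intent error, push/resolve the conflict. Write/write conflicts still return
// the error (the client retries with a new ClientCmdID, keeping the response
// cache consistent); read/write conflicts are retried in place.
func execute(isWrite bool, run func() error) error {
	for {
		err := run()
		wiErr, ok := err.(*WriteIntentError)
		if !ok {
			return err // success, or an unrelated error
		}
		wiErr.Resolved = pushTxnAndResolve(wiErr.Key)
		if isWrite || !wiErr.Resolved {
			// Write/write conflict (or failed resolve): surface the error so
			// the client decides whether to retry immediately or back off.
			return wiErr
		}
		// Read/write conflict with a successful resolve: retry immediately.
	}
}

func main() {
	attempts := 0
	err := execute(false /* read-only command */, func() error {
		attempts++
		if attempts == 1 {
			return &WriteIntentError{Key: "a"}
		}
		return nil
	})
	fmt.Println(attempts, err) // 2 <nil>
}
```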
// If the original client didn't wait, log execution errors so they're
// surfaced somewhere.
if !wait && err != nil {
	log.V(1).Infof("non-synchronous execution of %s with %+v failed: %v", cmd.Method, cmd.Args, err)
A log.V(1) is a pretty weak form of "surfaced somewhere"; I'd make this at least a log.Info. I might also log at V(1) even if wait is true.
I changed this to a warning and added a V(1) info in Range.executeCmd to always log execution of commands.
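For illustration only, the adjusted logging could look roughly like the following sketch of the described change (not the exact diff; the surrounding variables mirror the snippet above):

```go
// In Range.executeCmd: always log command execution at verbosity 1.
if log.V(1) {
	log.Infof("executing %s command %+v", cmd.Method, cmd.Args)
}

// If the original client didn't wait, surface execution errors as warnings.
if !wait && err != nil {
	log.Warningf("non-synchronous execution of %s with %+v failed: %v", cmd.Method, cmd.Args, err)
}
```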
LGTM
}
return rng.ReadWriteCmd(method, args, reply)

return wiErr
We may want to add a log here to see how frequently this happens.
We could also potentially resubmit the command here immediately without a new ClientCmdID to optimize latency (we'd need to work around the respCache).
I added a V(1) log for the method as a whole. I figured we can get a bit fancier with optimizations in future changes; right now I wanted to make sure we keep our idempotency guarantees. We'll need to start optimizing all of this at some point, but let's do it right, using appropriate benchmarks, and avoid premature optimization.
LGTM
…intents Push Txns & Resolve Intents on Conflicts
Automatically push transactions on write intent failures and resolve
conflicting write intents. This is done at the level of
Store.ExecuteCmd on command failure. Write/write conflicts are
pushed and resolved but then must still return the write intent
error to the client so the client can retry with a new client command
ID, ensuring idempotence. This way, the original request (the one
which hit the conflict) will always return a write intent error via
the response cache, while follow-on requests with a new client command
ID can succeed, and their retries will succeed as well.
Read/write conflicts are automatically retried instead of
returning the write intent error, because they don't have the same
idempotency issues with client command ID.
Reorg'd a bit of code between store and range to move range-specific
checks into range (addition of Range.AddCmd method). Also, added an
ability to submit commands to Raft without having to always wait for
completion. This is used to "fire and forget" resolve intent cmds.
Making sure we clean up entries pending in the "read queue" is slightly
tricky in Range.addReadWriteCmd.
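A rough, self-contained illustration of the fire-and-forget idea follows; the addCmd helper here is hypothetical, and the real path goes through Raft and the response cache.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// addCmd runs cmd, optionally without waiting for completion. When wait is
// false the caller gets nil immediately and any eventual error is only
// logged, mirroring the "fire and forget" use for resolve intent commands.
func addCmd(cmd func() error, wait bool) error {
	done := make(chan error, 1)
	go func() { done <- cmd() }()
	if wait {
		return <-done
	}
	go func() {
		if err := <-done; err != nil {
			log.Printf("non-synchronous command failed: %v", err)
		}
	}()
	return nil
}

func main() {
	// Fire and forget: returns before the command finishes.
	fmt.Println(addCmd(func() error { time.Sleep(10 * time.Millisecond); return nil }, false))
	time.Sleep(20 * time.Millisecond) // give the async command time to finish
}
```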
Moved writeIntentError and writeTooOldError from mvcc.go to the proto/
subdirectory as they're communicated across RPCs from server to clients.
Added the conflicting Key and a Resolved flag to WriteIntentError.
This allows the client to decide whether to back off (if the resolve failed)
or retry immediately.
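To illustrate how a client might use the Resolved flag (illustrative types and helper, not the actual client code):

```go
package main

import (
	"fmt"
	"time"
)

// WriteIntentError mirrors the fields described above (illustrative struct,
// not the actual proto definition).
type WriteIntentError struct {
	Key      []byte
	Resolved bool
}

// retryDelay shows the client-side decision: retry immediately if the
// conflicting intent was resolved, otherwise back off before retrying.
func retryDelay(err *WriteIntentError, backoff time.Duration) time.Duration {
	if err.Resolved {
		return 0 // conflict already cleaned up; retry right away
	}
	return backoff // resolve failed; give the conflicting txn time to finish
}

func main() {
	fmt.Println(retryDelay(&WriteIntentError{Key: []byte("a"), Resolved: true}, 50*time.Millisecond))  // 0s
	fmt.Println(retryDelay(&WriteIntentError{Key: []byte("a"), Resolved: false}, 50*time.Millisecond)) // 50ms
}
```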