Allow Inconsistent Read Operations #343
Comments
Instead of "stale", how about an enum value called "readConsistency"? Calling it stale sort of implies it'll be stale, when often it won't be. Further, there are at least three values, though we'll probably only support two for the time being: CONSISTENT and INCONSISTENT (for now). We may want to add CONSENSUS in order to insist that reads go through consensus instead of relying on leases.
Makes sense, except that I don't know why you, as a client, would want to explicitly force something through consensus. Even if that were faster in some tricky cases, you would have to be so involved in the internals to know when to use it. What's your reason for wanting to expose it?
Like I said, I don't think it's high priority, except that various db implementations have added it as a failsafe when testing against Jepsen and the like. You could easily slow down a failed leader's clock if you were being adversarial and convince it that it still held a leader lease when in fact it didn't. Forcing reads through consensus is the gold standard.
You're right, it should be in there.
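The enum discussed above can be sketched as follows. This is a hypothetical illustration, not the actual type in the codebase; the three value names come from the comments in this thread.

```go
package main

import "fmt"

// ReadConsistencyType sketches the "readConsistency" enum proposed above.
// The type and method are hypothetical; only the value names come from
// the discussion.
type ReadConsistencyType int

const (
	// CONSISTENT reads respect intents and rely on the leader lease.
	CONSISTENT ReadConsistencyType = iota
	// CONSENSUS reads insist on going through consensus rather than
	// relying on leases (the "gold standard" failsafe mentioned above).
	CONSENSUS
	// INCONSISTENT reads return whatever the local replica has, skipping
	// intents and uncertainty restarts.
	INCONSISTENT
)

func (t ReadConsistencyType) String() string {
	switch t {
	case CONSISTENT:
		return "CONSISTENT"
	case CONSENSUS:
		return "CONSENSUS"
	default:
		return "INCONSISTENT"
	}
}

func main() {
	fmt.Println(CONSISTENT, CONSENSUS, INCONSISTENT)
}
```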
MVCCGet and MVCCScan now take a "consistent" boolean which, if false, ignores intents. Moved the guts of MVCCScan into MVCCIterate; MVCCScan now just calls into it. This change allows the removal of the MVCCIterateCommitted method. Provides a unit test for the "deadloop" issue of encountering an extant intent at a meta indexing record. Fixes #343 Fixes #435
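The effect of the "consistent" boolean described in the commit message can be sketched with a toy MVCC lookup. The types and helper below are hypothetical simplifications for illustration, not CockroachDB's actual MVCC code.

```go
package main

import "fmt"

// versionedValue is a hypothetical, simplified MVCC entry; an intent is a
// provisional value written by a not-yet-committed transaction.
type versionedValue struct {
	ts       int64 // timestamp; the slice below is ordered newest first
	value    string
	isIntent bool
}

// mvccGet sketches the "consistent" boolean: when consistent is true,
// encountering an intent is an error (a real system would try to resolve
// it); when false, intents are skipped and the first committed value at or
// below the read timestamp is returned.
func mvccGet(versions []versionedValue, readTS int64, consistent bool) (string, error) {
	for _, v := range versions {
		if v.ts > readTS {
			continue // newer than the read timestamp
		}
		if v.isIntent {
			if consistent {
				return "", fmt.Errorf("intent at ts=%d", v.ts)
			}
			continue // inconsistent read: ignore the intent
		}
		return v.value, nil
	}
	return "", nil
}

func main() {
	versions := []versionedValue{
		{ts: 30, value: "c", isIntent: true},
		{ts: 20, value: "b"},
		{ts: 10, value: "a"},
	}
	v, err := mvccGet(versions, 40, false)
	fmt.Println(v, err) // b <nil>
	_, err = mvccGet(versions, 40, true)
	fmt.Println(err != nil) // true
}
```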
We have been seeing long startup times which disappear spontaneously. During a restart of the beta cluster, the following goroutine was observed, which suggests that we were spending a lot of time GCing replicas on startup:

engine ??:0 _Cfunc_DBIterNext(cockroachdb#324, cockroachdb#323, 0, 0, 0, 0, 0, 0, 0)
engine rocksdb.go:1135 (*rocksDBIterator).Next(cockroachdb#235)
storage replica_data_iter.go:104 (*ReplicaDataIterator).Next(cockroachdb#316)
storage store.go:1748 (*Store).destroyReplicaData(#109, cockroachdb#317, 0, 0)
storage store.go:841 (*Store).Start.func2(0x101b, cockroachdb#300, 0x36, 0x40, cockroachdb#301, 0x36, 0x40, cockroachdb#315, 0x3, 0x4, ...)
storage store.go:734 IterateRangeDescriptors.func1(cockroachdb#306, 0x40, 0x41, cockroachdb#307, 0x92, 0x92, cockroachdb#341, 0, 0x186c, 0x4000, ...)
engine mvcc.go:1593 MVCCIterate(cockroachdb#329, #68, #47, #81, cockroachdb#232, 0x9, 0x10, cockroachdb#233, 0xb, 0x10, ...)
storage store.go:738 IterateRangeDescriptors(cockroachdb#330, cockroachdb#196, #47, #81, cockroachdb#195, #179, #110)
storage store.go:867 (*Store).Start(#109, cockroachdb#330, cockroachdb#196, #179, cockroachdb#185, 0x1)
server node.go:405 (*Node).initStores(#78, cockroachdb#330, cockroachdb#196, #98, 0x1, 0x1, #179, 0, #55)
server node.go:330 (*Node).start(#78, cockroachdb#330, cockroachdb#196, #42, #129, #98, 0x1, 0x1, 0, 0, ...)
server server.go:431 (*Server).Start(#5, cockroachdb#330, cockroachdb#196, #95, 0x1)
cli start.go:368 runStart(#34, #178, 0, 0x9, 0, 0)
cobra command.go:599 (*Command).execute(#34, #177, 0x9, 0x9, #34, #177)
cobra command.go:689 (*Command).ExecuteC(#33, #70, cockroachdb#343, #72)
cobra command.go:648 (*Command).Execute(#33, #71, cockroachdb#343)
cli cli.go:96 Run(#64, 0xa, 0xa, 0, 0)
main main.go:37 main()
For performance reasons, clients may want to carry out read-only operations that aren't concerned with consistency at all. Instead, they may simply want the data present at the replica they are connected to handed back to them, without going through consensus. Variations on this are possible, but the most straightforward route seems to be:
- Add "stale" semantics to RequestHeader via a boolean "stale" flag.
- Error if the "stale" flag is found on requests that alter state or within a Transaction.
- Treat requests with the "stale" flag specially: they do not alter the read timestamp cache, and they go straight to a mode of MVCC read that doesn't care about intents or uncertainty restarts but simply reads the first honest value it finds (either starting from the newest value, or from the given read timestamp). This involves carefully going through all the read operations individually, seeing where they converge and sorting out potential oddities. Possibly we'll want to pass the "stale" flag down to the MVCC operations, where I think we're only using Scan and Get (TxnCoordSender#sendOne).
- We will also want to ...
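The validation rule in the proposal above (error if the stale flag appears on a state-altering request or within a transaction) can be sketched as follows. The struct and helper are hypothetical stand-ins, not the actual RequestHeader type.

```go
package main

import (
	"errors"
	"fmt"
)

// requestHeader is a hypothetical stand-in for the RequestHeader mentioned
// in the proposal: the stale flag plus just enough context to validate it.
type requestHeader struct {
	Stale    bool
	ReadOnly bool // false for requests that alter state
	InTxn    bool // true when the request runs inside a Transaction
}

// checkStale enforces the proposed rule: the stale flag is an error on
// requests that alter state or that run within a Transaction.
func checkStale(h requestHeader) error {
	if !h.Stale {
		return nil
	}
	if !h.ReadOnly {
		return errors.New("stale flag not allowed on requests that alter state")
	}
	if h.InTxn {
		return errors.New("stale flag not allowed within a Transaction")
	}
	return nil
}

func main() {
	fmt.Println(checkStale(requestHeader{Stale: true, ReadOnly: true}))                      // <nil>
	fmt.Println(checkStale(requestHeader{Stale: true, ReadOnly: false}) != nil)              // true
	fmt.Println(checkStale(requestHeader{Stale: true, ReadOnly: true, InTxn: true}) != nil)  // true
}
```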