Allow Inconsistent Read Operations #343
Comments
Instead of "stale", how about an enum value called "readConsistency"? Calling it stale sort of implies it'll be stale, when often it won't be. Further, there are at least three values, though we'll probably only support two for the time being: CONSISTENT and INCONSISTENT (for now). We may want to add CONSENSUS in order to insist that reads go through consensus instead of relying on leases.
Makes sense, except that I don't know why you, as a client, would want to explicitly force something through consensus. Even if that were faster in some tricky cases, you would have to be so involved in the internals to know when to use it. What's your reason for wanting to expose it?
Like I said, I don't think it's high priority, except that various db implementations have added it as a failsafe when testing against Jepsen and the like. You could easily slow down a failed leader's clock if you were being adversarial and convince it that it still held a leader lease when in fact it didn't. Forcing reads through consensus is the gold standard.
You're right, it should be in there.
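The enum discussed above can be sketched as follows. This is a hypothetical illustration, not the actual type in the codebase; the three value names come from the comments in this thread.

```go
package main

import "fmt"

// ReadConsistencyType sketches the "readConsistency" enum proposed above.
// The type and method are hypothetical; only the value names come from
// the discussion.
type ReadConsistencyType int

const (
	// CONSISTENT reads respect intents and rely on the leader lease.
	CONSISTENT ReadConsistencyType = iota
	// CONSENSUS reads insist on going through consensus rather than
	// relying on leases (the "gold standard" failsafe mentioned above).
	CONSENSUS
	// INCONSISTENT reads return whatever the local replica has, skipping
	// intents and uncertainty restarts.
	INCONSISTENT
)

func (t ReadConsistencyType) String() string {
	switch t {
	case CONSISTENT:
		return "CONSISTENT"
	case CONSENSUS:
		return "CONSENSUS"
	default:
		return "INCONSISTENT"
	}
}

func main() {
	fmt.Println(CONSISTENT, CONSENSUS, INCONSISTENT)
}
```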
MVCCGet and MVCCScan now take a "consistent" boolean which, if false, ignores intents. Moved the guts of MVCCScan into MVCCIterate; MVCCScan now just calls into it. This change allows the removal of the MVCCIterateCommitted method. Provides a unit test for the "deadloop" issue of encountering an extant intent at a meta indexing record. Fixes #343 Fixes #435
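The effect of the "consistent" boolean described in the commit message can be sketched with a toy MVCC lookup. The types and helper below are hypothetical simplifications for illustration, not CockroachDB's actual MVCC code.

```go
package main

import "fmt"

// versionedValue is a hypothetical, simplified MVCC entry; an intent is a
// provisional value written by a not-yet-committed transaction.
type versionedValue struct {
	ts       int64 // timestamp; the slice below is ordered newest first
	value    string
	isIntent bool
}

// mvccGet sketches the "consistent" boolean: when consistent is true,
// encountering an intent is an error (a real system would try to resolve
// it); when false, intents are skipped and the first committed value at or
// below the read timestamp is returned.
func mvccGet(versions []versionedValue, readTS int64, consistent bool) (string, error) {
	for _, v := range versions {
		if v.ts > readTS {
			continue // newer than the read timestamp
		}
		if v.isIntent {
			if consistent {
				return "", fmt.Errorf("intent at ts=%d", v.ts)
			}
			continue // inconsistent read: ignore the intent
		}
		return v.value, nil
	}
	return "", nil
}

func main() {
	versions := []versionedValue{
		{ts: 30, value: "c", isIntent: true},
		{ts: 20, value: "b"},
		{ts: 10, value: "a"},
	}
	v, err := mvccGet(versions, 40, false)
	fmt.Println(v, err) // b <nil>
	_, err = mvccGet(versions, 40, true)
	fmt.Println(err != nil) // true
}
```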
We have been seeing long startup times which disappear spontaneously. During a restart of the beta cluster, the following goroutine was observed, which suggests that we were spending a lot of time GCing replicas on startup:

engine ??:0 _Cfunc_DBIterNext(cockroachdb#324, cockroachdb#323, 0, 0, 0, 0, 0, 0, 0)
engine rocksdb.go:1135 (*rocksDBIterator).Next(cockroachdb#235)
storage replica_data_iter.go:104 (*ReplicaDataIterator).Next(cockroachdb#316)
storage store.go:1748 (*Store).destroyReplicaData(#109, cockroachdb#317, 0, 0)
storage store.go:841 (*Store).Start.func2(0x101b, cockroachdb#300, 0x36, 0x40, cockroachdb#301, 0x36, 0x40, cockroachdb#315, 0x3, 0x4, ...)
storage store.go:734 IterateRangeDescriptors.func1(cockroachdb#306, 0x40, 0x41, cockroachdb#307, 0x92, 0x92, cockroachdb#341, 0, 0x186c, 0x4000, ...)
engine mvcc.go:1593 MVCCIterate(cockroachdb#329, #68, #47, #81, cockroachdb#232, 0x9, 0x10, cockroachdb#233, 0xb, 0x10, ...)
storage store.go:738 IterateRangeDescriptors(cockroachdb#330, cockroachdb#196, #47, #81, cockroachdb#195, #179, #110)
storage store.go:867 (*Store).Start(#109, cockroachdb#330, cockroachdb#196, #179, cockroachdb#185, 0x1)
server node.go:405 (*Node).initStores(#78, cockroachdb#330, cockroachdb#196, #98, 0x1, 0x1, #179, 0, #55)
server node.go:330 (*Node).start(#78, cockroachdb#330, cockroachdb#196, #42, #129, #98, 0x1, 0x1, 0, 0, ...)
server server.go:431 (*Server).Start(#5, cockroachdb#330, cockroachdb#196, #95, 0x1)
cli start.go:368 runStart(#34, #178, 0, 0x9, 0, 0)
cobra command.go:599 (*Command).execute(#34, #177, 0x9, 0x9, #34, #177)
cobra command.go:689 (*Command).ExecuteC(#33, #70, cockroachdb#343, #72)
cobra command.go:648 (*Command).Execute(#33, #71, cockroachdb#343)
cli cli.go:96 Run(#64, 0xa, 0xa, 0, 0)
main main.go:37 main()
For performance reasons, clients may want to carry out read-only operations that aren't concerned with consistency at all. Instead, they may simply want the data present at the replica they are connected to handed back to them, without going through consensus. Variations on this are possible, but the most straightforward route seems to be:
- Add "stale" semantics to RequestHeader via a boolean "stale" flag.
- Error if the "stale" flag is found on requests that alter state or within a Transaction.
- Treat requests with the "stale" flag specially: they do not alter the read timestamp cache, and they go straight to a mode of MVCC read that doesn't care about intents or uncertainty restarts but simply reads the first honest value it finds (either starting from the newest value, or from the given read timestamp). This involves carefully going through all the read operations individually, seeing where they converge and sorting out potential oddities. Possibly we'll want to pass the "stale" flag down to the MVCC operations, where I think we're only using Scan and Get (TxnCoordSender#sendOne).
- We will also want to ...
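The validation rule in the proposal above (error if the stale flag appears on a state-altering request or within a transaction) can be sketched as follows. The struct and helper are hypothetical stand-ins, not the actual RequestHeader type.

```go
package main

import (
	"errors"
	"fmt"
)

// requestHeader is a hypothetical stand-in for the RequestHeader mentioned
// in the proposal: the stale flag plus just enough context to validate it.
type requestHeader struct {
	Stale    bool
	ReadOnly bool // false for requests that alter state
	InTxn    bool // true when the request runs inside a Transaction
}

// checkStale enforces the proposed rule: the stale flag is an error on
// requests that alter state or that run within a Transaction.
func checkStale(h requestHeader) error {
	if !h.Stale {
		return nil
	}
	if !h.ReadOnly {
		return errors.New("stale flag not allowed on requests that alter state")
	}
	if h.InTxn {
		return errors.New("stale flag not allowed within a Transaction")
	}
	return nil
}

func main() {
	fmt.Println(checkStale(requestHeader{Stale: true, ReadOnly: true}))                      // <nil>
	fmt.Println(checkStale(requestHeader{Stale: true, ReadOnly: false}) != nil)              // true
	fmt.Println(checkStale(requestHeader{Stale: true, ReadOnly: true, InTxn: true}) != nil)  // true
}
```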