Allow Inconsistent Read Operations #343

Closed
tbg opened this issue Feb 21, 2015 · 4 comments · Fixed by #450

tbg (Member) commented Feb 21, 2015

For performance reasons, clients may want to carry out read-only operations that aren't concerned with consistency at all. Instead, they may simply want the data present at the replica they're connected to handed back to them, without going through consensus. Variations on this are possible, but the most straightforward route seems to be:

  • adding optional stale semantics to RequestHeader via a boolean stale flag.
  • returning an error when the stale flag is found on requests that alter state or on requests within a Transaction (see the sketch at the end of this comment).
  • treating read-only requests tagged as stale specially: They do not alter the read timestamp cache, and they go straight to a mode of MVCC read that doesn't care about intents or uncertainty restarts but simply reads the first honest value it finds (either starting from the newest value, or from the given read timestamp). This involves carefully going through all the read operations individually, seeing where they converge and sorting out potential oddities. Possibly we'll want to pass the 'stale' flag down to the MVCC operations, where I think we're only using Scan and Get.
  • making sure that we don't auto-wrap stale reads in a transaction internally when they span multiple ranges (see TxnCoordSender#sendOne).

We will also want to

  • use this new type of read for meta[1,2] addressing records, so those lookups become stale lookups as suggested by @spencerkimball in Multiraft: leader leases #230, and
  • check if there are other internal reads where we may want to use this.
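
A minimal sketch of the validation I have in mind (all names below are placeholders, not the final API):

    package main

    import "errors"

    // Transaction and RequestHeader stand in for the real types; only the
    // proposed optional stale flag matters here.
    type Transaction struct{}

    type RequestHeader struct {
        Stale bool
        Txn   *Transaction // nil for non-transactional requests
    }

    // checkStaleRead rejects the stale flag on requests that alter state or
    // that run inside a transaction. Reads that pass this check would then
    // skip the read timestamp cache and read MVCC values without resolving
    // intents (not shown here).
    func checkStaleRead(h RequestHeader, isReadOnly bool) error {
        if !h.Stale {
            return nil
        }
        if !isReadOnly {
            return errors.New("stale flag is only allowed on read-only requests")
        }
        if h.Txn != nil {
            return errors.New("stale flag is not allowed within a transaction")
        }
        return nil
    }

    func main() {
        // A stale, non-transactional read passes the check.
        if err := checkStaleRead(RequestHeader{Stale: true}, true); err != nil {
            panic(err)
        }
    }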

spencerkimball (Member) commented:

Instead of "stale", how about an enum value called "readConsistency"? Calling it stale sort of implies it'll be stale when often it won't. Further, there are at least three values, though we'll probably only support two for the time being. They are:

CONSISTENT
INCONSISTENT

(For now)

We may want to add

CONSENSUS

in order to insist that reads go through consensus instead of relying on leases.
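
Roughly, as a Go sketch (placeholder names and package, not a final definition):

    package storage

    // ReadConsistencyType is a placeholder sketch of the proposed enum.
    type ReadConsistencyType int

    const (
        // CONSISTENT (default): reads are served by the range leader under
        // its lease and observe all committed writes.
        CONSISTENT ReadConsistencyType = iota
        // INCONSISTENT: read whatever the local replica has, skipping the
        // timestamp cache and ignoring intents; results may be stale.
        INCONSISTENT
        // CONSENSUS: force the read through consensus instead of relying on
        // the leader lease; not needed initially.
        CONSENSUS
    )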

tbg (Member, Author) commented Feb 21, 2015

Makes sense, except that I don't know why you, as a client, would want to explicitly force something through consensus. Even if that were faster in some tricky cases, you would have to be deeply involved in the internals to know when to use it. What's your reason for wanting to expose it?

spencerkimball (Member) commented:

Like I said, I don't think it's high priority, except that various db implementations have added it as a failsafe when testing against Jepsen and the like. You could easily slow down a failed leader's clock if you were being adversarial and convince it that it still had a leader lease when in fact it didn't. Forcing reads through consensus is the gold standard.

tbg (Member, Author) commented Feb 21, 2015

You're right, it should be in there.

spencerkimball added a commit that referenced this issue Mar 21, 2015
MVCCGet and MVCCScan now take a "consistent" boolean which, if false,
ignores intents. Moved the guts of MVCCScan into MVCCIterate and MVCCScan
just calls into it. This change allows the removal of the MVCCIterateCommitted
method.

Provides a unittest for "deadloop" issue of encountering an extant intent
at a meta indexing record.

Fixes #343
Fixes #435
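
Roughly, the shape of the change described in the commit message above (signatures heavily simplified; not the exact code in the commit):

    package engine

    // MVCCGet returns the value for key. If consistent is false, intents are
    // ignored instead of being returned as errors. (Sketch only; the real
    // function takes an engine, a timestamp, a transaction, etc.)
    func MVCCGet(key string, consistent bool) ([]byte, error) {
        // ... lookup elided ...
        return nil, nil
    }

    // MVCCIterate walks [start, end) and calls f for each key/value pair,
    // stopping early once f returns done == true; it now holds the scan logic
    // that used to live in MVCCScan and replaces MVCCIterateCommitted.
    func MVCCIterate(start, end string, consistent bool, f func(key string, value []byte) (done bool, err error)) error {
        // ... iteration over the underlying MVCC keyspace elided ...
        return nil
    }

    // MVCCScan is now a thin wrapper that collects results from MVCCIterate.
    func MVCCScan(start, end string, max int64, consistent bool) (map[string][]byte, error) {
        results := map[string][]byte{}
        err := MVCCIterate(start, end, consistent, func(key string, value []byte) (bool, error) {
            results[key] = value
            return int64(len(results)) >= max, nil // done once max results collected
        })
        return results, err
    }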
tbg added a commit to tbg/cockroach that referenced this issue Aug 31, 2016
We have been seeing long startup times which disappear spontaneously. During a
restart of the beta cluster, the following goroutine was observed, which suggests
that we were spending a lot of time GCing replicas on startup.

    engine              ??:0                     _Cfunc_DBIterNext(cockroachdb#324, cockroachdb#323, 0, 0, 0, 0, 0, 0, 0)
    engine              rocksdb.go:1135          (*rocksDBIterator).Next(cockroachdb#235)
    storage             replica_data_iter.go:104 (*ReplicaDataIterator).Next(cockroachdb#316)
    storage             store.go:1748            (*Store).destroyReplicaData(#109, cockroachdb#317, 0, 0)
    storage             store.go:841             (*Store).Start.func2(0x101b, cockroachdb#300, 0x36, 0x40, cockroachdb#301, 0x36, 0x40, cockroachdb#315, 0x3, 0x4, ...)
    storage             store.go:734             IterateRangeDescriptors.func1(cockroachdb#306, 0x40, 0x41, cockroachdb#307, 0x92, 0x92, cockroachdb#341, 0, 0x186c, 0x4000, ...)
    engine              mvcc.go:1593             MVCCIterate(cockroachdb#329, #68, #47, #81, cockroachdb#232, 0x9, 0x10, cockroachdb#233, 0xb, 0x10, ...)
    storage             store.go:738             IterateRangeDescriptors(cockroachdb#330, cockroachdb#196, #47, #81, cockroachdb#195, #179, #110)
    storage             store.go:867             (*Store).Start(#109, cockroachdb#330, cockroachdb#196, #179, cockroachdb#185, 0x1)
    server              node.go:405              (*Node).initStores(#78, cockroachdb#330, cockroachdb#196, #98, 0x1, 0x1, #179, 0, #55)
    server              node.go:330              (*Node).start(#78, cockroachdb#330, cockroachdb#196, #42, #129, #98, 0x1, 0x1, 0, 0, ...)
    server              server.go:431            (*Server).Start(#5, cockroachdb#330, cockroachdb#196, #95, 0x1)
    cli                 start.go:368             runStart(#34, #178, 0, 0x9, 0, 0)
    cobra               command.go:599           (*Command).execute(#34, #177, 0x9, 0x9, #34, #177)
    cobra               command.go:689           (*Command).ExecuteC(#33, #70, cockroachdb#343, #72)
    cobra               command.go:648           (*Command).Execute(#33, #71, cockroachdb#343)
    cli                 cli.go:96                Run(#64, 0xa, 0xa, 0, 0)
    main                main.go:37               main()