Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: avoid stale read due to observed timestamp after lease change #23749

Closed
tbg opened this issue Mar 12, 2018 · 0 comments
Closed

storage: avoid stale read due to observed timestamp after lease change #23749

tbg opened this issue Mar 12, 2018 · 0 comments
Assignees
Milestone

Comments

@tbg
Copy link
Member

tbg commented Mar 12, 2018

Discovered in #21056 (follower reads).

The fundamental problem is that when we pick up a clock read on a node that we visit, we use this as a guarantee that the node has not served any writes at higher timestamps that we could be missing (as in, that happened causally before us). But this guarantee is broken if the node later becomes leaseholder of some range after we first visited the node (where the data was in fact causally there before our visit, but the node wasn't in charge of it and thus unaware at the time).

For example,

  1. put(k on leaseholder n1), gateway choses t=1.0
  2. begin; read(unrelated key on n2); gateway choses t=0.98
  3. pick up observed timestamp for n2 of 0.99
  4. lease of n1 expires (or gets transferred), n2 takes over
  5. read(k) on leaseholder n2 misses earlier put at t=1s (outside of uncertainty window up to 0.99)

@spencerkimball suggests:

Should be fixable by moving setting the MaxTimestamp from Stores.Send to after we get call Replica.redirectOnOrAcquireLease. MaxTimestamp should be set to the greater of the observed timestamp for the node and the lease start time, bounded by Transaction.MaxTimestamp. This will of course result in more uncertainty restarts, but not many, as lease transfers are not common unless the transaction is really long-running, in which case the original Transaction.MaxTimestamp will control.

@tbg tbg added this to the 2.0 milestone Mar 12, 2018
spencerkimball added a commit to spencerkimball/cockroach that referenced this issue Mar 21, 2018
This change relocates the logic for using observed timestamps to affect
a transaction's `MaxTimestamp` field in order to avoid uncertainty
restarts. This was previously done in `storage.Stores.Send`, but is now
moved to `storage.Replica.executeReadOnlyBatch` and
`storage.Replica.tryExecuteWriteBatch` in order to use information from
the replica's lease to avoiding failing to read a value written in
absolute time before the txn started, but at a higher timestamp than
the timestamp observed at the node which now owns the replica lease.

Fixes cockroachdb#23749

Release note: None
spencerkimball added a commit to spencerkimball/cockroach that referenced this issue Mar 22, 2018
This change relocates the logic for using observed timestamps to affect
a transaction's `MaxTimestamp` field in order to avoid uncertainty
restarts. This was previously done in `storage.Stores.Send`, but is now
moved to `storage.Replica.executeReadOnlyBatch` and
`storage.Replica.tryExecuteWriteBatch` in order to use information from
the replica's lease to avoiding failing to read a value written in
absolute time before the txn started, but at a higher timestamp than
the timestamp observed at the node which now owns the replica lease.

Fixes cockroachdb#23749

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 5, 2019
…holders

This PR is an initial reaction to cockroachdb#36431 and to `limitTxnMaxTimestamp` being
broken for follower reads (PR for that coming soon). To start, we were missing
any real documentation on what makes the observed timestamp mechanism safe and
what invariants need to hold in order for a clock reading on a node to imply an
absence of causality on any values later read at higher timestamp on that node.
The PR addresses this by discussing the invariants necessary for observed
timestamps to be used to bound a transaction's uncertainty window.

In doing so, the PR also attempts to address a second issue, which is that the
documentation conflated a node serving reads and writes with leaseholders
serving reads and writes. At the time that this was all written, these were
equivalent. Now that we have follower reads, the omission of a discussion about
leaseholders made this all hard to think about (e.g. "whose observed timestamp
do we care about when performing a follower read?"). The PR addresses this by
adjusting the wording around nodes and leaseholders. This would have come in
handy in reasoning about cockroachdb#23749.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 9, 2019
…holders

This PR is an initial reaction to cockroachdb#36431 and to `limitTxnMaxTimestamp` being
broken for follower reads (PR for that coming soon). To start, we were missing
any real documentation on what makes the observed timestamp mechanism safe and
what invariants need to hold in order for a clock reading on a node to imply an
absence of causality on any values later read at higher timestamp on that node.
The PR addresses this by discussing the invariants necessary for observed
timestamps to be used to bound a transaction's uncertainty window.

In doing so, the PR also attempts to address a second issue, which is that the
documentation conflated a node serving reads and writes with leaseholders
serving reads and writes. At the time that this was all written, these were
equivalent. Now that we have follower reads, the omission of a discussion about
leaseholders made this all hard to think about (e.g. "whose observed timestamp
do we care about when performing a follower read?"). The PR addresses this by
adjusting the wording around nodes and leaseholders. This would have come in
handy in reasoning about cockroachdb#23749.

Release note: None
craig bot pushed a commit that referenced this issue Apr 10, 2019
36554: roachpb: formalize observed timestamp preconditions, talk about leaseholders r=nvanbenschoten a=nvanbenschoten

This PR is an initial reaction to #36431 and to `limitTxnMaxTimestamp` being
broken for follower reads (PR for that coming soon). To start, we were missing
any real documentation on what makes the observed timestamp mechanism safe and
what invariants need to hold in order for a clock reading on a node to imply an
absence of causality on any values later read at higher timestamp on that node.
The PR addresses this by discussing the invariants necessary for observed
timestamps to be used to bound a transaction's uncertainty window.

In doing so, the PR also attempts to address a second issue, which is that the
documentation conflated a node serving reads and writes with leaseholders
serving reads and writes. At the time that this was all written, these were
equivalent. Now that we have follower reads, the omission of a discussion about
leaseholders made this all hard to think about (e.g. "whose observed timestamp
do we care about when performing a follower read?"). The PR addresses this by
adjusting the wording around nodes and leaseholders. This would have come in
handy in reasoning about #23749.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
craig bot pushed a commit that referenced this issue Apr 10, 2019
36554: roachpb: formalize observed timestamp preconditions, talk about leaseholders r=nvanbenschoten a=nvanbenschoten

This PR is an initial reaction to #36431 and to `limitTxnMaxTimestamp` being
broken for follower reads (PR for that coming soon). To start, we were missing
any real documentation on what makes the observed timestamp mechanism safe and
what invariants need to hold in order for a clock reading on a node to imply an
absence of causality on any values later read at higher timestamp on that node.
The PR addresses this by discussing the invariants necessary for observed
timestamps to be used to bound a transaction's uncertainty window.

In doing so, the PR also attempts to address a second issue, which is that the
documentation conflated a node serving reads and writes with leaseholders
serving reads and writes. At the time that this was all written, these were
equivalent. Now that we have follower reads, the omission of a discussion about
leaseholders made this all hard to think about (e.g. "whose observed timestamp
do we care about when performing a follower read?"). The PR addresses this by
adjusting the wording around nodes and leaseholders. This would have come in
handy in reasoning about #23749.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants