kv: use leaseholder's observed timestamp even for follower reads #56679

nvanbenschoten · 2020-11-13T23:29:14Z

This commit addresses a longstanding TODO to rationalize the use of
observed timestamps during follower reads.

Before this change, we used to use observed timestamps pulled from the
follower node to limit a transaction's uncertainty interval, but this
was incorrect. An observed timestamp pulled from the follower node's
clock has no meaning for the purpose of reducing the transaction's
uncertainty interval. This is because there is no guarantee that at the
time of acquiring the observed timestamp from the follower node, the
leaseholder hadn't already served writes at higher timestamps than the
follower node's clock reflected.

However, if the transaction performing a follower read happens to have
an observed timestamp from the current leaseholder, this timestamp can
be used to reduce the transaction's uncertainty interval. Even though
the read is being served from a different replica in the range, the
observed timestamp still places a bound on the values in the range that
may have been written before the transaction began.

In the past, this was mostly innocuous because AOST txns don't have an
uncertainty interval and the follower read duration was so large that
very few "present time" transactions would ever perform follower reads.
Now that the follower read duration is lower and more and more present
time transactions will perform follower reads, this is more critical to
get right.

The change also reworks the observedts package's docs a little bit to
take advantage of section headers:

cockroach-teamcity · 2020-11-13T23:29:20Z

This change is

This commit addresses a longstanding TODO to rationalize the use of observed timestamps during follower reads. Before this change, we used to use observed timestamps pulled from the follower node to limit a transaction's uncertainty interval, but this was incorrect. An observed timestamp pulled from the follower node's clock has no meaning for the purpose of reducing the transaction's uncertainty interval. This is because there is no guarantee that at the time of acquiring the observed timestamp from the follower node, the leaseholder hadn't already served writes at higher timestamps than the follower node's clock reflected. However, if the transaction performing a follower read happens to have an observed timestamp from the current leaseholder, this timestamp can be used to reduce the transaction's uncertainty interval. Even though the read is being served from a different replica in the range, the observed timestamp still places a bound on the values in the range that may have been written before the transaction began. In the past, this was mostly innocuous because AOST txns don't have an uncertainty interval and the follower read duration was so large that very few "present time" transactions would ever perform follower reads. Now that the follower read duration is lower and more and more present time transactions will perform follower reads, this is more critical to get right. The change also reworks the observedts package's docs a little bit to take advantage of section headers.

tbg

Reviewed 9 of 9 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained

tbg

Thanks for iterating on the docs format, this is starting to look like a solid pattern.

Reviewable status: complete! 0 of 0 LGTMs obtained

nvanbenschoten · 2020-11-16T17:28:06Z

bors r+

craig · 2020-11-16T18:27:41Z

Build failed:

GitHub CI (Cockroach)

nvanbenschoten · 2020-11-16T19:19:55Z

--- done: testdata/logic_test/hash_join_dist with config 5node: 2 tests, 1 failures
    logic.go:2908: 
        testdata/logic_test/hash_join_dist:9: error while processing
    logic.go:2908: pq: cannot up-replicate to s3; missing gossiped StoreDescriptor

Doesn't look related to this change. I stressed the test for 15 minutes and did not hit this issue, though I did eventually hit #56358 (comment). We don't have any other reports of it, so maybe a very rare flake?

bors r+

craig · 2020-11-16T20:41:48Z

Build failed (retrying...):

GitHub CI (Cockroach)

craig · 2020-11-16T21:53:59Z

Build succeeded:

GitHub CI (Cockroach)

nvanbenschoten requested a review from tbg November 13, 2020 23:29

nvanbenschoten force-pushed the nvanbenschoten/observedTSFollower branch from de181c6 to d7d777b Compare November 14, 2020 21:10

tbg approved these changes Nov 16, 2020

View reviewed changes

craig bot merged commit 859be50 into cockroachdb:master Nov 16, 2020

nvanbenschoten deleted the nvanbenschoten/observedTSFollower branch November 23, 2020 02:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv: use leaseholder's observed timestamp even for follower reads #56679

kv: use leaseholder's observed timestamp even for follower reads #56679

nvanbenschoten commented Nov 13, 2020

cockroach-teamcity commented Nov 13, 2020

tbg left a comment

tbg left a comment

nvanbenschoten commented Nov 16, 2020

craig bot commented Nov 16, 2020

nvanbenschoten commented Nov 16, 2020

craig bot commented Nov 16, 2020

craig bot commented Nov 16, 2020

kv: use leaseholder's observed timestamp even for follower reads #56679

kv: use leaseholder's observed timestamp even for follower reads #56679

Conversation

nvanbenschoten commented Nov 13, 2020

cockroach-teamcity commented Nov 13, 2020

tbg left a comment

Choose a reason for hiding this comment

tbg left a comment

Choose a reason for hiding this comment

nvanbenschoten commented Nov 16, 2020

craig bot commented Nov 16, 2020

nvanbenschoten commented Nov 16, 2020

craig bot commented Nov 16, 2020

craig bot commented Nov 16, 2020