-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: fix point synthesis activation during reverse scans #90953
storage: fix point synthesis activation during reverse scans #90953
Conversation
#90959 must be merged first, since the test case here will trigger assertion failures due to that bug under certain metamorphic configurations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(drive by comment since I've not been keeping up with the range key details in the storage package)
Why do we need to see a@1
, given that it is below a@2
? I suppose it is because of the comment:
// PointSynthesizingIter wraps an MVCCIterator, and synthesizes MVCC point keys
// for MVCC range keys above existing point keys (not below), and at the start
// of range keys (truncated to iterator bounds).
So start of range keys are somehow special in that we synthesize there regardless of the whether it is below or not? If yes, why is that?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @tbg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to synthesize at the start key for FailOnMoreRecent
handling, in case there is a conflicting range key which doesn't overlap any points in the relevant span. As outlined in the commit body, three alternatives were considered (and even attempted in #90779 and elsewhere):
-
Don't synthesize below existing points if they overlap the start key: we'll still have the same problem with bare range keys.
-
Don't synthesize at the start key of bare range keys at all: violates
FailOnMoreRecent
, and the iterator still needs to know which direction to step in when enabling point synthesis positioned on a bare range key. -
Only synthesize the top-most version of bare range keys: both fixes the bug here and satisfies
FailOnMoreRecent
handling, but the semantics are rather odd (e.g. lower tombstones will disappear when a new one is written above).
I decided to just do the simple and ugly thing here. But we should consider getting rid of pointSynthesizingIter
and integrating the logic into pebbleMVCCScanner
(unless the complexity blowup proves too bad), which would also allow us to simplify the semantics by not synthesizing tombstones at start keys at all -- the scanner could just check the bare range key timestamps against FailOnMoreRecent
without emitting them.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @tbg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)
7ae0861
to
0cdad57
Compare
Release note: None
This patch fixes a bug where `pointSynthesizingIter` could appear to have skipped a synthetic point tombstone in the reverse direction when enabled dynamically by `pebbleMVCCScanner`. Consider the following case: ``` REAL DATASET SYNTHETIC DATASET 3 [b3] 3 [b3] 2 a2 2 a2 1 [---) 1 x a b a b ``` Recall that `pebbleMVCCScanner` only enables `pointSynthesizingIter` when it encounters a range key. In the case above, during a reverse scan, the `[a-b)@1` range key will first become visible to `pebbleMVCCScanner` when it lands on `a@2`, so it enabled point synthesis positioned at the `a@2` point key. Notice how the iterator has now skipped over the synthetic point tombstone `a@1`. This is particularly problematic when combined with `pebbleMVCCScanner` peeking, which assumes that following a `iterPeekPrev()` call, an `iterNext()` call can step the parent iterator forward once to get back to the original position. With the above bug, that is no longer true, as it instead lands on the synthetic point tombstone which was skipped during reverse iteration. During intent processing for `b@3`, such an `iterNext()` call is expected to land on the intent's provisional value at `b@3`, but it instead lands on the intent itself at `b@0`. This in turn caused a value checksum or decoding failure, where it was expecting the current key to be `b@3`, but the actual key was `b@0`. This patch fixes the bug by keeping track of the direction of the previous positioning operation in `pebbleMVCCScanner`, and then repositioning the `pointSynthesizingIter` as appropriate during dynamic activation in the reverse direction. This doesn't have a significant effect on performance (in particular, the point get path is never in reverse). Several alternative approaches were attempted but abandoned: * Don't synthesize points at a range key's start bound. While compelling for a range of reasons (complexity, performance, etc) this has two flaws: on a bare range key, the iterator still needs to know which direction to skip in, and we also need to know the timestamps of bare range keys for `FailOnMoreRecent` handling. * Only synthesize a single point at a range key's start bound. This would solve the problem at hand, but has rather strange semantics. Longer-term, it may be better to merge range key logic into `pebbleMVCCScanner` itself -- `pointSynthesizingIter` was intended to reduce complexity, but it's not clear that the overall complexity is actually reduced. Release note: None
0cdad57
to
51d70e8
Compare
CI failures are unrelated. bors r+ |
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 5217021 to blathers/backport-release-22.2-90953: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.x failed. See errors above. error creating merge commit from 5217021 to blathers/backport-release-22.2.0-90953: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.2.0 failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
This patch fixes a bug where
pointSynthesizingIter
could appear to have skipped a synthetic point tombstone in the reverse direction when enabled dynamically bypebbleMVCCScanner
.Consider the following case:
Recall that
pebbleMVCCScanner
only enablespointSynthesizingIter
when it encounters a range key. In the case above, during a reverse scan, the[a-b)@1
range key will first become visible topebbleMVCCScanner
when it lands ona@2
, so it enabled point synthesis positioned at thea@2
point key. Notice how the iterator has now skipped over the synthetic point tombstonea@1
.This is particularly problematic when combined with
pebbleMVCCScanner
peeking, which assumes that following aiterPeekPrev()
call, aniterNext()
call can step the parent iterator forward once to get back to the original position. With the above bug, that is no longer true, as it instead lands on the synthetic point tombstone which was skipped during reverse iteration. During intent processing forb@3
, such aniterNext()
call is expected to land on the intent's provisional value atb@3
, but it instead lands on the intent itself atb@0
. This in turn caused a value checksum or decoding failure, where it was expecting the current key to beb@3
, but the actual key wasb@0
.This patch fixes the bug by keeping track of the direction of the previous positioning operation in
pebbleMVCCScanner
, and then repositioning thepointSynthesizingIter
as appropriate during dynamic activation in the reverse direction. This doesn't have a significant effect on performance (in particular, the point get path is never in reverse).Several alternative approaches were attempted but abandoned:
Don't synthesize points at a range key's start bound. While compelling for a range of reasons (complexity, performance, etc) this has two flaws: on a bare range key, the iterator still needs to know which direction to skip in, and we also need to know the timestamps of bare range keys for
FailOnMoreRecent
handling.Only synthesize a single point at a range key's start bound. This would solve the problem at hand, but has rather strange semantics.
Longer-term, it may be better to merge range key logic into
pebbleMVCCScanner
itself --pointSynthesizingIter
was intended to reduce complexity, but it's not clear that the overall complexity is actually reduced.Resolves #90642.
Release note: None