Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: fix point synthesis activation during reverse scans #90953

Merged
merged 2 commits into from
Oct 31, 2022

Conversation

erikgrinaker
Copy link
Contributor

This patch fixes a bug where pointSynthesizingIter could appear to have skipped a synthetic point tombstone in the reverse direction when enabled dynamically by pebbleMVCCScanner.

Consider the following case:

REAL DATASET          SYNTHETIC DATASET
3       [b3]          3      [b3]
2    a2               2   a2
1    [---)            1   x
     a   b                a   b

Recall that pebbleMVCCScanner only enables pointSynthesizingIter when it encounters a range key. In the case above, during a reverse scan, the [a-b)@1 range key will first become visible to pebbleMVCCScanner when it lands on a@2, so it enabled point synthesis positioned at the a@2 point key. Notice how the iterator has now skipped over the synthetic point tombstone a@1.

This is particularly problematic when combined with pebbleMVCCScanner peeking, which assumes that following a iterPeekPrev() call, an iterNext() call can step the parent iterator forward once to get back to the original position. With the above bug, that is no longer true, as it instead lands on the synthetic point tombstone which was skipped during reverse iteration. During intent processing for b@3, such an iterNext() call is expected to land on the intent's provisional value at b@3, but it instead lands on the intent itself at b@0. This in turn caused a value checksum or decoding failure, where it was expecting the current key to be b@3, but the actual key was b@0.

This patch fixes the bug by keeping track of the direction of the previous positioning operation in pebbleMVCCScanner, and then repositioning the pointSynthesizingIter as appropriate during dynamic activation in the reverse direction. This doesn't have a significant effect on performance (in particular, the point get path is never in reverse).

Several alternative approaches were attempted but abandoned:

  • Don't synthesize points at a range key's start bound. While compelling for a range of reasons (complexity, performance, etc) this has two flaws: on a bare range key, the iterator still needs to know which direction to skip in, and we also need to know the timestamps of bare range keys for FailOnMoreRecent handling.

  • Only synthesize a single point at a range key's start bound. This would solve the problem at hand, but has rather strange semantics.

Longer-term, it may be better to merge range key logic into pebbleMVCCScanner itself -- pointSynthesizingIter was intended to reduce complexity, but it's not clear that the overall complexity is actually reduced.

Resolves #90642.

Release note: None

@erikgrinaker erikgrinaker requested review from jbowens, tbg and a team October 30, 2022 14:22
@erikgrinaker erikgrinaker requested a review from a team as a code owner October 30, 2022 14:22
@erikgrinaker erikgrinaker self-assigned this Oct 30, 2022
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@erikgrinaker
Copy link
Contributor Author

#90959 must be merged first, since the test case here will trigger assertion failures due to that bug under certain metamorphic configurations.

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(drive by comment since I've not been keeping up with the range key details in the storage package)

Why do we need to see a@1, given that it is below a@2? I suppose it is because of the comment:

// PointSynthesizingIter wraps an MVCCIterator, and synthesizes MVCC point keys
// for MVCC range keys above existing point keys (not below), and at the start
// of range keys (truncated to iterator bounds).

So start of range keys are somehow special in that we synthesize there regardless of the whether it is below or not? If yes, why is that?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @tbg)

Copy link
Contributor Author

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to synthesize at the start key for FailOnMoreRecent handling, in case there is a conflicting range key which doesn't overlap any points in the relevant span. As outlined in the commit body, three alternatives were considered (and even attempted in #90779 and elsewhere):

  • Don't synthesize below existing points if they overlap the start key: we'll still have the same problem with bare range keys.

  • Don't synthesize at the start key of bare range keys at all: violates FailOnMoreRecent, and the iterator still needs to know which direction to step in when enabling point synthesis positioned on a bare range key.

  • Only synthesize the top-most version of bare range keys: both fixes the bug here and satisfies FailOnMoreRecent handling, but the semantics are rather odd (e.g. lower tombstones will disappear when a new one is written above).

I decided to just do the simple and ugly thing here. But we should consider getting rid of pointSynthesizingIter and integrating the logic into pebbleMVCCScanner (unless the complexity blowup proves too bad), which would also allow us to simplify the semantics by not synthesizing tombstones at start keys at all -- the scanner could just check the bare range key timestamps against FailOnMoreRecent without emitting them.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jbowens and @tbg)

Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)

This patch fixes a bug where `pointSynthesizingIter` could appear to
have skipped a synthetic point tombstone in the reverse direction when
enabled dynamically by `pebbleMVCCScanner`.

Consider the following case:

```
REAL DATASET          SYNTHETIC DATASET
3       [b3]          3      [b3]
2    a2               2   a2
1    [---)            1   x
     a   b                a   b
```

Recall that `pebbleMVCCScanner` only enables `pointSynthesizingIter`
when it encounters a range key. In the case above, during a reverse
scan, the `[a-b)@1` range key will first become visible to
`pebbleMVCCScanner` when it lands on `a@2`, so it enabled point
synthesis positioned at the `a@2` point key. Notice how the iterator has
now skipped over the synthetic point tombstone `a@1`.

This is particularly problematic when combined with `pebbleMVCCScanner`
peeking, which assumes that following a `iterPeekPrev()` call, an
`iterNext()` call can step the parent iterator forward once to get back
to the original position.  With the above bug, that is no longer true,
as it instead lands on the synthetic point tombstone which was skipped
during reverse iteration. During intent processing for `b@3`, such an
`iterNext()` call is expected to land on the intent's provisional value
at `b@3`, but it instead lands on the intent itself at `b@0`. This in
turn caused a value checksum or decoding failure, where it was expecting
the current key to be `b@3`, but the actual key was `b@0`.

This patch fixes the bug by keeping track of the direction of the
previous positioning operation in `pebbleMVCCScanner`, and then
repositioning the `pointSynthesizingIter` as appropriate during dynamic
activation in the reverse direction. This doesn't have a significant
effect on performance (in particular, the point get path is never in
reverse).

Several alternative approaches were attempted but abandoned:

* Don't synthesize points at a range key's start bound. While compelling
  for a range of reasons (complexity, performance, etc) this has two
  flaws: on a bare range key, the iterator still needs to know which
  direction to skip in, and we also need to know the timestamps of bare
  range keys for `FailOnMoreRecent` handling.

* Only synthesize a single point at a range key's start bound. This
  would solve the problem at hand, but has rather strange semantics.

Longer-term, it may be better to merge range key logic into
`pebbleMVCCScanner` itself -- `pointSynthesizingIter` was intended to
reduce complexity, but it's not clear that the overall complexity is
actually reduced.

Release note: None
@erikgrinaker erikgrinaker force-pushed the point-synthesis-direction branch from 0cdad57 to 51d70e8 Compare October 31, 2022 18:03
@erikgrinaker
Copy link
Contributor Author

CI failures are unrelated.

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 31, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Oct 31, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Oct 31, 2022

Build succeeded:

@craig craig bot merged commit aeed212 into cockroachdb:master Oct 31, 2022
@blathers-crl
Copy link

blathers-crl bot commented Oct 31, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 5217021 to blathers/backport-release-22.2-90953: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


error creating merge commit from 5217021 to blathers/backport-release-22.2.0-90953: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.0 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: corrupt ReverseScan response after DeleteRangeUsingTombstone
4 participants