Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: corrupt ReverseScan response after DeleteRangeUsingTombstone #90642

Closed
tbg opened this issue Oct 25, 2022 · 4 comments · Fixed by #90953
Closed

storage: corrupt ReverseScan response after DeleteRangeUsingTombstone #90642

tbg opened this issue Oct 25, 2022 · 4 comments · Fixed by #90953
Assignees
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-storage Storage Team

Comments

@tbg
Copy link
Member

tbg commented Oct 25, 2022

Describe the problem

Check out bba9fd0 from #89477

./dev test --test-args '-show-logs' --stress ./pkg/kv/kvnemesis/ --filter TestKVNemesisSingleNode --stress-args='--failure=invalid.checksum'

> /Table/100/"66893d3c4df432d2": invalid checksum (95cae8e4) value [0a 48 0a 10 6c c0 7a 91 50 0a 42 af a2 c7 44 c5 cf bf d1 d4 1a 14 ec 12 36 36 38 39 33 64 33 63 34 64 66 34 33 32 64 32 00 01 2a 0a 08 a0 ed 8d e8 ff fc d8 90 17 30 b5 9b 16 38 01 4a 0a 08 a0 ed 8d e8 ff fc d8 90 17 50 01 12 0c 08 a0 ed 8d e8 ff fc d8 90 17 10 00 18 00 20 0c 28 11 48 01]
I221025 18:39:45.138935 1114 util/ctxgroup/ctxgroup.go:177  [-] 166 +  | (1) /Table/100/"66893d3c4df432d2": invalid checksum (95cae8e4) value [0a 48 0a 10 6c c0 7a 91 50 0a 42 af a2 c7 44 c5 cf bf d1 d4 1a 14 ec 12 36 36 38 39 33 64 33 63 34 64 66 34 33 32 64 32 00 01 2a 0a 08 a0 ed 8d e8 ff fc d8 90 17 30 b5 9b 16 38 01 4a 0a 08 a0 ed 8d e8 ff fc d8 90 17 50 01 12 0c 08 a0 ed 8d e8 ff fc d8 90 17 10 00 18 00 20 0c 28 11 48 01]

To Reproduce

See above. Adding this to cmd_scan_reverse.go triggers it closer to the source, too:

if err := reply.Verify(nil); err != nil {
  panic(err)
}

Note that (*ReverseScanResponse).Verify doesn't use any data from the request, it looks exclusively at reply.Rows. So this is likely a real bug, not some memory handling problem in kvnemesis.

Unfortunately, I can't reproduce under --race.

Expected behavior

Just works

Additional data / screenshots

If DeleteRangeUsingTombstone is disabled in kvnemesis' generator, this problem does not readily reproduce.

Environment:

master, see PR

Additional context

Jira issue: CRDB-20867

Epic CRDB-20465

@tbg tbg added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker T-kv-replication labels Oct 25, 2022
@tbg tbg self-assigned this Oct 25, 2022
@blathers-crl
Copy link

blathers-crl bot commented Oct 25, 2022

Hi @tbg, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl
Copy link

blathers-crl bot commented Oct 25, 2022

cc @cockroachdb/replication

@tbg tbg added branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 labels Oct 25, 2022
@erikgrinaker
Copy link
Contributor

We've identified the problem. Excerpt from internal Slack thread:

Basically, dynamically enabling point synthesis in the reverse direction may omit a (valid) synthetic point tombstone when it's triggered on a point key colocated with the range tombstone's start key. We emit synthetic points for all range key timestamps at their start key, but in the reverse direction, the parent iterator may only discover the range key when landing on a point key that's above a range key timestamp. Thus, in the reverse direction, the synthetic point was never emitted for the lower range key, so we'll need to step to it.

This is a bit problematic in that we may not know which synthetic point to position on, as it depends on what the previous positioning operation was (it may e.g. have been a versioned SeekLT which would mandate omitting certain synthetic points). Will need some thought.

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Oct 27, 2022

Here's a descriptive test case:

# Regression test for https://github.com/cockroachdb/cockroach/issues/90642.
#
#  REAL DATASET          SYNTHETIC DATASET
#  2    a2 [b2]          2   a2 [b2]
#  1    [---)            1   x
#       a   b                a   b
#
# Recall that pebbleMVCCScanner only enables pointSynthesizingIter when it
# encounters a range key. In the case above, during a reverse scan, the [a-b)@1
# range key will first become visible to pebbleMVCCScanner when it lands on a@2,
# so it enabled point synthesis positioned at the a@2 point key. Notice how the
# iterator has now skipped over the synthetic point tombstone a@1.
#
# This is particularly problematic when combined with pebbleMVCCScanner peeking,
# which assumes that following a iterPeekPrev() call, an iterNext() call can
# step the parent iterator forward once to get back to the original position.
# With the above bug, that is no longer true, as it instead lands on the 
# synthetic point tombstone which was skipped during reverse iteration. During
# intent processing for b@2, such an iterNext() call is expected to land on the
# intent's provisional value at b@2, but it instead lands on the intent itself
# at b@0. This in turn caused a value checksum or decoding failure, where it was
# expecting the current key to be b@2, but the actual key was b@0.
run ok
del_range_ts k=a end=b ts=1
put k=a ts=2 v=a2
with t=A
  txn_begin k=b ts=2
  put k=b v=b2
----
>> at end:
txn: "A" meta={id=00000000 key="b" pri=0.00000000 epo=0 ts=2.000000000,0 min=0,0 seq=0} lock=true stat=PENDING rts=2.000000000,0 wto=false gul=0,0
rangekey: {a-b}/[1.000000000,0=/<empty>]
data: "a"/2.000000000,0 -> /BYTES/a2
meta: "b"/0,0 -> txn={id=00000000 key="b" pri=0.00000000 epo=0 ts=2.000000000,0 min=0,0 seq=0} ts=2.000000000,0 del=false klen=12 vlen=7 mergeTs=<nil> txnDidNotUpdateMeta=true
data: "b"/2.000000000,0 -> /BYTES/b2

run ok
scan t=A k=a end=z reverse
----
scan: "b" -> /<err: unknown tag: UNKNOWN> @0,0
scan: "a" -> /BYTES/a2 @2.000000000,0

@erikgrinaker erikgrinaker assigned erikgrinaker and unassigned tbg Oct 27, 2022
craig bot pushed a commit that referenced this issue Oct 30, 2022
90702: storage: clone `pointSynthesizingIter` seek keys r=erikgrinaker a=erikgrinaker

The seek keys passed to `pointSynthesizingIter` seek methods may have originated from `UnsafeKey()`. Repositioning the parent iterator may therefore invalidate the seek key, but the code keeps using it afterwards.

This patch clones the seek key in a reusable buffer. The overhead of this should be negligible, considering the overall cost of a seek. `MVCCGet` benchmarks don't show a significant difference:

```
name                                                                  old time/op    new time/op    delta
MVCCGet_Pebble/batch=false/versions=1/valueSize=8/numRangeKeys=0-24     5.11µs ± 1%    5.13µs ± 1%    ~     (p=0.143 n=10+10)
MVCCGet_Pebble/batch=false/versions=1/valueSize=8/numRangeKeys=1-24     9.32µs ± 1%    9.41µs ± 1%  +0.96%  (p=0.001 n=10+10)
MVCCGet_Pebble/batch=false/versions=10/valueSize=8/numRangeKeys=0-24    19.1µs ± 2%    19.2µs ± 3%    ~     (p=0.075 n=10+10)
MVCCGet_Pebble/batch=false/versions=10/valueSize=8/numRangeKeys=1-24    25.7µs ± 2%    25.7µs ± 2%    ~     (p=0.853 n=10+10)
MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=0-24      3.10µs ± 2%    3.09µs ± 1%    ~     (p=0.985 n=10+10)
MVCCGet_Pebble/batch=true/versions=1/valueSize=8/numRangeKeys=1-24      5.05µs ± 0%    5.02µs ± 0%  -0.46%  (p=0.000 n=10+10)
MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=0-24     16.5µs ± 3%    16.4µs ± 2%    ~     (p=0.579 n=10+10)
MVCCGet_Pebble/batch=true/versions=10/valueSize=8/numRangeKeys=1-24     20.6µs ± 2%    20.7µs ± 1%  +0.60%  (p=0.022 n=9+10)
```

Touches #90642.
Epic: CRDB-2624.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
@erikgrinaker erikgrinaker changed the title batcheval: corrupt ReverseScan response after DeleteRangeUsingTombstone storage: corrupt ReverseScan response after DeleteRangeUsingTombstone Oct 31, 2022
@erikgrinaker erikgrinaker added T-storage Storage Team and removed T-kv-replication labels Oct 31, 2022
@blathers-crl blathers-crl bot added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Oct 31, 2022
@craig craig bot closed this as completed in aeed212 Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage. branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-storage Storage Team
Projects
None yet
3 participants