
kvserver: properly handle DeleteRange in batches that hit WriteTooOld errors #71236

Open

AlexTalks opened this issue Oct 6, 2021 · 4 comments

Labels: C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), T-kv (KV Team)

AlexTalks (Contributor) commented Oct 6, 2021

Currently, a batch containing a DeleteRange may read a different set of keys to delete if a subsequent blind write in the batch hits a WriteTooOld error (and leaves an intent behind). When that happens and one of the keys that needs to be deleted at the refreshed timestamp already has an intent on it, the batch cannot be replayed, because the intent that the DeleteRange should have generated is not present in the intent history.

This manifests in errors such as:

transaction b81df504-9bf5-4e2b-9a94-f9b19cee01ab with sequence 2 missing an intent with lower sequence 1

An example batch that could hit this error is below, assuming the Put operation hits a WriteTooOld error:

b := &Batch{}
b.DelRange(ctx, /Table/50/"006f9bc1", /Table/50/"32de5dba", true)
b.Put(ctx, /Table/50/"130f043b", v-1)
txn.Run(ctx, b)
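
Below is a minimal, self-contained sketch (plain Go, not CockroachDB code) of the bookkeeping that goes wrong, assuming the Put key falls inside the DelRange span (as in the example above, where /Table/50/"130f043b" lies between the two DelRange bounds). The key names and the intent-history model are illustrative simplifications of the real MVCC machinery, not its implementation.

package main

import "fmt"

// hasIntentAtOrBelow reports whether the intent history contains a write at
// or below the given sequence number, which is what a replayed write needs
// to find.
func hasIntentAtOrBelow(seqs []int, seq int) bool {
	for _, s := range seqs {
		if s <= seq {
			return true
		}
	}
	return false
}

func main() {
	const delRangeSeq, putSeq = 1, 2

	// First evaluation of the batch:
	//   - DelRange (seq 1) reads keys "a" and "b" and writes intents on them.
	//   - The blind Put (seq 2) hits WriteTooOld but still leaves an intent on "p".
	intentHistory := map[string][]int{
		"a": {delRangeSeq},
		"b": {delRangeSeq},
		"p": {putSeq},
	}

	// Replay at the refreshed timestamp: the DelRange now also reads "p"
	// (e.g. a value committed by another transaction has become visible
	// there), so replaying the seq-1 write on "p" must find an intent with
	// sequence <= 1 in its history, and there isn't one.
	for _, k := range []string{"a", "b", "p"} {
		if !hasIntentAtOrBelow(intentHistory[k], delRangeSeq) {
			fmt.Printf("key %q: transaction with sequence %d missing an intent with lower sequence %d\n",
				k, putSeq, delRangeSeq)
			continue
		}
		fmt.Printf("key %q: replay of DelRange matches intent history %v\n", k, intentHistory[k])
	}
}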

This was discovered upon adding DeleteRange to kvnemesis in #68003.

Jira issue: CRDB-10452

AlexTalks added the C-bug and T-kv labels on Oct 6, 2021
nvanbenschoten (Member) commented

@AlexTalks and I discussed what to do about this. KV batches are not currently used in a way that hits this case, and because the operation fails with an error, it cannot introduce silent corruption, so there is not much urgency to support it. Instead, we decided to update the error message to point at this issue as a possible suspect and to put the actual fix on hold.

We also decided that when updating the error message, we'll update the errors in replayTransactionalWrite from errors.Errorf to errors.AssertionFailedf.
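
For illustration, here is a hedged sketch of what that swap might look like. The helper name, its arguments, and the message wording below are hypothetical stand-ins rather than the actual replayTransactionalWrite code; only the errors.Errorf to errors.AssertionFailedf substitution is taken from the comment above.

package storage

import "github.com/cockroachdb/errors"

// missingIntentErr is a hypothetical helper; the real check lives inside
// replayTransactionalWrite. AssertionFailedf (from github.com/cockroachdb/errors)
// marks the failure as an internal invariant violation rather than an
// ordinary error, and the message can point at this issue as a likely cause.
func missingIntentErr(txnID string, curSeq, replaySeq int) error {
	// Previously this path would have used errors.Errorf with the same message.
	return errors.AssertionFailedf(
		"transaction %s with sequence %d missing an intent with lower sequence %d; "+
			"see https://github.com/cockroachdb/cockroach/issues/71236",
		txnID, curSeq, replaySeq)
}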

mwang1026 commented

If the plan is to update an error message, can we pick that up quickly and close this out?

AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Dec 6, 2021
Enhance the error message on sequence number errors when replaying a
transactional batch with a link to the possible cause, cockroachdb#71236, stemming
from an issue where a `DelRange` operation finds new keys to delete upon
replay.  This also changes the error from a generic error to an
`AssertionFailed` error.

Release note: None
AlexTalks (Contributor, Author) commented

> If the plan is to update an error message, can we pick that up quickly and close this out?

The update to the error message is in #73496; however, I believe that @nvanbenschoten and I discussed leaving this issue open as a reference to the underlying problem (which is that DelRange operations in co-mingled batches are not supported) and linking to it from the error message. If it's okay to close this out, though, I'll remove the issue link and we can do that.

AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Dec 9, 2021
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Dec 10, 2021
craig bot pushed a commit that referenced this issue Dec 10, 2021
73496: storage: add issue for sequence number errors on replaying `DelRange` r=AlexTalks a=AlexTalks


73578: storage: remove leftover logic related to interleaved intents r=sumeerbhola a=nvanbenschoten

This commit is a follow-up to #72536.

It addresses a few of the remaining items left over from removing the bulk of the interleaved intent logic. Specifically, it removes:
- the `PrecedingIntentState` type
- the `PrecedingIntentState` parameter in `Writer.ClearIntent`
- the `Writer.OverrideTxnDidNotUpdateMetaToFalse` method
- the `txnDidNotUpdateMetaHelper` type

The commit does not include any behavioral changes.

73591: ui: show per-node series for "Read Amplification" and "SSTables" graphs r=dhartunian a=nvanbenschoten

This commit addresses a longstanding usability issue with the Storage dashboard. Previously, the dashboard would show the average read amplification and the average sstable count across the cluster. When looking at these metrics, we are specifically interested in the outliers, so this made little sense. As a result, a few of our runbooks (e.g. [RocksDB inverted LSM](https://cockroachlabs.atlassian.net/wiki/spaces/TS/pages/1157890147/RocksDB+inverted+LSM)) require operators to grab custom graphs with the "Per Node" option.

This commit fixes this by splitting these graphs out to show per-node series.

_Example:_ (screenshot of the per-node graphs: https://user-images.githubusercontent.com/5438456/145142909-0babdd04-54a6-46d3-9d4e-002a2d375811.png)


Co-authored-by: Alex Sarkesian <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
tbg added a commit to tbg/cockroach that referenced this issue Sep 20, 2022
This shouldn't have an effect because we still zero out
`config.Ops.Batch` owing to cockroachdb#46081.

See also cockroachdb#71236.

Release note: None
github-actions (bot) commented

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
