-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
importer: add ImportEpoch table descriptor field #85692
importer: add ImportEpoch table descriptor field #85692
Conversation
I realize the unit test gets called within the bulk data driven driver. Once I address the failures in #85637, I'll move the unit test to |
a2186c3
to
786692e
Compare
7b637fa
to
7ca826c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good at a high level, but this is sufficiently far away from KV that someone more familiar with it should review this too.
I'm assuming this will piggyback on the version gate in #85138, so there are no further cross-version concerns here?
Note: i think I need to rework this PR such that two import epoch's are stored in the table descriptor. Will update later today. |
NextImportEpoch -- counter that increments every time to made sure epochs are unique |
7ca826c
to
0813fbc
Compare
RFAL
|
0813fbc
to
bb6705f
Compare
…ields This patch introduces the new ImportEpoch table descriptor field which provides a count for the number of IMPORT jobs that have been attempted on the table and the ImportTypeInProgress enum which indicates the kind of Import occurring. This work is apart of a larger project to roll back imports without MVCC timestamps. In a future pr (cockroachdb#85138), this epoch will get written to the imported MVCCValueHeader metadata, which in turn allows import rollbacks to use the epoch to determine which keys to delete in the table. We considered binding the import job id to the table descriptor, and in turn to each imported KV's metadata, however, the expected proto encoded size of a job ID was between 7 - 12 bytes, which seemed uncessary. Meanwhile the 4 byte uint32 epoch will encode to 1 byte for the first 256 IMPORT jobs on the table. 2,147,483,648 IMPORT jobs must occur on the table before we worry about integer overflow -- a problem for a future release :D. Informs cockroachdb#85138 Release note: none
bb6705f
to
98afb06
Compare
sorry for the noise -- RFAL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll still defer to Bulk/Schema here, but this seems fine as far as I can tell.
I'm assuming we already have some sort of mutual exclusion protocol here to prevent two import jobs from clobbering each others import state?
@@ -1180,7 +1180,23 @@ message TableDescriptor { | |||
// this table, in which case the global setting is used. | |||
optional bool forecast_stats = 52 [(gogoproto.nullable) = true, (gogoproto.customname) = "ForecastStats"]; | |||
|
|||
// Next ID: 54 | |||
// ImportIntoEpoch is the count of IMPORT INTO jobs that have been attempted on this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ImportIntoEpoch is the count of IMPORT INTO jobs that have been attempted on this | |
// ImportEpoch is the count of IMPORT INTO jobs that have been attempted on this |
// field gets incremented while preparing the table for ingestion. | ||
optional uint32 import_epoch = 54 [(gogoproto.nullable) = false, (gogoproto.customname) = "ImportEpoch"]; | ||
|
||
// ImportType indicates the kind of import currently occurring on the table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could explain the differences. Why is IMPORT
a separate thing?
closing in favor of #85852 For 22.2, we decided to continue rolling back imports with timestamps, so the ImportEpoch field is no longer necessary. |
This patch introduces the new ImportEpoch table descriptor field which
provides a count for the number of IMPORT INTO jobs that have been attempted on
the table after it already had data, including any in-progress jobs.
This work is apart of a larger project to roll back imports without MVCC
timestamps. In a future pr (#85138), this epoch will get written to the
imported MVCCValueHeader metadata, which in turn allows import rollbacks to
use the epoch to determine which keys to delete in the table.
We considered binding the import job id to the table descriptor, and in turn
to each imported KV's metadata, however, the expected proto encoded size of a
job ID was between 7 - 12 bytes, which seemed uncessary. Meanwhile the 4 byte
uint32 epoch will encode to 1 byte for the first 256 IMPORT INTO jobs on the
table. 2,147,483,648 IMPORT INTO jobs must occur on the table before we worry
about integer overflow -- a problem for a future release :D.
Informs #76722
Release note: none