Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importer: add ImportStartWallTime table descriptor field #85852

Merged

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Aug 9, 2022

This patch introduces the new ImportStartWallTime table descriptor field which
provides the time at which an in-progress import began writing data to disk.

In future PRs, this field will be used to:

  • incrementally back up in progress imports, preventing a large BACKUP workload
    after an Import finishes.
  • elide importing keys in RESTORE, ensuring a table -- with an in-progress import
    in the back up-- contains no in-progress importing keys and is available on
    the restored cluster.

Informs #85138

Release note: none

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msbutler msbutler changed the title Butler descriptor import start time importer: add ImportStartTime descriptor Aug 9, 2022
@msbutler msbutler self-assigned this Aug 9, 2022
@msbutler msbutler changed the title importer: add ImportStartTime descriptor importer: add ImportStartWallTime descriptor Aug 9, 2022
@msbutler msbutler requested a review from dt August 9, 2022 21:45
@msbutler msbutler marked this pull request as ready for review August 9, 2022 21:45
@msbutler msbutler requested a review from a team August 9, 2022 21:45
@msbutler msbutler requested a review from a team as a code owner August 9, 2022 21:45
@msbutler msbutler requested a review from a team August 9, 2022 21:45
@msbutler msbutler force-pushed the butler-descriptor-import-start-time branch 3 times, most recently from 324d337 to e2108ea Compare August 9, 2022 22:40
@msbutler msbutler changed the title importer: add ImportStartWallTime descriptor importer: add ImportStartWallTime table descriptor field Aug 10, 2022
@msbutler msbutler force-pushed the butler-descriptor-import-start-time branch 5 times, most recently from 66edc1b to 740d8ef Compare August 11, 2022 16:51
@msbutler
Copy link
Collaborator Author

msbutler commented Aug 11, 2022

@dt I added an ImportType field to the table descriptor. I need this ensure that IMPORT PGDUMP descriptors remain offline once the RESTORE completes.

@msbutler msbutler force-pushed the butler-descriptor-import-start-time branch 2 times, most recently from af632e5 to 1368086 Compare August 11, 2022 18:53
@msbutler msbutler mentioned this pull request Aug 11, 2022
@msbutler msbutler force-pushed the butler-descriptor-import-start-time branch 2 times, most recently from 9e4a62c to f5a79db Compare August 12, 2022 11:48
@dt
Copy link
Member

dt commented Aug 12, 2022

@msbutler Can't we just only set the import wall time field to a non-zero number for non-new imports? And then all other imports would not be restored?

@msbutler
Copy link
Collaborator Author

msbutler commented Aug 12, 2022

My motivation for adding the ImportType is that during the restore job, i need to distinguish an IMPORT PGDUMP/MYSQL from any kind of IMPORT INTO, because for the former, I need to keep its table descriptor offline.

For the IMPORT PGDUMP case, instead of adding the ImportType, I considered setting desc.ImportStartWalltime to a special time that will never get hit by IMPORT INTO, like 1. But that seemed a bit confusing for the caller. Happy to go that route if you think that's better/simpler.

@msbutler
Copy link
Collaborator Author

You know what, I'm going to scrap the ImportType. I like the idea of using the ImportWallTime with a special value for IMPORT PGDUMP. It simplifies things.

@dt
Copy link
Member

dt commented Aug 12, 2022

The special value is just... zero, right? And then we only restore offline descriptors if they have a non-zero value, and we elide all rows at/above that value?

@dt
Copy link
Member

dt commented Aug 12, 2022

an IMPORT PGDUMP/MYSQL ... I need to keep its table descriptor offline.

We just don't restore the desc at all here, right? we don't want to keep it offline since there is no job that is going to come along and remove it later.

Basically, it seems like this should be a) if importing into an existing table -- empty or not --, record the time that all import writes will be at/above in the desc b) when restoring, if a table is offline we do not restore the desc or data unless it has a import-writes-are-at-or-above time, in which case we do restore it but elide all rows at/above that time, and the table ends up public at the end of restore just like all the others.

Does that sound right?

@msbutler
Copy link
Collaborator Author

msbutler commented Aug 12, 2022

Ah yes, for an in-progress IMPORT PGDUMP, we really should drop all of its associated descriptors on RESTORE.

One piece you're missing: I only backup offline tables iff its ImportWallTime > 0. If we kept Import PGDUMP's wall time == 0, then we can't incrementally back it up. Hence, it needs another special value, perhaps -1.

I wonder if it's worth only focusing on the IMPORT INTO case for now, and revisit incrementally backing up IMPORT PGDUMP in stability/23.1.

@msbutler msbutler force-pushed the butler-descriptor-import-start-time branch from f5a79db to 8722d84 Compare August 12, 2022 15:47
@msbutler
Copy link
Collaborator Author

removed the import type. Currently nothing gets set for IMPORT PGDUMP. Would like to merge this ASAP to make it easier to work on downstream stuff.

@msbutler msbutler force-pushed the butler-descriptor-import-start-time branch from 8722d84 to c1a3b60 Compare August 12, 2022 16:02
This patch introduces the new ImportStartWallTime table descriptor field which
provides the time at which an in-progress import began writing data to disk.
This field is nonzero if the table is offline during an import.

In future PRs, this field will be used to:
- incrementally back up in progress imports, preventing a large BACKUP workload
 after an Import finishes.
- elide importing keys in RESTORE, ensuring a table -- with an in-progress import
  in the back up-- contains no in-progress importing keys and is available on
  the restored cluster.

Informs cockroachdb#85138

Release note: none
@msbutler msbutler force-pushed the butler-descriptor-import-start-time branch from c1a3b60 to 5c16ad4 Compare August 12, 2022 16:53
@msbutler
Copy link
Collaborator Author

TFTR!

bors r=dt

@craig
Copy link
Contributor

craig bot commented Aug 12, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 13, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 13, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Aug 13, 2022

Build succeeded:

@craig craig bot merged commit dd59982 into cockroachdb:master Aug 13, 2022
@msbutler msbutler deleted the butler-descriptor-import-start-time branch August 13, 2022 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants