migrations: add explicit API for migrating ingested data later #58611

dt · 2021-01-07T13:55:59Z

The long-running migration system described here allows performing a migration that may rewrite all row data or all schemas or otherwise perform some migration and then update the cluster version to show that that has been done, which is used as a way to know some invariant now holds, for example that there are no longer interleaved rows to consider or no longer old-format FKs.

However, row data and schema elements can also be ingested into a cluster from a BACKUP using RESTORE. If that backup has captured the pre-migration state, restoring it as-is into the cluster would mean the invariant no longer holds, which could lead to serious bugs if it is assumed elsewhere, based on the cluster version.

In the past, this has been handled in a somewhat ad-hoc and case-by-case basis (or missed in some cases). When the schema team migrated the FK representation they hooked in to points in the backup code to specifically check the FK representation used in backed up tables and modify it as needed. Some other migrations, such as those that updated permissions on all system tables, were written to run once, update every current table, and then never run again, meaning tables restored later would not have the migrated permissions.

To be more robust and less prone to overlooking the potential for pre-migration data to be restored post-migration, we should a) persist the cluster version of the backed up data in the backup and b) step though each migration from that to the current cluster version, calling a new method in the migration definition API that asks that migration implementation if and how it wants to migrate the data being RESTORED. Many migrations -- such as the one to move intents to lock table -- may opt to do nothing, but should do so explicitly to ensure implementors consider restored data in their migration.

Jira issue: CRDB-3378

Epic CRDB-10338

blathers-crl · 2021-01-07T13:56:02Z

Hi @dt, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.

While you're here, please consider adding an A- label to help keep our repository tidy.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.}

ajwerner · 2021-01-07T14:26:56Z

I anticipate that some of the solutions are better left to live in the RESTORE layer explicitly. Perhaps we can cleanly interface-ify everything. I think that a large class of migrations will involve verifying properties of and potentially rewriting descriptors. An interesting (though maybe independently complex) case is the removal of cross-database references.

ajwerner · 2021-05-30T21:58:04Z

This keeps coming up. I'd like to do something here but I don't quite know what. We have some different versions in the descriptor, but none of them can be used linearly as migrations. Ideally we would serialize something similar to a cluster version into descriptors which would indicate whether or not we've attempted to migrate them. The problem with cluster versions is that we tend to want to throw away those keys (as a forcing function to finish the migrations).

I'm commenting here to say that I think we want to do something sooner rather than later and I'm all ears on framework suggestions.

dt · 2021-05-31T23:34:06Z

My first reaction is we should just use the cluster version that is captured at time of backup since 21.1, just the same as we would for migrating a table within the cluster itself. We already build all the migration stuff around cluster versions so starting over with something else seems like it'd need a pretty compelling justification.

And yes, that does imply that the lower-bound on old cluster version definitions we have keep in the code has to be the same as the lower-bound of RESTORE backwards compatibility, but I think that'd be mostly true anyway: we'll have to keep the migration code itself, whatever it is keyed on, around anyway as long as we support restoring that backup to do the actual migration of the restored data, so I'm not too worried about keeping the cluster versions.

If we really wanted to allow deleting the ones not associated with a migration without breaking RESTORE, I think we could even be clever, where if during RESTORE we can't lookup the version persisted in the backup in out binary, we could still compare it to the ones we have and as long as it is above our minimum restore version, just step it forward until it hits a migration we actually have.

postamar · 2022-03-08T21:57:26Z

We now effectively have what it takes to migrate old descriptors to the current cluster version in the form of the RunPostDeserializationChanges and especially the RunRestoreChanges methods in the descriptor builders. These methods already get called in all the right places during a RESTORE. In light of this I'm going to remove this issue from our board. Feel free to re-add it if I misunderstood what this is about.

github-actions · 2023-09-21T11:04:58Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

blathers-crl bot added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 7, 2021

mwang1026 assigned dt Jan 12, 2021

adityamaru mentioned this issue Mar 11, 2021

Update sequences to be stored by ID when doing backup/restore #60942

Closed

RichardJCai mentioned this issue May 11, 2021

sql: create long running migration to fix privilege descriptors for tables/db created on 20.1 or before #65012

Closed

mwang1026 added the T-disaster-recovery label May 19, 2021

dt removed their assignment Jun 1, 2021

ajwerner mentioned this issue Jun 9, 2021

sql/catalog: validate that old foreign key representation is not in use #66262

Closed

jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021

exalate-issue-sync bot removed the T-sql-schema-deprecated Use T-sql-foundations instead label Jul 23, 2021

github-actions bot added the no-issue-activity label Sep 21, 2023

github-actions bot added the X-stale label Oct 2, 2023

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 2, 2023

exalate-issue-sync bot closed this as completed Oct 2, 2023

github-project-automation bot added this to Disaster Recovery Backlog Aug 28, 2024

github-project-automation bot moved this to Done in Disaster Recovery Backlog Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

migrations: add explicit API for migrating ingested data later #58611

migrations: add explicit API for migrating ingested data later #58611

dt commented Jan 7, 2021 •

edited by exalate-issue-sync bot

Loading

blathers-crl bot commented Jan 7, 2021

ajwerner commented Jan 7, 2021

ajwerner commented May 30, 2021

dt commented May 31, 2021 •

edited

Loading

postamar commented Mar 8, 2022

github-actions bot commented Sep 21, 2023

migrations: add explicit API for migrating ingested data later #58611

migrations: add explicit API for migrating ingested data later #58611

Comments

dt commented Jan 7, 2021 • edited by exalate-issue-sync bot Loading

blathers-crl bot commented Jan 7, 2021

ajwerner commented Jan 7, 2021

ajwerner commented May 30, 2021

dt commented May 31, 2021 • edited Loading

postamar commented Mar 8, 2022

github-actions bot commented Sep 21, 2023

dt commented Jan 7, 2021 •

edited by exalate-issue-sync bot

Loading

dt commented May 31, 2021 •

edited

Loading