Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrations: add explicit API for migrating ingested data later #58611

Closed
dt opened this issue Jan 7, 2021 · 6 comments
Closed

migrations: add explicit API for migrating ingested data later #58611

dt opened this issue Jan 7, 2021 · 6 comments
Labels
A-disaster-recovery A-schema-descriptors Relating to SQL table/db descriptor handling. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) no-issue-activity T-disaster-recovery X-stale

Comments

@dt
Copy link
Member

dt commented Jan 7, 2021

The long-running migration system described here allows performing a migration that may rewrite all row data or all schemas or otherwise perform some migration and then update the cluster version to show that that has been done, which is used as a way to know some invariant now holds, for example that there are no longer interleaved rows to consider or no longer old-format FKs.

However, row data and schema elements can also be ingested into a cluster from a BACKUP using RESTORE. If that backup has captured the pre-migration state, restoring it as-is into the cluster would mean the invariant no longer holds, which could lead to serious bugs if it is assumed elsewhere, based on the cluster version.

In the past, this has been handled in a somewhat ad-hoc and case-by-case basis (or missed in some cases). When the schema team migrated the FK representation they hooked in to points in the backup code to specifically check the FK representation used in backed up tables and modify it as needed. Some other migrations, such as those that updated permissions on all system tables, were written to run once, update every current table, and then never run again, meaning tables restored later would not have the migrated permissions.

To be more robust and less prone to overlooking the potential for pre-migration data to be restored post-migration, we should a) persist the cluster version of the backed up data in the backup and b) step though each migration from that to the current cluster version, calling a new method in the migration definition API that asks that migration implementation if and how it wants to migrate the data being RESTORED. Many migrations -- such as the one to move intents to lock table -- may opt to do nothing, but should do so explicitly to ensure implementors consider restored data in their migration.

Jira issue: CRDB-3378

Epic CRDB-10338

@blathers-crl
Copy link

blathers-crl bot commented Jan 7, 2021

Hi @dt, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 7, 2021
@dt dt added A-disaster-recovery A-schema-descriptors Relating to SQL table/db descriptor handling. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Jan 7, 2021
@ajwerner
Copy link
Contributor

ajwerner commented Jan 7, 2021

I anticipate that some of the solutions are better left to live in the RESTORE layer explicitly. Perhaps we can cleanly interface-ify everything. I think that a large class of migrations will involve verifying properties of and potentially rewriting descriptors. An interesting (though maybe independently complex) case is the removal of cross-database references.

@ajwerner
Copy link
Contributor

This keeps coming up. I'd like to do something here but I don't quite know what. We have some different versions in the descriptor, but none of them can be used linearly as migrations. Ideally we would serialize something similar to a cluster version into descriptors which would indicate whether or not we've attempted to migrate them. The problem with cluster versions is that we tend to want to throw away those keys (as a forcing function to finish the migrations).

I'm commenting here to say that I think we want to do something sooner rather than later and I'm all ears on framework suggestions.

@dt
Copy link
Member Author

dt commented May 31, 2021

My first reaction is we should just use the cluster version that is captured at time of backup since 21.1, just the same as we would for migrating a table within the cluster itself. We already build all the migration stuff around cluster versions so starting over with something else seems like it'd need a pretty compelling justification.

And yes, that does imply that the lower-bound on old cluster version definitions we have keep in the code has to be the same as the lower-bound of RESTORE backwards compatibility, but I think that'd be mostly true anyway: we'll have to keep the migration code itself, whatever it is keyed on, around anyway as long as we support restoring that backup to do the actual migration of the restored data, so I'm not too worried about keeping the cluster versions.

If we really wanted to allow deleting the ones not associated with a migration without breaking RESTORE, I think we could even be clever, where if during RESTORE we can't lookup the version persisted in the backup in out binary, we could still compare it to the ones we have and as long as it is above our minimum restore version, just step it forward until it hits a migration we actually have.

@dt dt removed their assignment Jun 1, 2021
@jlinder jlinder added the T-sql-schema-deprecated Use T-sql-foundations instead label Jun 16, 2021
@exalate-issue-sync exalate-issue-sync bot removed the T-sql-schema-deprecated Use T-sql-foundations instead label Jul 23, 2021
@postamar
Copy link
Contributor

postamar commented Mar 8, 2022

We now effectively have what it takes to migrate old descriptors to the current cluster version in the form of the RunPostDeserializationChanges and especially the RunRestoreChanges methods in the descriptor builders. These methods already get called in all the right places during a RESTORE. In light of this I'm going to remove this issue from our board. Feel free to re-add it if I misunderstood what this is about.

@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery A-schema-descriptors Relating to SQL table/db descriptor handling. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) no-issue-activity T-disaster-recovery X-stale
Projects
No open projects
Archived in project
Development

No branches or pull requests

5 participants