migrations: add explicit API for migrating ingested data later #58611
Hi @dt, I've guessed the C-ategory of your issue and suitably labeled it. Please re-label if inaccurate. While you're here, please consider adding an A- label to help keep our repository tidy. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
I anticipate that some of the solutions are better left to live in the …
This keeps coming up. I'd like to do something here but I don't quite know what. We have some different versions in the descriptor, but none of them can be used linearly as migrations. Ideally we would serialize something similar to a cluster version into descriptors which would indicate whether or not we've attempted to migrate them. The problem with cluster versions is that we tend to want to throw away those keys (as a forcing function to finish the migrations). I'm commenting here to say that I think we want to do something sooner rather than later, and I'm all ears on framework suggestions.
My first reaction is that we should just use the cluster version that is captured at time of backup as of 21.1, the same as we would for migrating a table within the cluster itself. We already build all the migration machinery around cluster versions, so starting over with something else would need a pretty compelling justification.

And yes, that does imply that the lower bound on old cluster version definitions we keep in the code has to be the same as the lower bound of RESTORE backwards compatibility, but I think that would be mostly true anyway: we have to keep the migration code itself, whatever it is keyed on, around for as long as we support restoring that backup in order to do the actual migration of the restored data, so I'm not too worried about keeping the cluster versions. If we really wanted to allow deleting the versions not associated with a migration without breaking RESTORE, I think we could even be clever: if during RESTORE we can't look up the version persisted in the backup in our binary, we could still compare it to the ones we do have and, as long as it is above our minimum restore version, just step it forward until it hits a migration we actually have.
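For illustration, here is a rough Go sketch of that stepping idea. Everything here (`Version`, `nextKnownMigration`, the known-version list) is a stand-in for the sake of the example, not the actual CockroachDB code:

```go
// Sketch: step a backup's persisted version forward past versions this binary
// no longer knows about, until it lands on a migration we do have.
// All types and names here are illustrative, not CockroachDB's API.
package restoresketch

import (
	"fmt"
	"sort"
)

// Version is a simplified stand-in for a cluster version key.
type Version struct{ Major, Minor, Internal int }

// Less reports whether v orders before o.
func (v Version) Less(o Version) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Internal < o.Internal
}

// nextKnownMigration returns the first version >= backupVersion for which this
// binary still has a migration registered, refusing backups older than the
// minimum version RESTORE supports.
func nextKnownMigration(backupVersion, minRestore Version, known []Version) (Version, error) {
	if backupVersion.Less(minRestore) {
		return Version{}, fmt.Errorf("backup version %v predates minimum restorable version %v",
			backupVersion, minRestore)
	}
	sort.Slice(known, func(i, j int) bool { return known[i].Less(known[j]) })
	for _, v := range known {
		if !v.Less(backupVersion) { // v >= backupVersion
			return v, nil
		}
	}
	return Version{}, fmt.Errorf("no known migration at or above %v", backupVersion)
}
```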
We now effectively have what it takes to migrate old descriptors to the current cluster version in the form of the …
We have marked this issue as stale because it has been inactive for …
The long-running migration system described here allows performing a migration that may rewrite all row data or all schemas, or otherwise perform some migration, and then update the cluster version to record that it has been done. That version bump is then used as a way to know that some invariant now holds, for example that there are no longer interleaved rows to consider, or no longer old-format FKs.
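For concreteness, a minimal sketch of the shape such a migration definition conceptually takes: work keyed to a version gate, where bumping the gate is what licenses the invariant. Names and signatures here are hypothetical, not the real pkg/migration API:

```go
// Illustrative shape of a long-running migration: it runs once cluster-wide,
// rewrites whatever data it needs to, and the subsequent cluster version bump
// is what lets other code assume the resulting invariant holds.
package migrationsketch

import "context"

// Version is a stand-in for a cluster version key.
type Version struct{ Major, Minor, Internal int }

// Migration ties a data/schema rewrite to the cluster version that gates it.
type Migration struct {
	// Version is the gate: once the cluster version reaches it, the invariant
	// established by Run (e.g. "no interleaved rows", "no old-format FKs") is
	// assumed everywhere.
	Version Version
	// Run rewrites existing cluster data before the gate is bumped.
	Run func(ctx context.Context) error
}
```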
However, row data and schema elements can also be ingested into a cluster from a BACKUP using RESTORE. If that backup captured the pre-migration state, restoring it as-is into the cluster would mean the invariant no longer holds, which could lead to serious bugs anywhere that invariant is assumed based on the cluster version.
In the past, this has been handled on a somewhat ad-hoc, case-by-case basis (or missed entirely in some cases). When the schema team migrated the FK representation, they hooked into points in the backup code to specifically check the FK representation used in backed-up tables and modify it as needed. Some other migrations, such as those that updated permissions on all system tables, were written to run once, update every current table, and then never run again, meaning tables restored later would not have the migrated permissions.
To be more robust and less prone to overlooking the potential for pre-migration data to be restored post-migration, we should a) persist the cluster version of the backed-up data in the backup and b) step through each migration from that version to the current cluster version, calling a new method in the migration definition API that asks each migration implementation if and how it wants to migrate the data being RESTORED. Many migrations -- such as the one to move intents to the lock table -- may opt to do nothing, but should do so explicitly to ensure implementors consider restored data in their migration.
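One possible shape for that hook, sketched with hypothetical names (`MigrateRestored`, `RestoredData`, and friends are illustrative, not an existing API): RESTORE would read the version persisted in the backup and invoke each migration between that version and the current one on the ingested data:

```go
// Hypothetical extension of the migration definition: alongside the existing
// cluster-wide migration, each migration explicitly states how (or whether) it
// migrates data ingested later by RESTORE from a pre-migration backup.
package ingestsketch

import "context"

// Version is a stand-in for a cluster version key.
type Version struct{ Major, Minor, Internal int }

// RestoredData stands in for whatever RESTORE would hand to migrations: the
// descriptors and row data being ingested, plus the cluster version recorded
// in the backup manifest.
type RestoredData struct {
	BackupVersion Version
	// ... descriptors, span metadata, etc.
}

// IngestMigration is the proposed addition: RESTORE would call MigrateRestored
// for every migration between the backup's persisted version and the current
// cluster version. Doing nothing is valid, but must be an explicit choice by
// the migration's implementor.
type IngestMigration interface {
	MigrateRestored(ctx context.Context, data *RestoredData) error
}
```

Making the no-op explicit is the point: every migration author has to consider restored data up front, rather than relying on the backup/restore code to special-case each migration after the fact.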
Jira issue: CRDB-3378
Epic CRDB-10338