-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
catalog: enforce uniqueness of id column in system.namespace #84561
Comments
This is more or less what I think we intended to do with the namespace validation, but in practice that doesn't achieve the intended goal. cockroach/pkg/sql/catalog/internal/validate/validate.go Lines 84 to 88 in 2623bae
|
Now that draining names have been removed, we should also be more strict with what is in system.namespace but shouldn't be. This might have helped with https://cockroachlabs.atlassian.net/browse/SREOPS-4907 |
We want the
Presumably, this descriptor will be held in memory by the |
This change fixes a cluster restore bug where backed up non-system descriptors would clobber system tables in the restoring cluster because of an ID collision. As of 22.1 system tables no longer have reserved IDs, and can be dynamically assigned IDs from 50 to 100. When performing a cluster restore we enforce that the restoring cluster is free of any non-system descriptors. We also incorrectly assumed that there will be no collision between the backed up descriptors and the system descriptors in the restoring clusters. These assumptions meant that we do not re-assign any of the backed up descriptors, new IDs, but instead write them to the restoring cluster with the same ID as when they were backed up. In 22.1 we introduced two new system tables. As of this change we have introduced atleast another two system tables, and this trend is expected to continue. Since restore is expected to support restoring a backup taken on an older cluster version into a newer cluster version it is easy to construct a case where IDs 50, 51 and so on were previously occupied by non-system table descriptors. Concretely, in 21.2 ID 50 was occupied by `defaultdb`, and so an attempt to restore a 21.2 backup into a 22.1+ cluster resulted in the backed up `defaultdb` descriptor clobbering the newly introduced system table at ID 50. Apart from having more validation around what a restore is allowed to write to the `system.descriptors` and `system.namespace` table (cockroachdb#84561), we need to ensure that users are able to restore their older backups. To this effect, we add another step to the restore planning phase that does the following: 1. Collects all table descriptors in the restoring clusters `system` database. 2. Bumps the restoring clusters descriptor ID generator to a value higher than any descriptor in the backup. 3. For every table in the `system` database that collides with a backed up descriptor, restore will re-create the `system` table using the updated ID generator. Thereby guaranteeing that it will not collide with any of the descriptors we are about to restore. 4. Allow the cluster restore to progress as before. Note, this is meant to serve as a relatively short term fix until we desgin a better solution around cluster restores and dynamic system tables. The only system tables that a cluster restore can conflict with today are: - `tenant_settings`: system tenant only - `span_count`: non-system tenant only - `privileges`: system and non-system tenant Release note (bug fix): fixes a bug where cluster restores of older backups would silently clobber system tables or fail to complete.
This change fixes a cluster restore bug where backed up non-system descriptors would clobber system tables in the restoring cluster because of an ID collision. As of 22.1 system tables no longer have reserved IDs, and can be dynamically assigned IDs from 50 to 100. When performing a cluster restore we enforce that the restoring cluster is free of any non-system descriptors. We also incorrectly assumed that there will be no collision between the backed up descriptors and the system descriptors in the restoring clusters. These assumptions meant that we do not re-assign any of the backed up descriptors, new IDs, but instead write them to the restoring cluster with the same ID as when they were backed up. In 22.1 we introduced two new system tables. As of this change we have introduced atleast another two system tables, and this trend is expected to continue. Since restore is expected to support restoring a backup taken on an older cluster version into a newer cluster version it is easy to construct a case where IDs 50, 51 and so on were previously occupied by non-system table descriptors. Concretely, in 21.2 ID 50 was occupied by `defaultdb`, and so an attempt to restore a 21.2 backup into a 22.1+ cluster resulted in the backed up `defaultdb` descriptor clobbering the newly introduced system table at ID 50. Apart from having more validation around what a restore is allowed to write to the `system.descriptors` and `system.namespace` table (cockroachdb#84561), we need to ensure that users are able to restore their older backups. To this effect, we add another step to the restore planning phase that does the following: 1. Collects all table descriptors in the restoring clusters `system` database. 2. Bumps the restoring clusters descriptor ID generator to a value higher than any descriptor in the backup. 3. For every table in the `system` database that collides with a backed up descriptor, restore will re-create the `system` table using the updated ID generator. Thereby guaranteeing that it will not collide with any of the descriptors we are about to restore. 4. Allow the cluster restore to progress as before. Note, this is meant to serve as a relatively short term fix until we desgin a better solution around cluster restores and dynamic system tables. The only system tables that a cluster restore can conflict with today are: - `tenant_settings`: system tenant only - `span_count`: non-system tenant only - `privileges`: system and non-system tenant Release note (bug fix): fixes a bug where cluster restores of older backups would silently clobber system tables or fail to complete.
84495: backupccl: move conflicting system table descriptors to higher IDs r=adityamaru a=stevendanna This change fixes a cluster restore bug where backed up non-system descriptors would clobber system tables in the restoring cluster because of an ID collision. As of 22.1 system tables no longer have reserved IDs, and can be dynamically assigned IDs from 50 to 100. When performing a cluster restore we enforce that the restoring cluster is free of any non-system descriptors. We also incorrectly assumed that there will be no collision between the backed up descriptors and the system descriptors in the restoring clusters. These assumptions meant that we do not re-assign any of the backed up descriptors, new IDs, but instead write them to the restoring cluster with the same ID as when they were backed up. In 22.1 we introduced two new system tables. As of this change we have introduced atleast another two system tables, and this trend is expected to continue. Since restore is expected to support restoring a backup taken on an older cluster version into a newer cluster version it is easy to construct a case where IDs 50, 51 and so on were previously occupied by non-system table descriptors. Concretely, in 21.2 ID 50 was occupied by `defaultdb`, and so an attempt to restore a 21.2 backup into a 22.1+ cluster resulted in the backed up `defaultdb` descriptor clobbering the newly introduced system table at ID 50. Apart from having more validation around what a restore is allowed to write to the `system.descriptors` and `system.namespace` table (#84561), we need to ensure that users are able to restore their older backups. To this effect, we add another step to the restore planning phase that does the following: 1. Collects all table descriptors in the restoring clusters `system` database. 2. Bumps the restoring clusters descriptor ID generator to a value higher than any descriptor in the backup. 3. For every table in the `system` database that collides with a backed up descriptor, restore will re-create the `system` table using the updated ID generator. Thereby guaranteeing that it will not collide with any of the descriptors we are about to restore. 4. Allow the cluster restore to progress as before. Note, this is meant to serve as a relatively short term fix until we desgin a better solution around cluster restores and dynamic system tables. The only system tables that a cluster restore can conflict with today are: - `tenant_settings`: system tenant only - `span_count`: non-system tenant only - `privileges`: system and non-system tenant Release note (bug fix): fixes a bug where cluster restores of older backups would silently clobber system tables or fail to complete. Co-authored-by: Aditya Maru <[email protected]>
This change fixes a cluster restore bug where backed up non-system descriptors would clobber system tables in the restoring cluster because of an ID collision. As of 22.1 system tables no longer have reserved IDs, and can be dynamically assigned IDs from 50 to 100. When performing a cluster restore we enforce that the restoring cluster is free of any non-system descriptors. We also incorrectly assumed that there will be no collision between the backed up descriptors and the system descriptors in the restoring clusters. These assumptions meant that we do not re-assign any of the backed up descriptors, new IDs, but instead write them to the restoring cluster with the same ID as when they were backed up. In 22.1 we introduced two new system tables. As of this change we have introduced atleast another two system tables, and this trend is expected to continue. Since restore is expected to support restoring a backup taken on an older cluster version into a newer cluster version it is easy to construct a case where IDs 50, 51 and so on were previously occupied by non-system table descriptors. Concretely, in 21.2 ID 50 was occupied by `defaultdb`, and so an attempt to restore a 21.2 backup into a 22.1+ cluster resulted in the backed up `defaultdb` descriptor clobbering the newly introduced system table at ID 50. Apart from having more validation around what a restore is allowed to write to the `system.descriptors` and `system.namespace` table (cockroachdb#84561), we need to ensure that users are able to restore their older backups. To this effect, we add another step to the restore planning phase that does the following: 1. Collects all table descriptors in the restoring clusters `system` database. 2. Bumps the restoring clusters descriptor ID generator to a value higher than any descriptor in the backup. 3. For every table in the `system` database that collides with a backed up descriptor, restore will re-create the `system` table using the updated ID generator. Thereby guaranteeing that it will not collide with any of the descriptors we are about to restore. 4. Allow the cluster restore to progress as before. Note, this is meant to serve as a relatively short term fix until we desgin a better solution around cluster restores and dynamic system tables. The only system tables that a cluster restore can conflict with today are: - `tenant_settings`: system tenant only - `span_count`: non-system tenant only - `privileges`: system and non-system tenant Release note (bug fix): fixes a bug where cluster restores of older backups would silently clobber system tables or fail to complete.
@Xiang-Gu expressed an interest in taking this on. |
One milestone of
will be achieved when PR #88844 is merged. This change will tighten the logic whenever we write to the For now, I'll step away and focus my time on some other development work for the next release. |
Such a validation check would prevent nasty bugs from happening. Now that draining names are no longer a thing, nothing prevents us from doing this. We can't do this by adding a unique constraint because we happily write into that table directly using KV-level Puts so we're going to have to be a bit more clever about it.
Jira issue: CRDB-17737
The text was updated successfully, but these errors were encountered: