backupccl: incrementally backup in progress imports on existing tables and elide importing data in RESTORE #86054
Labels
A-disaster-recovery
branch-release-22.2
C-enhancement
release-blocker
T-disaster-recovery
Currently, a table with an in-progress import cannot be backed up, let alone restored to its pre-import state. Backing up an in-progress import also has the benefit of distributing the work of backing up the imported data over a series of incremental backups, as opposed to what currently occurs: the first incremental backup to begin after the import finishes has to back up everything.
The main challenge in addressing this involves rolling back imported data on the restored cluster. To understand why this is a challenge, consider how rollbacks occur today:
If an IMPORT writing data into an existing, non-empty table fails or is cancelled mid-IMPORT, it is rolled back by scanning the table for rows with a timestamp greater than the time at which the IMPORT started and deleting them. This works because the table is offline to other writes while it is importing, but it relies on the timestamps on rows not changing, which may not be true if the table were backed up and then restored, after which all keys, both existing and imported, would have timestamps based on when the table was restored.
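The rollback described above, and the way a timestamp-rewriting restore breaks it, can be illustrated with a minimal sketch. The `KV` type and both functions are hypothetical simplifications made up for this example; they are not CockroachDB's storage types or its rollback code.

```go
package main

// KV is a simplified, hypothetical stand-in for an MVCC key/value pair.
type KV struct {
	Key       string
	Value     string
	Timestamp int64 // write time of this version
}

// rollbackImport models today's rollback: every row written after the
// import started is dropped, and everything else is kept.
func rollbackImport(rows []KV, importStart int64) []KV {
	kept := make([]KV, 0, len(rows))
	for _, kv := range rows {
		if kv.Timestamp > importStart {
			continue // written by the in-progress import
		}
		kept = append(kept, kv)
	}
	return kept
}

// restoreRewritingTimestamps models a restore that re-stamps every key,
// pre-existing and imported alike, at the restore time.
func restoreRewritingTimestamps(rows []KV, restoreTime int64) []KV {
	out := make([]KV, len(rows))
	for i, kv := range rows {
		kv.Timestamp = restoreTime
		out[i] = kv
	}
	return out
}
```

With an import that started at time 100, `rollbackImport` keeps a row written at time 10 and drops rows written at 110 and 120. If the same rows first pass through `restoreRewritingTimestamps` with a restore time of 200, every row looks newer than the import start, so the scan can no longer tell pre-existing rows from imported ones.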
The second paragraph in #76722 outlined one strategy, which involved writing additional metadata to each imported key, and indeed several PRs began implementing this approach (#85338, #85692, #85138). However, we realized that recording the ImportStartTime in the backed-up table descriptor is sufficient. Specifically, when the `restore_data_processor` rewrites backed-up keys to the restore cluster, it can use the ImportStartTime in the restored table descriptor to filter out the keys written by the backed-up, in-progress import, before AddSSTable rewrites the timestamps of all the keys.
Note: the more complicated approach outlined in #76722 would have been necessary if RESTOREs of whole tenants implemented MVCC AddSSTable (i.e. rewrote timestamps in RESTORE), because during the restore the host tenant cannot access tenant table descriptors and thus cannot filter keys in the restore processor. Indeed, we originally thought it was necessary to make whole-tenant RESTOREs MVCC compatible. We no longer think that whole-tenant operations (like tenant streaming) need to be MVCC, since it is relatively easy to ensure that all downstream operations understand that whole-tenant operations are non-MVCC. Given that whole-tenant restores will continue to preserve timestamps from the backup, the restored tenant can roll back its import using the normal timestamp-based rollback process described above.
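For table restores, the descriptor-based filtering could look roughly like the sketch below. It is not the actual `restore_data_processor` code: `BackupKV`, `TableDesc`, and `restoreKeys` are hypothetical names for this example, and treating a zero `ImportStartTime` as "no in-progress import" is an assumption. The strict greater-than comparison mirrors the rollback rule described earlier.

```go
package main

// BackupKV is a simplified, hypothetical key/value pair read from a backup.
type BackupKV struct {
	Key       string
	Value     string
	Timestamp int64 // timestamp the key carried in the backup
}

// TableDesc is a hypothetical, stripped-down table descriptor.
// Assumption: ImportStartTime == 0 means no import was in progress.
type TableDesc struct {
	ImportStartTime int64
}

// restoreKeys filters on the original backup timestamps first and only
// then re-stamps the surviving keys, standing in for the timestamp
// rewrite that AddSSTable performs.
func restoreKeys(backup []BackupKV, desc TableDesc, restoreTime int64) []BackupKV {
	restored := make([]BackupKV, 0, len(backup))
	for _, kv := range backup {
		if desc.ImportStartTime != 0 && kv.Timestamp > desc.ImportStartTime {
			continue // written by the in-progress import: elide it
		}
		kv.Timestamp = restoreTime
		restored = append(restored, kv)
	}
	return restored
}
```

The ordering is the point of the sketch: the comparison against `ImportStartTime` has to happen while the keys still carry their backup timestamps, because after the rewrite that information is gone.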
Jira issue: CRDB-18546