Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: incrementally backup in progress imports on existing tables and elide importing data in RESTORE #86054

Closed
msbutler opened this issue Aug 12, 2022 · 2 comments · Fixed by #85920
Assignees
Labels
A-disaster-recovery branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-disaster-recovery

Comments

@msbutler
Copy link
Collaborator

msbutler commented Aug 12, 2022

Currently, a table with an in-progress cannot get backed up, let-alone restored to its pre-import state. Backing up an in-progress import also has the benefit of distributing the work of backing up the import over a series of incremental backups, as oppose to what currently occurs: the first incremental backup to begin after the import finishes has to back up everything.

To address this, main challenge involves rolling back imported data on the restored cluster. To understand why this is a challenge, consider how rollbacks occur today:

If an IMPORT writing data into an existing, non-empty cluster fails or is cancelled mid-IMPORT, to roll it back, any rows it had written are found and deleted by scanning the table for rows with a timestamp greater than the time at which the IMPORT started. This works since the table is offline to other writes while it is importing, but relies on the fact that the timestamps on rows do not change -- which may not be true if the table were backed up and then restored, after which all keys, both existing and imported, would have times based on when it was restored.

The second paragraph in #76722 outlined one strategy which involved writing additional metadata to each imported key, and indeed several PRs began implementing this approach (#85338, #85692 #85138). However, we realized that binding the Import Start Time in the backed up table descriptor is sufficient. Specifically, when the restore_data_processor rewrites backed up keys to the restore cluster, it can use the ImportStartTime in the restored table descriptor to filter out keys in the backed up, in-progress import, before AddSSTable rewrites the timestamps of all the keys.

Note: the more complicated approach outlined in #76722 would have been necessary if RESTOREs of whole tenants implemented MVCC AddSSTable-- i.e. rewrote timestamps in RESTORE-- because during the restore, the host tenant cannot access tenant table descriptors and thus filter keys in the restore processor. And indeed, we thought it was necessary to make whole tenant RESTOREs MVCC compatible. But now, we no longer think that whole tenant operations (like tenant streaming) need to be MVCC, since it's relatively easy to ensure that all downstream operations understand that whole tenant operations are non-MVCC. So given that whole tenant restores will continue to preserve timestamps from the backup, the restored tenant can rollback their import using the normal process described in the second paragraph.

Jira issue: CRDB-18546

@msbutler msbutler added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-disaster-recovery T-disaster-recovery labels Aug 12, 2022
@msbutler msbutler self-assigned this Aug 12, 2022
@blathers-crl
Copy link

blathers-crl bot commented Aug 12, 2022

cc @cockroachdb/bulk-io

msbutler added a commit to msbutler/cockroach that referenced this issue Aug 17, 2022
Informs: cockroachdb#86054

Release note (sql change): offline tables with in-progress IMPORT INTO imports
(i.e. tables that existed before the import job began) now get
backed up once the cluster has upgraded to v22.2. During a cluster/database
Restore that executes AOST during the in-progress import, all data from the
import is elided and leaves the table in its pre-import state (online, with
pre-import data, if any). If the RESTORE executes AOST after the import
completes, all the imported data ingests into the restored data.

Future related work includes:
- allowing a user to explicitly target an importing table (e.g. run
RESTORE TABLE my_importing_table)

Release justification: low risk, high benefit change
@msbutler msbutler added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Aug 26, 2022
@blathers-crl
Copy link

blathers-crl bot commented Aug 26, 2022

Hi @msbutler, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@msbutler msbutler added the branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 label Aug 26, 2022
@craig craig bot closed this as completed in #85920 Aug 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-disaster-recovery
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant