forked from cockroachdb/cockroach
release-22.1: backupccl: introduce BACKUP-LOCK file
Only one backup job is allowed to write to a backup location. Prior to this change, a backup job relied on the presence of a BACKUP-CHECKPOINT file to detect a concurrent backup job writing to the same location. This was problematic in subtle ways.

In 22.1, we moved backup destination resolution and the writing of the checkpoint file to the backup resumer. Before writing the checkpoint file, we would check whether anyone else had laid claim to the location. All operations in a job resumer need to be idempotent, because a job can be resumed an arbitrary number of times, either due to transient errors or user intervention. One can imagine (and we have seen more than once in recent roachtests) a situation where a job:

1. Checks for other BACKUP-CHECKPOINT files in the location, but finds none.
2. Writes its own BACKUP-CHECKPOINT file.
3. Gets resumed before it gets to update BackupDetails to indicate it has completed 1) and 2).

When the job repeats 1), it now sees its own BACKUP-CHECKPOINT file and concludes that another backup is writing to the location, foolishly locking itself out. A similar situation can happen in a mixed-version state where the node performs 1) and 2) during planning, and the planner txn retries.

Before we discuss the solution, it is important to highlight the mixed-version states to consider:

1) Backups planned/executed by 21.2.x and 22.1.0 nodes will continue to check BACKUP-CHECKPOINT files before laying claim to a location.
2) Backups planned/executed by 21.2.x and 22.1.0 nodes will continue to write BACKUP-CHECKPOINT files as their way of claiming a location.

This change introduces a `BACKUP-LOCK` file that, going forward, will be used to check for and lay claim to a location. The `BACKUP-LOCK` file is suffixed with the job ID of the backup job. With this change, a backup job checks for the existence of `BACKUP-LOCK` files suffixed with a job ID other than its own before laying claim to a location.

We continue to read the BACKUP-CHECKPOINT file so as to respect the claim laid by backups started on older binary nodes. Naturally, the job also continues to write a BACKUP-CHECKPOINT file, which prevents older nodes from starting concurrent backups.

Fixes: cockroachdb#81808

Release note: None
1 parent 9549e03, commit c770ee3
Showing 10 changed files with 637 additions and 144 deletions.