Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: default to separated incremental_from for collections #75201

Closed
2 tasks
stevendanna opened this issue Jan 20, 2022 · 2 comments
Closed
2 tasks

backupccl: default to separated incremental_from for collections #75201

stevendanna opened this issue Jan 20, 2022 · 2 comments
Assignees
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-disaster-recovery

Comments

@stevendanna
Copy link
Collaborator

stevendanna commented Jan 20, 2022

In #72236, we added the ability to specify an incremental_storage option during BACKUP. When this option is present, incremental backups are stored in the specified directory rather than in the collection itself.

We would like to be able to take advantage of this capability by default inside a backup collection.

Strawman requirements to be changed:

  • In a collection where the latest backup is a full backup, an incremental backup added via BACKUP ... INTO LATEST INwill be stored in an /incrementals subdirectory/subprefix inside that collection, unless otherwise specified.

  • In a collection where the latest backup is an incremental backup, an incremental backup added via BACKUP INTO LATEST IN will be stored directly in the collection.

  • BACKUP ... INTO '{subdirectory}' IN '{destination}' will continue to work as it does currently.

  • RESTORE FROM {subdir} IN {destination} without an explicitly incremental_storage option should search the appropriate directory outside of subdir (either the parent or some other directory specified by our default scheme) for full backups.

Questions

  • Should this be behind a cluster setting in case people have automation built around the old layout? We've said in the past that we don't want people depending on a particular layout.

  • Should full backups also get their own subdirectory by default? I believe that without this, you could never have a lifecycle policy for full backups that didn't also apply to incrementals on S3. But, since most people likely want to retain incrementals for a shorter period than full backups, perhaps this isn't much of an issue. That said a layout like:

cockroach-data/extern/t10/
|-- full
|  `-- 2021
|      `-- 12
|          `-- 15-131632.00
| -- incrementals
|   `-- 2021
|       `-- 12
|           `-- 15-131636.88
`-- LATEST

does have a nice look to it to me.

Jira issue: CRDB-12510

@stevendanna stevendanna added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 20, 2022
@blathers-crl
Copy link

blathers-crl bot commented Feb 24, 2022

cc @cockroachdb/bulk-io

@benbardin
Copy link
Collaborator

Completed in #75970.

Behavior of full backups is unchanged. It didn't feel super-helpful to add a new directory full that would only ever contain a single directory, and the complexity from supporting backwards compatibility seemed substantial. But we can revisit this later if we feel it would be more clear.

Though this wasn't listed in the initial spec here, I also made this change to the old BACKUP TO syntax. The functionality itself didn't seem important there, but the code was easier to manage this way and it seemed to do no harm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-disaster-recovery
Projects
No open projects
Archived in project
Development

No branches or pull requests

2 participants