Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: backup correctly tries reading in from base directory if l… #77315

Merged
merged 1 commit into from
Mar 10, 2022

Conversation

DarrylWong
Copy link
Contributor

@DarrylWong DarrylWong commented Mar 3, 2022

…atest/checkpoint files aren't found

Before, we only tried reading from the base directory if we caught a ErrFileDoesNotExist error. However
this does not account for the potential error thrown when the progress/latest directories don't exist.
This changes it so we now correctly retry reading from the base directory.

We also put the latest directory inside of a metadata directory, in order to avoid any potential
conflicts with there being a latest file and latest directory in the same base directory.

Also wraps errors in findLatestFile and readLatestCheckpointFile for more clarity when both base and
latest/progress directories fail to read.

Fixes #77312

Epic CRDB-7586

Release justification: Low risk bug fix
Release note: none

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DarrylWong
Copy link
Contributor Author

Didn't do too much testing since it was getting late, but basically I just went with the first easier solution (where we just always try reading from base directory on error) just to confirm that it was indeed the problem. Just running a few tests it seemed like it didn't break anything and the original issue seemed fixed. But i'll give it some more testing tomorrow and figure out what solution we want to go with.

@DarrylWong DarrylWong force-pushed the latest-checkpoint-bug branch 2 times, most recently from de639be to e8a56ac Compare March 4, 2022 01:19
@DarrylWong DarrylWong marked this pull request as ready for review March 4, 2022 01:22
@DarrylWong DarrylWong requested review from a team, dt and adityamaru and removed request for a team and dt March 4, 2022 01:22
@adityamaru adityamaru requested a review from dt March 8, 2022 04:12
Copy link
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. Would it be possible to write a unit test that blocks the cluster from upgrading past the BackupDoesNotOverwriteLatestAndCheckpoint version, taking a backup, then allowing it to upgrade, and running an incremental on the same chain?

pkg/ccl/backupccl/backup_destination.go Outdated Show resolved Hide resolved
latestFile, err = exportStore.ReadFile(ctx, latestFileName)
if err != nil {
return nil, errors.Wrap(err, "Latest file could not be read in base or metadata directory")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/Latest/LATEST/g

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't 100% sure what you meant by this, did you mean in place of "latest file" or the "metadata directory"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh just change Latest to LATEST since thats the sentinel file name we use.

pkg/ccl/backupccl/backup_job.go Outdated Show resolved Hide resolved
pkg/ccl/backupccl/backup_job.go Outdated Show resolved Hide resolved
pkg/ccl/backupccl/manifest_handling.go Show resolved Hide resolved
@adityamaru adityamaru self-requested a review March 8, 2022 05:51
@DarrylWong DarrylWong force-pushed the latest-checkpoint-bug branch 2 times, most recently from bbd7020 to 29ae66d Compare March 8, 2022 19:40
Copy link
Contributor

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still like to see the test we discussed earlier in this PR, but happy to do that as a fast follow.

@DarrylWong DarrylWong force-pushed the latest-checkpoint-bug branch from 29ae66d to 1bff87f Compare March 10, 2022 14:42
@DarrylWong DarrylWong requested a review from adityamaru March 10, 2022 14:43
…atest/checkpoint files aren't found

Before, we only tried reading from the base directory if we caught a ErrFileDoesNotExist error. However
this does not account for the potential error thrown when the progress/latest directories don't exist.
This changes it so we now correctly retry reading from the base directory.

We also put the latest directory inside of a metadata directory, in order to avoid any potential
conflicts with there being a latest file and latest directory in the same base directory.

Also wraps errors in findLatestFile and readLatestCheckpointFile for more clarity when both base and
latest/progress directories fail to read.

Fixes cockroachdb#77312

Release justification: Bug fix for release blocker
Release note: none
@DarrylWong DarrylWong force-pushed the latest-checkpoint-bug branch from 1bff87f to c9e9855 Compare March 10, 2022 16:08
@DarrylWong
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 10, 2022

Build succeeded:

@craig craig bot merged commit 8d046fc into cockroachdb:master Mar 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

backupccl: directory does not exist error is not correctly caught when reading checkpoint/latest files
3 participants