spanconfig: handle mismatched desc types post-restore #80397
Merged
Fixes #75831, an annoying bug in the intersection between the span configs infrastructure and backup/restore.

It's possible to observe mismatched descriptor types for the same ID post-RESTORE, violating an invariant the span configs infrastructure relies on. This PR simply papers over this mismatch, kicking off a full reconciliation process to recover if it occurs. Doing something "better" is a lot more invasive, the options being:

- pausing the reconciliation job during restore (prototyped in #80339);
- observing a reconciler checkpoint in the restore job (this would work since we would have flushed out RESTORE's descriptor deletions and separately handled RESTORE's descriptor additions; them having different types would not fire the assertion);
- re-keying restored descriptors to not re-use the same IDs as existing schema objects.

While here, we add a bit of plumbing/testing to make the future work/testing for #73694 (using reconciler checkpoints on retries) easier. This PR also sets the stage for the following pattern around use of checkpoints:

1. We'll use checkpoints and incrementally reconcile during job-internal retries (added in #78117, "spanconfig/job: improve retry behaviour under failures");
2. We'll always fully reconcile (i.e. ignore checkpoints) when the job itself is bounced around.

We do this because we need to fully reconcile across job restarts if the restart is due to RESTORE-induced errors. This is a bit unfortunate, and if we want to improve on (2), we'd have to persist job state (think "poison pill") that ensures that we ignore the persisted checkpoint. As of this PR, the only users of job-persisted checkpoints are the migrations rolling out this infrastructure. That said, we now have a mechanism to force a full reconciliation attempt; we can:

```sql
-- get $job_id
SELECT job_id FROM [SHOW AUTOMATIC JOBS]
WHERE job_type = 'AUTO SPAN CONFIG RECONCILIATION';

PAUSE JOB $job_id;
RESUME JOB $job_id;
```

Release note: None
irfansharif force-pushed the 220422.retry-combine branch from f2e4f7f to 1e34408 (April 22, 2022 18:54)
ajwerner approved these changes (Apr 26, 2022)
Reviewed 10 of 10 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @adityamaru and @arulajmani)
bors r+
Build failed (retrying...)
Build failed
bors r+
Build succeeded