-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: use Background() in computeChecksumPostApply goroutine #75448
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
On the leaseholder, `ctx` passed to `computeChecksumPostApply` is that of the proposal. As of cockroachdb#71806, this context is canceled right after the corresponding proposal is signaled (and the client goroutine returns from `sendWithRangeID`). This effectively prevents most consistency checks from succeeding (they previously were not affected by higher-level cancellation because the consistency check is triggered from a local queue that talks directly to the replica, i.e. had something like a minutes-long timeout). This caused disastrous behavior in the `clearrange` suite of roachtests. That test imports a large table. After the import, most ranges have estimates (due to the ctx cancellation preventing the consistency checks, which as a byproduct trigger stats adjustments) and their stats claim that they are very small. Before recent PR cockroachdb#74674, `ClearRange` on such ranges would use individual point deletions instead of the much more efficient pebble range deletions, effectively writing a lot of data and running the nodes out of disk. Failures of `clearrange` with cockroachdb#74674 were also observed, but they did not involve out-of-disk situations, so are possibly an alternative failure mode (that may still be related to the newly introduced presence of context cancellation). Touches cockroachdb#68303. Release note: None
erikgrinaker
approved these changes
Jan 24, 2022
bors r=erikgrinaker TFTR! |
Build failed (retrying...): |
Build failed: |
bors r=erikgrinaker |
Build succeeded: |
tbg
added a commit
to tbg/cockroach
that referenced
this pull request
Feb 22, 2022
We've seen in the events leading up to cockroachdb#75448 that a build-up of consistency check computations on a node can severely impact node performance. This commit attempts to address the main source of that, while re-working the code for easier maintainability. The way the consistency checker works is by replicating a command through Raft that, on each Replica, triggers an async checksum computations the results of which the caller collects via `CollectChecksum` requests addressed to each `Replica`. If for any reason, the caller does *not* wait to collect the checksums but instead moves on to run another consistency check (perhaps on another Range), these inflight computations can build up over time. This was the main issue in cockroachdb#75448; we were accidentally canceling the context on the leaseholder "right away", failing the consistency check (but leaving it running on all other replicas), and moving on to the next Range. As a result, some (but with spread out leaseholders, ultimately all) Replicas ended up with dozens of consistency check computations, starving I/O and CPU. We "addressed" this by avoiding this errant ctx cancellation (cockroachdb#75448 but longer-term cockroachdb#75656), but this isn't a holistic fix yet. In this commit, we make three main changes: - give the inflight consistency check computations a clean API, which makes it much easier to understand "how it works". - when returning from CollectChecksum (either on success or error, notably including context cancellation), cancel the corresponding consistency check. This solves the problem, *assuming* that CollectChecksum is reliably issued to each Replica. - reliably issue CollectChecksum to each Replica on which a computation may have been triggered. When the caller's context is canceled, still do the call with a one-second-timeout one-off Context which should be good enough to make it to the Replica and short-circuit the call. Release note: None
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
On the leaseholder,
ctx
passed tocomputeChecksumPostApply
is thatof the proposal. As of #71806, this context is canceled right after the
corresponding proposal is signaled (and the client goroutine returns
from
sendWithRangeID
). This effectively prevents most consistencychecks from succeeding (they previously were not affected by
higher-level cancellation because the consistency check is triggered
from a local queue that talks directly to the replica, i.e. had
something like a minutes-long timeout).
This caused disastrous behavior in the
clearrange
suite of roachtests.That test imports a large table. After the import, most ranges have
estimates (due to the ctx cancellation preventing the consistency
checks, which as a byproduct trigger stats adjustments) and their stats
claim that they are very small. Before recent PR #74674,
ClearRange
onsuch ranges would use individual point deletions instead of the much
more efficient pebble range deletions, effectively writing a lot of data
and running the nodes out of disk.
Failures of
clearrange
with #74674 were also observed, but they didnot involve out-of-disk situations, so are possibly an alternative
failure mode (that may still be related to the newly introduced presence
of context cancellation).
Touches #68303.
Release note: None