server: introduce a decommission monitor task #80695
Closed
aayushshah15
wants to merge
2
commits into
cockroachdb:master
from
aayushshah15:20220427_proactiveDecommission
Conversation
aayushshah15 force-pushed the `20220427_proactiveDecommission` branch from `b77e9ea` to `5d2d33b` on April 28, 2022 08:38.
This commit introduces a `decommissionMonitor` that is responsible for a few key things:

- When a node begins decommissioning, its `decommissionMonitor` is spun up, which proactively tells other nodes in the system to enqueue its ranges into the `replicateQueue`. This means that a decommission process no longer has a worst-case lower bound of 10 minutes (i.e. the default replica scanner interval).
- In a future patch, this `decommissionMonitor` will selectively nudge some of its straggling replicas' ranges to be re-enqueued into their leaseholder store's replicateQueues. This will be done with the intention of collecting and persisting (or, initially, dumping) traces from these enqueue operations. This should help reduce our time to RCA a slow decommission process.

Comparing the time taken to decommission a node out of a 6-node cluster containing a 200-warehouse TPC-C dataset:

With this patch:

```
time rp ssh aayushs-test:1 -- "./cockroach node decommission --insecure --self";
...
roachprod ssh aayushs-test:1 --  0.07s user 0.07s system 0% cpu 59.125 total
```

Without this patch:

```
time rp ssh aayushs-test-wo:1 -- "./cockroach node decommission --insecure --self";
...
roachprod ssh aayushs-test-wo:1 --  0.17s user 0.15s system 0% cpu 3:47.34 total
```

Release note: None
aayushshah15 force-pushed the `20220427_proactiveDecommission` branch from `5d2d33b` to `5a7969f` on April 28, 2022 16:35.
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request on May 2, 2022:

> Note: This PR is an alternative to, but subsumes, cockroachdb#80695.
>
> Previously, when a node was marked `DECOMMISSIONING`, other nodes in the system would learn about it via gossip but wouldn't do much in the way of reacting to it. They'd rely on their `replicaScanner` to gradually run into the decommissioning node's ranges and rely on their `replicateQueue` to then rebalance them. This had a few issues:
>
> 1. It meant that even when decommissioning a mostly empty node, our worst-case lower bound for marking that node fully decommissioned was _one full scanner interval_ (which is 10 minutes by default).
> 2. If the replicateQueue ran into an error while rebalancing a decommissioning replica (see cockroachdb#79266 for instance), it would only retry that replica after either one full scanner interval or after the purgatory interval.
>
> This patch improves this behavior by installing an idempotent callback that is invoked every time a node is detected to be `DECOMMISSIONING`. This callback spins up an async task that will first proactively enqueue all of the decommissioning node's ranges (that have a replica on the local node) into the local node's replicateQueues. Then, this task will periodically nudge the decommissioning node's straggling replicas in order to requeue them (to alleviate (2) from above). All of this is managed by a lightweight `decommissionMonitor`, which is responsible for managing the lifecycle of these async tasks.
>
> Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request on May 3, 2022 (same commit message as above).
Closing in favor of #80993
NOTE: This is a WIP. Don't review / look yet.

Resolves #79453

Release note: None