release-22.1: server: react to decommissioning nodes by proactively enqueuing their replicas #82680
Conversation
Release note: None
… replicas

Note: This patch implements a subset of cockroachdb#80836

Previously, when a node was marked `DECOMMISSIONING`, other nodes in the system would learn about it via gossip but wouldn't do much in the way of reacting to it. They'd rely on their `replicaScanner` to gradually run into the decommissioning node's ranges and on their `replicateQueue` to then rebalance them. This meant that even when decommissioning a mostly empty node, our worst-case lower bound for marking that node fully decommissioned was _one full scanner interval_ (10 minutes by default).

This patch improves on that by installing an idempotent callback that is invoked every time a node is detected to be `DECOMMISSIONING`. When run, the callback enqueues all replicas on the local stores that belong to ranges that also have replicas on the decommissioning node.

Release note (performance improvement): Decommissioning should now be substantially faster, particularly for small to moderately loaded nodes.
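For illustration, here is a hedged Go sketch of what that callback does, under assumed names (`Store`, `Replica`, and `EnqueueDecommissioningReplicas` are hypothetical, not CockroachDB's actual types): each local store walks its replicas and enqueues the ones whose ranges also have a replica on the decommissioning node.

```go
// A minimal, self-contained sketch of the enqueue pass described above.
// Store, Replica, and EnqueueDecommissioningReplicas are illustrative
// names, not CockroachDB's actual internals.
package main

import "fmt"

type NodeID int
type RangeID int

// Replica is a local replica together with the node IDs holding the
// other replicas of its range (as known from the range descriptor).
type Replica struct {
	RangeID RangeID
	Nodes   []NodeID
}

// Store holds the replicas on one local store.
type Store struct {
	Replicas []Replica
}

// EnqueueDecommissioningReplicas stands in for the callback body: it
// enqueues every local replica whose range also has a replica on the
// decommissioning node. Here "enqueue" just logs; in the real system the
// replica would go into the store's replicate queue for rebalancing.
func (s *Store) EnqueueDecommissioningReplicas(decommissioning NodeID) {
	for _, r := range s.Replicas {
		for _, n := range r.Nodes {
			if n == decommissioning {
				fmt.Printf("enqueueing r%d into the replicate queue\n", r.RangeID)
				break
			}
		}
	}
}

func main() {
	s := &Store{Replicas: []Replica{
		{RangeID: 1, Nodes: []NodeID{1, 2, 3}},
		{RangeID: 2, Nodes: []NodeID{1, 4, 5}},
	}}
	// Gossip just told us node 3 is DECOMMISSIONING: only r1 overlaps it.
	s.EnqueueDecommissioningReplicas(3)
}
```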
Thanks for opening a backport. Please check the backport criteria before merging.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport.
Don't stamp yet, there's a bug in the original patch that I need to fix first.
This commit fixes a bug from cockroachdb#80993. Without it, nodes might re-run the callback to enqueue a decommissioning node's ranges into their replicate queues whenever they received a gossip update from that node that was perceived to be newer. Re-running this callback on every newer gossip update from a decommissioning node would be too expensive for nodes with a lot of replicas.

Release note: None
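To make the fix concrete, here is a minimal sketch under assumed names (`decommissionWatcher` and `maybeEnqueue` are hypothetical, not the actual patch): track which nodes have already triggered the callback, so that newer gossip updates from a node that is still `DECOMMISSIONING` don't re-run the expensive enqueue pass.

```go
// A minimal sketch (assumed names, not CockroachDB's actual code) of the
// fix described above: remember which nodes have already triggered the
// callback so that newer gossip updates from a node that is still
// DECOMMISSIONING don't re-run the expensive enqueue pass.
package main

import (
	"fmt"
	"sync"
)

type NodeID int

type decommissionWatcher struct {
	mu       sync.Mutex
	notified map[NodeID]bool // nodes whose replicas we've already enqueued
}

// maybeEnqueue runs the enqueue callback at most once per decommissioning
// node, regardless of how many gossip updates we see for it.
func (w *decommissionWatcher) maybeEnqueue(n NodeID, enqueue func(NodeID)) {
	w.mu.Lock()
	already := w.notified[n]
	w.notified[n] = true
	w.mu.Unlock()
	if !already {
		enqueue(n)
	}
}

func main() {
	w := &decommissionWatcher{notified: map[NodeID]bool{}}
	enqueue := func(n NodeID) { fmt.Printf("enqueuing replicas for n%d\n", n) }
	// Three gossip updates for the same decommissioning node: the callback
	// runs only on the first one.
	w.maybeEnqueue(3, enqueue)
	w.maybeEnqueue(3, enqueue)
	w.maybeEnqueue(3, enqueue)
}
```

Because the notified set lives in memory, a restarted node naturally re-runs the callback once for each node that is still decommissioning, matching the restart behavior described in the PR body below.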
Force-pushed from df69c6a to 66570f2.
@kvoli and / or @AlexTalks: could I get a stamp on this? This patch has been baking on master for over a week and we haven't seen any fallout related to it.
LGTM
This patch fixes a merge skew introduced by cockroachdb#82680 and #82800.

Release note: None
Backport 2/2 commits from #80993 and 1/1 commit from #82683.
/cc @cockroachdb/release
Note: This patch implements a subset of #80836
Previously, when a node was marked `DECOMMISSIONING`, other nodes in the system would learn about it via gossip but wouldn't do much in the way of reacting to it. They'd rely on their `replicaScanner` to gradually run into the decommissioning node's ranges and on their `replicateQueue` to then rebalance them. This meant that even when decommissioning a mostly empty node, our worst-case lower bound for marking that node fully decommissioned was one full scanner interval (10 minutes by default).

This patch improves on that by installing an idempotent callback that is invoked every time a node is detected to be `DECOMMISSIONING`. When run, the callback enqueues all replicas on the local stores that belong to ranges that also have replicas on the decommissioning node. Note that when nodes in the system restart, they re-invoke this callback for any node that is already `DECOMMISSIONING` (see the sketch below).

Resolves #79453
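As a hedged illustration of that restart behavior (all names here are hypothetical, not CockroachDB's actual code), a node can scan the liveness records it sees at startup and re-invoke the same callback for any node already marked `DECOMMISSIONING`:

```go
// Hypothetical sketch: on startup, re-invoke the decommissioning callback
// for any node whose liveness record already says DECOMMISSIONING.
package main

import "fmt"

type NodeID int

type LivenessStatus int

const (
	Live LivenessStatus = iota
	Decommissioning
)

// onDecommissioning stands in for the callback installed by this patch;
// the real callback enqueues overlapping replicas into the replicate queue.
func onDecommissioning(n NodeID) {
	fmt.Printf("n%d is decommissioning; enqueueing overlapping replicas\n", n)
}

func main() {
	// Liveness records as seen at startup (assumed example data).
	records := map[NodeID]LivenessStatus{
		1: Live,
		2: Decommissioning,
		3: Live,
	}
	for n, status := range records {
		if status == Decommissioning {
			onDecommissioning(n)
		}
	}
}
```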
Release note (performance improvement): Decommissioning should now be
substantially faster, particularly for small to moderately loaded nodes.
Release justification: non-invasive performance improvement for node decommissioning