kvserver: proactively enqueue replicas for a decommissioning node #79453

Closed
aayushshah15 opened this issue Apr 5, 2022 · 2 comments · Fixed by #80993
Labels
A-kv-distribution: Relating to rebalancing and leasing.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-postmortem: Originated from a Postmortem action item.
T-kv: KV Team

Comments


aayushshah15 commented Apr 5, 2022

Came up in a conversation with, and proposed by, @lidorcarmel.

Today, the replicaScanner on each store in the cluster continuously paces through its replicas, such that it processes each replica roughly once every 10 minutes. As it does so, the replicaScanner queues each replica, where appropriate, into each of the store's queues.

When a node is decommissioning, this status gets broadcast to other nodes in the cluster via gossip. So when the replicateQueue encounters a replica that is the leaseholder for a range that has a replica on a decommissioning node, it decides to take action to move that decommissioning replica away.
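The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual replicateQueue code: `NodeID`, `LivenessStatus`, `Replica`, and `decommissioningTargets` are hypothetical stand-ins, and the gossiped liveness view is modeled as a plain map.

```go
package main

// NodeID and LivenessStatus are simplified, hypothetical stand-ins,
// not CockroachDB's actual types.
type NodeID int

type LivenessStatus int

const (
	Live LivenessStatus = iota
	Decommissioning
)

// Replica is a hypothetical, minimal view of a replica held by the local store.
type Replica struct {
	RangeID       int
	IsLeaseholder bool     // whether the local store holds the range's lease
	ReplicaNodes  []NodeID // nodes that hold a replica of this range
}

// decommissioningTargets returns the nodes whose replicas of this range
// should be moved away. Only the leaseholder acts, so a non-leaseholder
// replica always yields nil. Nodes missing from the gossiped map are
// treated as Live (the zero value).
func decommissioningTargets(r Replica, gossiped map[NodeID]LivenessStatus) []NodeID {
	if !r.IsLeaseholder {
		return nil
	}
	var targets []NodeID
	for _, n := range r.ReplicaNodes {
		if gossiped[n] == Decommissioning {
			targets = append(targets, n)
		}
	}
	return targets
}
```

In this sketch the decision only happens when something hands the replica to the function, which is exactly why discovery is bounded by how often replicas get scanned.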

The issue here is that this discovery of decommissioning replicas is limited by the replicaScanner's 10-minute scanning interval. This means that, generally, even the discovery of all replicas belonging to a decommissioning node will take ~10 minutes. Furthermore, if there are any errors processing any of these decommissioning replicas, they will not be re-processed for another 10 minutes.

This issue proposes that each node should proactively enqueue all replicas belonging to a decommissioning node into the replicateQueues of all of its stores the moment it learns that the decommissioning node's liveness record has changed from LIVE to DECOMMISSIONING. Care will need to be taken to ensure that we enqueue these replicas exactly once when the status of a node changes to DECOMMISSIONING. Doing this should considerably cut down on how long it takes to decommission nodes in almost all scenarios, and, anecdotally, it also seems like the behaviour that operators intuitively expect.
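One way to get the "exactly once" behaviour is to trigger on the status transition rather than on the status itself. The sketch below is only an illustration under assumptions: `decommissionWatcher`, `onLivenessUpdate`, and the `enqueue` callback are hypothetical names, not the API that the eventual fix uses, and a node that has never been seen before is treated as LIVE.

```go
package main

import "sync"

// NodeID and LivenessStatus are simplified, hypothetical stand-ins.
type NodeID int

type LivenessStatus int

const (
	Live LivenessStatus = iota
	Decommissioning
)

// decommissionWatcher (hypothetical) remembers the last status seen for
// each node so that repeated gossip updates do not re-trigger the enqueue.
type decommissionWatcher struct {
	mu       sync.Mutex
	lastSeen map[NodeID]LivenessStatus
	// enqueue stands in for "add every local replica of a range that has
	// a replica on this node to the replicateQueue of every local store".
	enqueue func(NodeID)
}

func newDecommissionWatcher(enqueue func(NodeID)) *decommissionWatcher {
	return &decommissionWatcher{
		lastSeen: make(map[NodeID]LivenessStatus),
		enqueue:  enqueue,
	}
}

// onLivenessUpdate is meant to be called for every liveness update received.
// It fires enqueue only on the LIVE -> DECOMMISSIONING edge and reports
// whether it fired. Unknown nodes default to Live (the zero value).
func (w *decommissionWatcher) onLivenessUpdate(id NodeID, status LivenessStatus) bool {
	w.mu.Lock()
	prev := w.lastSeen[id]
	w.lastSeen[id] = status
	w.mu.Unlock()

	if prev != Live || status != Decommissioning {
		return false
	}
	w.enqueue(id)
	return true
}
```

Recording the new status before calling `enqueue`, and doing so under the mutex, means that two concurrent updates carrying the same DECOMMISSIONING status cannot both observe the LIVE predecessor.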

cc @cockroachdb/kv-notifications

Jira issue: CRDB-14873

Epic: CRDB-14621

@aayushshah15 aayushshah15 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. A-kv-decom-rolling-restart Decommission and Rolling Restarts labels Apr 5, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Apr 5, 2022
@erikgrinaker
Contributor

Related to #79318.

@kenliu-crl
Contributor

manually reviewed and brought up to date

@exalate-issue-sync exalate-issue-sync bot removed the A-kv-decom-rolling-restart Decommission and Rolling Restarts label May 27, 2022
@mwang1026 mwang1026 added the O-postmortem Originated from a Postmortem action item. label May 27, 2022
@craig craig bot closed this as completed in e9456ba Jun 8, 2022
craig bot pushed a commit that referenced this issue Jun 12, 2022
81005: kvserver: retry failures to rebalance decommissioning replicas r=aayushshah15 a=aayushshah15

Related to #80993
Relates to #79453


This commit makes it such that failures to rebalance replicas on
decommissioning nodes no longer move the replica out of the
replicateQueue as they previously used to. Instead, these failures now
put these replicas into the replicateQueue's purgatory, which will retry
these replicas every minute.

All this is intended to improve the speed of decommissioning towards
its tail end, since previously, failures to rebalance these replicas
meant that they were only retried after about 10 minutes.

Release note: None


Co-authored-by: Aayush Shah <[email protected]>
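
The purgatory behaviour described in the commit message above can be sketched as follows. This is a simplified illustration, not the replicateQueue implementation: `purgatoryQueue`, `maybeProcess`, `retryPurgatory`, and `errNoTarget` are hypothetical names, and the one-minute timer that would drive retries is left to the caller.

```go
package main

import "errors"

// RangeID is a simplified, hypothetical identifier for a replica's range.
type RangeID int

// errNoTarget is a hypothetical error for a failed rebalance attempt.
var errNoTarget = errors.New("no rebalance target available")

// purgatoryQueue (hypothetical) parks replicas whose processing failed
// instead of dropping them until the next full scan.
type purgatoryQueue struct {
	process   func(RangeID) error
	purgatory map[RangeID]error // parked replicas and their last error
}

func newPurgatoryQueue(process func(RangeID) error) *purgatoryQueue {
	return &purgatoryQueue{
		process:   process,
		purgatory: make(map[RangeID]error),
	}
}

// maybeProcess attempts to process a replica. On failure the replica is
// kept in purgatory; on success any earlier purgatory entry is cleared.
func (q *purgatoryQueue) maybeProcess(id RangeID) {
	if err := q.process(id); err != nil {
		q.purgatory[id] = err
		return
	}
	delete(q.purgatory, id)
}

// retryPurgatory re-attempts every parked replica. A caller would invoke
// it periodically, for example from a one-minute ticker.
func (q *purgatoryQueue) retryPurgatory() {
	// Snapshot the keys first so the map is not modified while ranging.
	ids := make([]RangeID, 0, len(q.purgatory))
	for id := range q.purgatory {
		ids = append(ids, id)
	}
	for _, id := range ids {
		q.maybeProcess(id)
	}
}
```

With this shape, a replica that fails to rebalance waits only for the next purgatory retry rather than for the scanner to come back around.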
5 participants