kvserver: proactively enqueue replicas for a decommissioning node #79453
Labels

- A-kv-distribution: Relating to rebalancing and leasing.
- C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
- O-postmortem: Originated from a Postmortem action item.
- T-kv: KV Team
Came up in a conversation with, and proposed by, @lidorcarmel.
Today, the `replicaScanner` on each store in the cluster keeps pacing through its replicas (such that it processes each replica roughly once every 10 minutes). The `replicaScanner` optionally queues these replicas into each of the store's queues.

When a node is decommissioning, this status gets broadcast to the other nodes in the cluster via gossip. So when the `replicateQueue` encounters a replica that is the leaseholder for a range that has a replica on a decommissioning node, it decides to take action to move that decommissioning replica away.
The issue here is that this discovery of decommissioning replicas is limited by the `replicaScanner`'s 10-minute scanning interval. This means that, generally, even discovering all of the replicas belonging to a decommissioning node will take ~10 minutes. Furthermore, if there are any errors processing any of these decommissioning replicas, they will not be re-processed for another 10 minutes.

This issue proposes that we should proactively enqueue all replicas belonging to a decommissioning node into the `replicateQueue`s of all of a node's stores the moment it learns that the node's liveness record has changed from `LIVE` to `DECOMMISSIONING`. Care will need to be taken to ensure that we're only enqueuing these replicas exactly once when the status of a node changes to `DECOMMISSIONING`. Doing this should considerably cut down on how long it takes to decommission nodes in almost all scenarios, and, anecdotally, it also seems to be the behaviour that operators intuitively expect.

cc @cockroachdb/kv-notifications
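The proposed reaction to a liveness change, including the exactly-once guard, could look roughly like this. This is a sketch under assumed names (`store`, `onLivenessChange`, `replicasByNode` are all hypothetical, not the real kvserver API): a gossip-driven callback fires on the `LIVE` to `DECOMMISSIONING` transition and enqueues every overlapping replica once, with a per-node flag preventing duplicate enqueues on repeated gossip updates.

```go
package main

import (
	"fmt"
	"sync"
)

// livenessStatus is a simplified node-liveness state (hypothetical names).
type livenessStatus int

const (
	live livenessStatus = iota
	decommissioning
)

// store tracks which nodes it has already reacted to, so that a node's
// replicas are proactively enqueued exactly once per transition.
type store struct {
	mu              sync.Mutex
	alreadyEnqueued map[int]bool  // nodeID -> already handled
	replicasByNode  map[int][]int // nodeID -> rangeIDs with a replica on that node
	replicateQueue  []int
}

// onLivenessChange is the gossip callback: on LIVE -> DECOMMISSIONING it
// enqueues every local replica that overlaps the decommissioning node.
func (s *store) onLivenessChange(nodeID int, old, cur livenessStatus) {
	if old != live || cur != decommissioning {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.alreadyEnqueued[nodeID] {
		return // guard: enqueue exactly once per status change
	}
	s.alreadyEnqueued[nodeID] = true
	s.replicateQueue = append(s.replicateQueue, s.replicasByNode[nodeID]...)
}

func main() {
	s := &store{
		alreadyEnqueued: map[int]bool{},
		replicasByNode:  map[int][]int{4: {10, 11, 12}},
	}
	s.onLivenessChange(4, live, decommissioning)
	s.onLivenessChange(4, live, decommissioning) // duplicate gossip update: no-op
	fmt.Println("replicateQueue:", s.replicateQueue)
}
```

The key design point is that enqueueing becomes event-driven rather than scan-driven, so discovery latency drops from up to a full scan interval to roughly the gossip propagation delay.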
Jira issue: CRDB-14873
Epic: CRDB-14621