kv: method to check decommission by valid replica replacement availability #91571
Labels
A-kv-decom-rolling-restart: Decommission and Rolling Restarts
A-kv-distribution: Relating to rebalancing and leasing.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv: KV Team
Comments
AlexTalks added the C-enhancement, A-kv-distribution, and A-kv-decom-rolling-restart labels on Nov 9, 2022
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue on Dec 19, 2022
This adds support for the evaluation of the decommission readiness of a node (or set of nodes), by simulating their liveness to have the DECOMMISSIONING status and utilizing the allocator to ensure that we are able to perform any actions needed to repair the range. This supports a "strict" mode, in which case we expect all ranges to only need replacement or removal due to the decommissioning status, or a more permissive "non-strict" mode, which allows for other actions needed, as long as they do not encounter errors in finding a suitable allocation target. The non-strict mode allows us to permit situations where a range may have more than one action needed to repair it, such as a range that needs to reach its replication factor before the decommissioning replica can be replaced, or a range that needs to finalize an atomic replication change. Depends on cockroachdb#92367. Part of cockroachdb#91571 Release note: None
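The difference between the two modes described in this commit message can be sketched as follows. This is a minimal illustration and not the code from the commit: the `rangeCheck` type, its fields, and `evaluateReadiness` are hypothetical names, and the allocator's output is assumed to be already summarized per range.

```go
package main

import (
	"errors"
	"fmt"
)

// rangeCheck is a hypothetical summary of what the allocator reports for one
// range when the target nodes are treated as DECOMMISSIONING.
type rangeCheck struct {
	rangeID          int
	action           string // description of the computed repair action
	decommissionOnly bool   // true if the action only replaces or removes a decommissioning replica
	targetErr        error  // non-nil if no suitable allocation target was found
}

// evaluateReadiness returns the ranges that block decommission, keyed by
// range ID. Allocation errors block in both modes; strict mode also blocks
// on any action that is not a decommission-driven replace or remove.
func evaluateReadiness(checks []rangeCheck, strict bool) map[int]error {
	blockers := make(map[int]error)
	for _, c := range checks {
		switch {
		case c.targetErr != nil:
			blockers[c.rangeID] = fmt.Errorf("r%d: %s: %w", c.rangeID, c.action, c.targetErr)
		case strict && !c.decommissionOnly:
			blockers[c.rangeID] = fmt.Errorf("r%d: unexpected action in strict mode: %s", c.rangeID, c.action)
		}
	}
	return blockers
}

// sampleChecks builds illustrative input; the values are made up.
func sampleChecks() []rangeCheck {
	return []rangeCheck{
		{1, "replace decommissioning voter", true, nil},
		{2, "add voter to reach replication factor", false, nil},
		{3, "replace decommissioning voter", true, errors.New("no valid store available")},
	}
}

func main() {
	checks := sampleChecks()
	fmt.Println("strict blockers:", len(evaluateReadiness(checks, true)))
	fmt.Println("non-strict blockers:", len(evaluateReadiness(checks, false)))
}
```

In this sketch, the under-replicated range (r2) blocks decommission only in strict mode, while the range with no valid target store (r3) blocks in both modes.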
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue on Dec 22, 2022
This change refactors parts of the replicate queue's `PlanOneChange(..)` and `addOrRemove{Non}Voters(..)` functions to reusable helper functions that simplify usage of the allocator and deduplicate repeated code paths. The change also adds convenience methods to the `AllocatorAction` enum, to move certain determinations (such as if a computed allocator action is a remove or a replace) closer to the allocator type it is based on. These changes move more of the logic needed to use the allocator into the `allocatorimpl` package itself, enabling usage of the allocator outside of the replicate queue. Part of cockroachdb#91571. Release note: None
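The idea of putting "is this a remove or a replace" checks on the action enum itself can be sketched like this. The `Action` type, its constants, and its method names are hypothetical and simplified; they are not claimed to match the identifiers in the `allocatorimpl` package.

```go
package main

import "fmt"

// Action is a hypothetical, simplified stand-in for an allocator action enum.
type Action int

const (
	ActionAddVoter Action = iota
	ActionReplaceDecommissioningVoter
	ActionRemoveDecommissioningVoter
	ActionReplaceDeadVoter
	ActionRemoveDeadVoter
)

// Replace reports whether the action swaps an existing replica for a new one.
func (a Action) Replace() bool {
	return a == ActionReplaceDecommissioningVoter || a == ActionReplaceDeadVoter
}

// Remove reports whether the action removes a replica without adding one.
func (a Action) Remove() bool {
	return a == ActionRemoveDecommissioningVoter || a == ActionRemoveDeadVoter
}

// Decommissioning reports whether the action is caused by a replica's
// decommissioning status rather than by some other condition.
func (a Action) Decommissioning() bool {
	return a == ActionReplaceDecommissioningVoter || a == ActionRemoveDecommissioningVoter
}

func main() {
	a := ActionReplaceDecommissioningVoter
	fmt.Println(a.Replace(), a.Remove(), a.Decommissioning())
}
```

Keeping these predicates on the enum means callers outside the replicate queue can classify an action without duplicating the switch logic.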
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue on Jan 5, 2023
This change refactors parts of the replicate queue's `PlanOneChange(..)` and `addOrRemove{Non}Voters(..)` functions to reusable helper functions that simplify usage of the allocator and deduplicate repeated code paths. The change also adds convenience methods to the `AllocatorAction` enum, to move certain determinations (such as if a computed allocator action is a remove or a replace) closer to the allocator type it is based on. These changes move more of the logic needed to use the allocator into the `allocatorimpl` package itself, enabling usage of the allocator outside of the replicate queue. Part of cockroachdb#91571. Release note: None
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue on Jan 7, 2023
This change refactors parts of the replicate queue's `PlanOneChange(..)` and `addOrRemove{Non}Voters(..)` functions to reusable helper functions that simplify usage of the allocator and deduplicate repeated code paths. The change also adds convenience methods to the `AllocatorAction` enum, to move certain determinations (such as if a computed allocator action is a remove or a replace) closer to the allocator type it is based on. These changes move more of the logic needed to use the allocator into the `allocatorimpl` package itself, enabling usage of the allocator outside of the replicate queue. Part of cockroachdb#91571. Release note: None
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue on Jan 7, 2023
This adds support for the evaluation of the decommission readiness of a node (or set of nodes), by simulating their liveness to have the DECOMMISSIONING status and utilizing the allocator to ensure that we are able to perform any actions needed to repair the range. This supports a "strict" mode, in which case we expect all ranges to only need replacement or removal due to the decommissioning status, or a more permissive "non-strict" mode, which allows for other actions needed, as long as they do not encounter errors in finding a suitable allocation target. The non-strict mode allows us to permit situations where a range may have more than one action needed to repair it, such as a range that needs to reach its replication factor before the decommissioning replica can be replaced, or a range that needs to finalize an atomic replication change. Depends on cockroachdb#94024. Part of cockroachdb#91571 Release note: None
As part of #90752, we will need to determine the viability of node decommission by evaluating the possibility of replacing each replica on the decommissioning node with a new replica on a valid, available store. In evaluating each replica that exists on the node(s), we can also gather errors and potential remediation steps so that the decommission may become viable.
Jira issue: CRDB-21322