cli: implement CLI flags for decommission pre-check #91893
Labels
A-kv-decom-rolling-restart
Decommission and Rolling Restarts
A-kv-distribution
Relating to rebalancing and leasing.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv
KV Team
Comments
AlexTalks added the C-enhancement, A-kv-distribution, A-kv-decom-rolling-restart, and T-kv labels on Nov 15, 2022
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue on Jan 27, 2023

WIP

Fixes: cockroachdb#91893

Release note (cli change): TODO
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue on Jan 27, 2023

This changes the functionality of `cockroach node decommission` to run preliminary readiness checks prior to starting the decommission of the nodes. These checks, if they evaluate and find that nodes are not ready for decommission, will report the errors observed and on which nodes so that the cluster's configuration can be rectified prior to reattempting node decommission.

The readiness checks are enabled by default, but can be controlled with the following new flags:

```
--dry-run        Only evaluate decommission readiness and check decommission
                 status, without actually decommissioning the node.
--checks string  Specifies how to evaluate readiness checks prior to node
                 decommission. Takes any of the following values:
                 - enabled  evaluate readiness prior to starting node decommission.
                 - strict   use strict readiness evaluation mode prior to node decommission.
                 - skip     skip readiness checks and immediately request node decommission.
```

Fixes: cockroachdb#91893

Release note (cli change): TODO
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue on Feb 16, 2023

This changes the functionality of `cockroach node decommission` to run preliminary readiness checks prior to starting the decommission of the nodes. These checks, if they evaluate and find that nodes are not ready for decommission, will report the errors observed and on which nodes so that the cluster's configuration can be rectified prior to reattempting node decommission.

The readiness checks are enabled by default, but can be controlled with the following new flags:

```
--dry-run        Only evaluate decommission readiness and check decommission
                 status, without actually decommissioning the node.
--checks string  Specifies how to evaluate readiness checks prior to node
                 decommission. Takes any of the following values:
                 - enabled  evaluate readiness prior to starting node decommission.
                 - strict   use strict readiness evaluation mode prior to node decommission.
                 - skip     skip readiness checks and immediately request node decommission.
```

Fixes: cockroachdb#91893

Release note (cli change): TODO
craig bot pushed a commit that referenced this issue on Mar 3, 2023

96100: cli: evaluate readiness prior to node decommission r=kvoli a=AlexTalks

This changes the functionality of `cockroach node decommission` to run preliminary readiness checks prior to starting the decommission of the nodes. These checks, if they evaluate and find that nodes are not ready for decommission, will report the errors observed and on which nodes so that the cluster's configuration can be rectified prior to reattempting node decommission.

The readiness checks are enabled by default, but can be controlled with the following new flags:

```
--dry-run        Only evaluate decommission readiness and check decommission
                 status, without actually decommissioning the node.
--checks string  Specifies how to evaluate readiness checks prior to node
                 decommission. Takes any of the following values:
                 - enabled  evaluate readiness prior to starting node decommission.
                 - strict   use strict readiness evaluation mode prior to node decommission.
                 - skip     skip readiness checks and immediately request node decommission.
```

Issues blocking decommission are presented grouped by node and error, e.g.

```
$ ./cockroach node decommission 1 4 5 --insecure
  id | is_live | replicas | is_decommissioning | membership | is_draining | readiness         | blocking_ranges
-----+---------+----------+--------------------+------------+-------------+-------------------+------------------
   1 | true    |       53 | false              | active     | false       | allocation errors |              47
   4 | true    |       52 | false              | active     | false       | allocation errors |              46
   5 | true    |       54 | false              | active     | false       | allocation errors |              48
(3 rows)

ranges blocking decommission detected
n1 has 34 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster"
n1 has 13 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); replicas must match constraints [{+node1:1} {+node4:1} {+node5:1}]; voting replicas must match voter_constraints []"
n4 has 13 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); replicas must match constraints [{+node1:1} {+node4:1} {+node5:1}]; voting replicas must match voter_constraints []"
n4 has 33 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster"
n5 has 35 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster"
...more blocking errors detected.

ERROR: Cannot decommission nodes.
Failed running "node decommission"
```

Fixes: #91893

Release note (cli change): `cockroach node decommission` operations now preliminarily check the ability of the node to complete decommissioning, given the cluster configuration and the ranges with replicas present on the node. This step can be skipped by using the flag `--checks=skip`. When errors are detected that would result in the inability to complete node decommission, they will be printed to stderr and the command will exit, instead of marking the node as `decommissioning` and beginning the node decommission process. When the strict readiness evaluation mode is used by setting the flag `--checks=strict`, any ranges that need any preliminary actions prior to replacement for the decommission process (e.g. ranges that are not yet fully upreplicated) will block the decommission process.

97956: bazel: rebuild db-console when relevant files change r=rickystewart a=sjbarag

Previously, Bazel would ignore changes to files in ./pkg/ui/workspaces/db-console/src (or '.../assets/', '.../fonts', or '.../styl') after an initial build completes and is cached. Since only those directories were listed and not the files within them via a `glob`, Bazel simply never looked for changes in those files. In fact, a bazel query confirms this:

    bazel query 'somepath(//pkg/cmd/cockroach:cockroach, //pkg/ui/workspaces/db-console:src/index.tsx)'
    INFO: empty results

Use a `glob` to force Bazel to depend on individual source files in db-console. That same query should now result in a valid dependency path:

    bazel query 'somepath(//pkg/cmd/cockroach:cockroach, //pkg/ui/workspaces/db-console:src/index.tsx)'
    //pkg/cmd/cockroach:cockroach
    //pkg/cmd/cockroach:cockroach_lib
    //pkg/ui/distccl:distccl
    //pkg/ui/distccl:genassets
    //pkg/ui/workspaces/db-console:db-console-ccl
    //pkg/ui/workspaces/db-console:src/index.tsx

Fixes: #97954

Release note (build change): Changes to source files in pkg/ui/workspaces/db-console now properly bust the build cache, and are consistently included in local builds.

Co-authored-by: Alex Sarkesian <[email protected]>
Co-authored-by: Sean Barag <[email protected]>
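The 96100 commit message above says that issues blocking decommission are presented grouped by node and error. A minimal Go sketch of that kind of grouping follows. The `blockedReplica` and `blockedGroup` types, the `groupBlocked` function, the shortened error strings, and the sort order (node ID, then error text) are all hypothetical, chosen for illustration; none of them are taken from the CockroachDB source.

```go
package main

import (
	"fmt"
	"sort"
)

// blockedReplica is a hypothetical record: one replica on a node that a
// readiness check could not relocate, together with the reason.
type blockedReplica struct {
	nodeID int
	errMsg string
}

// blockedGroup counts the replicas that share a node and an error message.
type blockedGroup struct {
	nodeID int
	errMsg string
	count  int
}

// groupBlocked tallies replicas per (node, error) pair and returns the
// groups in a deterministic order. The ordering is an assumption of this
// sketch, not the ordering used by the real command.
func groupBlocked(replicas []blockedReplica) []blockedGroup {
	counts := make(map[blockedReplica]int)
	for _, r := range replicas {
		counts[r]++
	}
	groups := make([]blockedGroup, 0, len(counts))
	for key, n := range counts {
		groups = append(groups, blockedGroup{key.nodeID, key.errMsg, n})
	}
	sort.Slice(groups, func(i, j int) bool {
		if groups[i].nodeID != groups[j].nodeID {
			return groups[i].nodeID < groups[j].nodeID
		}
		return groups[i].errMsg < groups[j].errMsg
	})
	return groups
}

func main() {
	// Illustrative input only; the error strings are abbreviated.
	replicas := []blockedReplica{
		{1, "likely not enough nodes in cluster"},
		{4, "replicas must match constraints"},
		{1, "likely not enough nodes in cluster"},
	}
	for _, g := range groupBlocked(replicas) {
		fmt.Printf("n%d has %d replicas blocked with error: %q\n", g.nodeID, g.count, g.errMsg)
	}
}
```

Using the whole `(node, error)` pair as the map key keeps the tally to a single pass; sorting afterwards is only there to make the printed order stable.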
As part of #90752, we need to add CLI flags to the `cockroach node decommission` command to run the decommission pre-check validation. These should include flags to run the checks on their own (i.e. `--checks-only`) as well as to skip the checks (i.e. `--no-checks`). These flags should be incorporated into the client call for the decommission pre-check API implemented in #91568.

Jira issue: CRDB-21472
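The commit messages in this thread name the flags that were eventually added as `--dry-run` and `--checks` (with values `enabled`, `strict`, and `skip`) rather than the `--checks-only` and `--no-checks` proposed here. As a rough illustration of how such a pair of flags could be parsed and validated, here is a sketch using Go's standard `flag` package. This is an assumption made for the example and is not how the `cockroach` CLI is implemented; `checksMode`, `options`, `parseChecksMode`, and `parseArgs` are hypothetical names. The thread does not say how `--dry-run` interacts with `--checks=skip`, so the sketch simply records both values independently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// checksMode is a hypothetical type for the three documented --checks values.
type checksMode string

const (
	checksEnabled checksMode = "enabled"
	checksStrict  checksMode = "strict"
	checksSkip    checksMode = "skip"
)

// parseChecksMode accepts only the values listed in the commit message.
func parseChecksMode(s string) (checksMode, error) {
	switch m := checksMode(s); m {
	case checksEnabled, checksStrict, checksSkip:
		return m, nil
	default:
		return "", fmt.Errorf("invalid --checks value %q: want enabled, strict, or skip", s)
	}
}

// options holds the parsed command line (hypothetical structure).
type options struct {
	dryRun  bool
	checks  checksMode
	nodeIDs []string
}

// parseArgs parses flags followed by node IDs. The default of "enabled"
// follows the statement that readiness checks are enabled by default.
func parseArgs(args []string) (options, error) {
	fs := flag.NewFlagSet("decommission", flag.ContinueOnError)
	dryRun := fs.Bool("dry-run", false, "only evaluate readiness, do not decommission")
	checks := fs.String("checks", string(checksEnabled), "enabled, strict, or skip")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	mode, err := parseChecksMode(*checks)
	if err != nil {
		return options{}, err
	}
	return options{dryRun: *dryRun, checks: mode, nodeIDs: fs.Args()}, nil
}

func main() {
	opts, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("checks=%s dry-run=%t nodes=%v\n", opts.checks, opts.dryRun, opts.nodeIDs)
}
```

Note that Go's `flag` package stops parsing at the first positional argument, so in this sketch the flags must come before the node IDs (for example `--checks=strict --dry-run 1 4 5`), unlike the `cockroach node decommission 1 4 5 --insecure` invocation shown in the thread.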