cli: implement CLI flags for decommission pre-check #91893

Closed
AlexTalks opened this issue Nov 15, 2022 · 0 comments · Fixed by #96100
Labels
A-kv-decom-rolling-restart Decommission and Rolling Restarts A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team

Comments


AlexTalks commented Nov 15, 2022

As part of #90752, we need to add CLI flags to the `cockroach node decommission` command to run the decommission pre-check validation. These should include flags to run the checks on their own (e.g. `--checks-only`) as well as to skip the checks (e.g. `--no-checks`). These flags should be incorporated into the client call for the decommission pre-check API implemented in #91568.

Jira issue: CRDB-21472

@AlexTalks AlexTalks added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. A-kv-decom-rolling-restart Decommission and Rolling Restarts T-kv KV Team labels Nov 15, 2022
@AlexTalks AlexTalks self-assigned this Nov 15, 2022
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Jan 27, 2023
WIP

Fixes: cockroachdb#91893

Release note (cli change): TODO
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Jan 27, 2023
This changes the functionality of `cockroach node decommission` to run
preliminary readiness checks prior to starting the decommission of the
nodes. These checks, if they evaluate and find that nodes are not ready
for decommission, will report the errors observed and on which nodes so
that the cluster's configuration can be rectified prior to reattempting
node decommission.  The readiness checks are enabled by default, but can
be controlled with the following new flags:
```
--dry-run               Only evaluate decommission readiness and check decommission status, without
                        actually decommissioning the node.

--checks string         Specifies how to evaluate readiness checks prior to node decommission. Takes
                        any of the following values:
                            - enabled  evaluate readiness prior to starting node decommission.
                            - strict   use strict readiness evaluation mode prior to node decommission.
                            - skip     skip readiness checks and immediately request node decommission.
```

Fixes: cockroachdb#91893

Release note (cli change): TODO
AlexTalks added a commit to AlexTalks/cockroach that referenced this issue Feb 16, 2023
This changes the functionality of `cockroach node decommission` to run
preliminary readiness checks prior to starting the decommission of the
nodes. These checks, if they evaluate and find that nodes are not ready
for decommission, will report the errors observed and on which nodes so
that the cluster's configuration can be rectified prior to reattempting
node decommission.  The readiness checks are enabled by default, but can
be controlled with the following new flags:
```
--dry-run               Only evaluate decommission readiness and check decommission status, without
                        actually decommissioning the node.

--checks string         Specifies how to evaluate readiness checks prior to node decommission. Takes
                        any of the following values:
                            - enabled  evaluate readiness prior to starting node decommission.
                            - strict   use strict readiness evaluation mode prior to node decommission.
                            - skip     skip readiness checks and immediately request node decommission.
```

Fixes: cockroachdb#91893

Release note (cli change): TODO
craig bot pushed a commit that referenced this issue Mar 3, 2023
96100: cli: evaluate readiness prior to node decommission r=kvoli a=AlexTalks

This changes the functionality of `cockroach node decommission` to run
preliminary readiness checks prior to starting the decommission of the
nodes. These checks, if they evaluate and find that nodes are not ready
for decommission, will report the errors observed and on which nodes so
that the cluster's configuration can be rectified prior to reattempting
node decommission.  The readiness checks are enabled by default, but can
be controlled with the following new flags:
```
--dry-run               Only evaluate decommission readiness and check decommission status, without
                        actually decommissioning the node.

--checks string         Specifies how to evaluate readiness checks prior to node decommission. Takes
                        any of the following values:
                            - enabled  evaluate readiness prior to starting node decommission.
                            - strict   use strict readiness evaluation mode prior to node decommission.
                            - skip     skip readiness checks and immediately request node decommission.
```
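
For illustration, the flag semantics above can be sketched with Go's standard `flag` package. This is a minimal sketch, not the code from #96100: the `checksMode` type and the `parseDecommissionFlags` helper are hypothetical names, and only the flag names and the three accepted values come from the help text. Note that the standard `flag` package stops parsing at the first positional argument, so in this sketch the flags must precede the node IDs.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// checksMode is a hypothetical type modeling the three documented values
// of --checks. It is an illustration, not CockroachDB's implementation.
type checksMode string

const (
	checksEnabled checksMode = "enabled"
	checksStrict  checksMode = "strict"
	checksSkip    checksMode = "skip"
)

// String implements flag.Value.
func (m *checksMode) String() string {
	if m == nil {
		return ""
	}
	return string(*m)
}

// Set implements flag.Value and rejects anything other than the three
// documented values.
func (m *checksMode) Set(s string) error {
	switch checksMode(s) {
	case checksEnabled, checksStrict, checksSkip:
		*m = checksMode(s)
		return nil
	}
	return fmt.Errorf("want enabled, strict, or skip, got %q", s)
}

// parseDecommissionFlags (hypothetical helper) parses --checks and --dry-run
// and returns the remaining positional arguments (the node IDs).
func parseDecommissionFlags(args []string) (checksMode, bool, []string, error) {
	fs := flag.NewFlagSet("decommission", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the way; errors are returned

	mode := checksEnabled // readiness checks are enabled by default
	fs.Var(&mode, "checks", "readiness check mode: enabled, strict, or skip")
	dryRun := fs.Bool("dry-run", false, "only evaluate readiness, do not decommission")

	if err := fs.Parse(args); err != nil {
		return "", false, nil, err
	}
	return mode, *dryRun, fs.Args(), nil
}

func main() {
	mode, dryRun, nodes, err := parseDecommissionFlags(
		[]string{"--checks=strict", "--dry-run", "4"})
	if err != nil {
		fmt.Println("ERROR:", err)
		return
	}
	fmt.Printf("checks=%s dry-run=%t nodes=%v\n", mode, dryRun, nodes)
}
```

Using a custom `flag.Value` means an invalid mode is rejected at parse time rather than being discovered later in the command.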

Issues blocking decommission are presented grouped by node and error, e.g.
```
$ ./cockroach node decommission 1 4 5 --insecure

  id | is_live | replicas | is_decommissioning | membership | is_draining |     readiness     | blocking_ranges
-----+---------+----------+--------------------+------------+-------------+-------------------+------------------
   1 |  true   |       53 |       false        |   active   |    false    | allocation errors |              47
   4 |  true   |       52 |       false        |   active   |    false    | allocation errors |              46
   5 |  true   |       54 |       false        |   active   |    false    | allocation errors |              48
(3 rows)

ranges blocking decommission detected

n1 has 34 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster"
n1 has 13 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); replicas must match constraints [{+node1:1} {+node4:1} {+node5:1}]; voting replicas must match voter_constraints []"
n4 has 13 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); replicas must match constraints [{+node1:1} {+node4:1} {+node5:1}]; voting replicas must match voter_constraints []"
n4 has 33 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster"
n5 has 35 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster"
...more blocking errors detected.

ERROR: Cannot decommission nodes.
Failed running "node decommission"
```
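
The per-node, per-error grouping shown in this output could be produced along the following lines. This is a rough sketch under assumptions: `blockedRange` and `summarizeBlocked` are hypothetical names, and the sort order (node ID, then error text) is chosen here only to make the output deterministic. The sample output above does not appear to sort errors within a node, so the real ordering may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// blockedRange is a hypothetical record: one range whose replica on NodeID
// cannot be replaced, together with the error reported for it.
type blockedRange struct {
	NodeID int
	Err    string
}

// groupKey identifies one (node, error) group.
type groupKey struct {
	nodeID int
	err    string
}

// summarizeBlocked counts blocked ranges per (node, error) pair and returns
// one line per group, sorted by node ID and then by error text.
func summarizeBlocked(ranges []blockedRange) []string {
	counts := make(map[groupKey]int)
	for _, r := range ranges {
		counts[groupKey{r.NodeID, r.Err}]++
	}

	keys := make([]groupKey, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i].nodeID != keys[j].nodeID {
			return keys[i].nodeID < keys[j].nodeID
		}
		return keys[i].err < keys[j].err
	})

	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, fmt.Sprintf(
			"n%d has %d replicas blocked with error: %q", k.nodeID, counts[k], k.err))
	}
	return lines
}

func main() {
	// Made-up input purely for demonstration.
	input := []blockedRange{
		{NodeID: 4, Err: "likely not enough nodes in cluster"},
		{NodeID: 1, Err: "likely not enough nodes in cluster"},
		{NodeID: 1, Err: "likely not enough nodes in cluster"},
	}
	for _, line := range summarizeBlocked(input) {
		fmt.Println(line)
	}
}
```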

Fixes: #91893

Release note (cli change): `cockroach node decommission` operations now
preliminarily check the ability of the node to complete decommissioning,
given the cluster configuration and the ranges with replicas present
on the node. This step can be skipped by using the flag `--checks=skip`.
When errors are detected that would result in the inability to complete
node decommission, they will be printed to stderr and the command will
exit, instead of marking the node as `decommissioning` and beginning the
node decommission process.  When the strict readiness evaluation mode
is used by setting the flag `--checks=strict`, any ranges that need any
preliminary actions prior to replacement for the decommission process
(e.g. ranges that are not yet fully upreplicated) will block the
decommission process.
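
The decision rule described in this release note can be summarized in a small sketch. Everything here is an assumption made for illustration: the `readiness` struct, its field names, and both functions are hypothetical and do not reflect the actual pre-check API. In particular, the release note does not say how `--dry-run` combines with `--checks=skip`; this sketch simply never decommissions on a dry run.

```go
package main

import "fmt"

// readiness is a hypothetical per-node summary of the pre-check results.
type readiness struct {
	NodeID            int
	AllocationErrors  int // ranges with no valid replacement target
	NeedsPrelimAction int // ranges needing prior action, e.g. not fully upreplicated
}

// blockingNodes returns the IDs of nodes that would block decommission
// under the given --checks mode.
func blockingNodes(mode string, results []readiness) ([]int, error) {
	switch mode {
	case "skip":
		return nil, nil // checks skipped: nothing can block
	case "enabled", "strict":
	default:
		return nil, fmt.Errorf("unknown checks mode %q", mode)
	}

	var blocked []int
	for _, r := range results {
		// Allocation errors always block. Ranges that merely need a
		// preliminary action block only in strict mode.
		if r.AllocationErrors > 0 || (mode == "strict" && r.NeedsPrelimAction > 0) {
			blocked = append(blocked, r.NodeID)
		}
	}
	return blocked, nil
}

// shouldDecommission reports whether the command would go on to mark the
// nodes as decommissioning.
func shouldDecommission(mode string, dryRun bool, results []readiness) (bool, error) {
	blocked, err := blockingNodes(mode, results)
	if err != nil {
		return false, err
	}
	return !dryRun && len(blocked) == 0, nil
}

func main() {
	results := []readiness{{NodeID: 1, NeedsPrelimAction: 2}}
	for _, mode := range []string{"enabled", "strict", "skip"} {
		ok, err := shouldDecommission(mode, false, results)
		fmt.Printf("mode=%s proceed=%t err=%v\n", mode, ok, err)
	}
}
```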

97956: bazel: rebuild db-console when relevant files change r=rickystewart a=sjbarag

Previously, Bazel would ignore changes to files in ./pkg/ui/workspaces/db-console/src (or '.../assets/', '.../fonts', or '.../styl') after an initial build completes and is cached. Since only those directories were listed and not the files within them via a `glob`, Bazel simply never looked for changes in those files. In fact, a bazel query confirms this:

    bazel query 'somepath(//pkg/cmd/cockroach:cockroach, //pkg/ui/workspaces/db-console:src/index.tsx)'
    INFO: empty results

Use a `glob` to force Bazel to depend on individual source files in db-console. That same query should now result in a valid dependency path:

    bazel query 'somepath(//pkg/cmd/cockroach:cockroach, //pkg/ui/workspaces/db-console:src/index.tsx)'
    //pkg/cmd/cockroach:cockroach
    //pkg/cmd/cockroach:cockroach_lib
    //pkg/ui/distccl:distccl
    //pkg/ui/distccl:genassets
    //pkg/ui/workspaces/db-console:db-console-ccl
    //pkg/ui/workspaces/db-console:src/index.tsx

Fixes: #97954

Release note (build change): Changes to source files in pkg/ui/workspaces/db-console now properly bust the build cache, and are consistently included in local builds.

Co-authored-by: Alex Sarkesian
Co-authored-by: Sean Barag
@craig craig bot closed this as completed in cefa614 Mar 3, 2023