Skip to content

Commit

Permalink
cli: evaluate readiness prior to node decommission
Browse files Browse the repository at this point in the history
This changes the functionality of `cockroach node decommission` to run
preliminary readiness checks prior to starting the decommission of the
nodes. These checks, if they evaluate and find that nodes are not ready
for decommission, will report the errors observed and on which nodes so
that the cluster's configuration can be rectified prior to reattempting
node decommission.  The readiness checks are enabled by default, but can
be controlled with the following new flags:
```
--dry-run               Only evaluate decommission readiness and check decommission status, without
                        actually decommissioning the node.

--checks string         Specifies how to evaluate readiness checks prior to node decommission. Takes
                        any of the following values:
                            - enabled  evaluate readiness prior to starting node decommission.
                            - strict   use strict readiness evaluation mode prior to node decommission.
                            - skip     skip readiness checks and immediately request node decommission.
```

Issues blocking decommission are presented grouped by node and error,
e.g.
```
$ ./cockroach node decommission 1 4 5 --checks=enabled --insecure --dry-run

  id | is_live | replicas | is_decommissioning | membership | is_draining |     readiness     | blocking_ranges
-----+---------+----------+--------------------+------------+-------------+-------------------+------------------
   1 |  true   |       53 |       false        |   active   |    false    | allocation errors |              47
   4 |  true   |       52 |       false        |   active   |    false    | allocation errors |              46
   5 |  true   |       54 |       false        |   active   |    false    | allocation errors |              48
(3 rows)

ranges blocking decommission detected

n1 has 34 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster"
n1 has 13 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); replicas must match constraints [{+node1:1} {+node4:1} {+node5:1}]; voting replicas must match voter_constraints []"
n4 has 13 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); replicas must match constraints [{+node1:1} {+node4:1} {+node5:1}]; voting replicas must match voter_constraints []"
n4 has 33 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster"
n5 has 35 replicas blocked with error: "0 of 1 live stores are able to take a new replica for the range (2 already have a voter, 0 already have a non-voter); likely not enough nodes in cluster"
...more blocking errors detected.

ERROR: Cannot decommission nodes.
Failed running "node decommission"
```

Fixes: #91893

Release note (cli change): `cockroach node decommission` operations now
preliminarily check the ability of the node to complete decommissioning,
given the cluster configuration and the ranges with replicas present
on the node. This step can be skipped by using the flag `--checks=skip`.
When errors are detected that would result in the inability to complete
node decommission, they will be printed to stderr and the command will
exit, instead of marking the node as `decommissioning` and beginning the
node decommission process. When the strict readiness evaluation mode
is used by setting the flag `--checks=strict`, any ranges that need any
preliminary actions prior to replacement for the decommission process
(e.g. ranges that are not yet fully upreplicated) will block the
decommission process.
  • Loading branch information
AlexTalks committed Mar 2, 2023
1 parent 98c39f7 commit cefa614
Show file tree
Hide file tree
Showing 13 changed files with 272 additions and 27 deletions.
20 changes: 20 additions & 0 deletions pkg/cli/cliflags/flags.go
Original file line number Diff line number Diff line change
Expand Up @@ -1251,6 +1251,26 @@ in the history of the cluster.`,
as target of the decommissioning or recommissioning command.`,
}

NodeDecommissionChecks = FlagInfo{
Name: "checks",
Description: `
Specifies how to evaluate readiness checks prior to node decommission.
Takes any of the following values:
<PRE>
- enabled evaluate readiness prior to starting node decommission.
- strict use strict readiness evaluation mode prior to node decommission.
- skip skip readiness checks and immediately request node decommission.
Use when rerunning node decommission.
</PRE>`,
}

NodeDecommissionDryRun = FlagInfo{
Name: "dry-run",
Description: `Only evaluate decommission readiness and check decommission
status, without actually decommissioning the node.`,
}

NodeDrainSelf = FlagInfo{
Name: "self",
Description: `Use the node ID of the node connected to via --host
Expand Down
4 changes: 4 additions & 0 deletions pkg/cli/context.go
Original file line number Diff line number Diff line change
Expand Up @@ -543,6 +543,8 @@ func setDrainContextDefaults() {
var nodeCtx struct {
nodeDecommissionWait nodeDecommissionWaitType
nodeDecommissionSelf bool
nodeDecommissionChecks nodeDecommissionCheckMode
nodeDecommissionDryRun bool
statusShowRanges bool
statusShowStats bool
statusShowDecommission bool
Expand All @@ -555,6 +557,8 @@ var nodeCtx struct {
func setNodeContextDefaults() {
nodeCtx.nodeDecommissionWait = nodeDecommissionWaitAll
nodeCtx.nodeDecommissionSelf = false
nodeCtx.nodeDecommissionChecks = nodeDecommissionChecksEnabled
nodeCtx.nodeDecommissionDryRun = false
nodeCtx.statusShowRanges = false
nodeCtx.statusShowStats = false
nodeCtx.statusShowAll = false
Expand Down
2 changes: 1 addition & 1 deletion pkg/cli/debug_recover_loss_of_quorum_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -256,7 +256,7 @@ func TestLossOfQuorumRecovery(t *testing.T) {
adminClient := serverpb.NewAdminClient(grpcConn)

require.NoError(t, runDecommissionNodeImpl(
ctx, adminClient, nodeDecommissionWaitNone,
ctx, adminClient, nodeDecommissionWaitNone, nodeDecommissionChecksSkip, false,
[]roachpb.NodeID{roachpb.NodeID(2), roachpb.NodeID(3)}, tcAfter.Server(0).NodeID()),
"Failed to decommission removed nodes")

Expand Down
4 changes: 4 additions & 0 deletions pkg/cli/flags.go
Original file line number Diff line number Diff line change
Expand Up @@ -700,6 +700,10 @@ func init() {
// Decommission command.
cliflagcfg.VarFlag(decommissionNodeCmd.Flags(), &nodeCtx.nodeDecommissionWait, cliflags.Wait)

// Decommission pre-check flags.
cliflagcfg.VarFlag(decommissionNodeCmd.Flags(), &nodeCtx.nodeDecommissionChecks, cliflags.NodeDecommissionChecks)
cliflagcfg.BoolFlag(decommissionNodeCmd.Flags(), &nodeCtx.nodeDecommissionDryRun, cliflags.NodeDecommissionDryRun)

// Decommission and recommission share --self.
for _, cmd := range []*cobra.Command{decommissionNodeCmd, recommissionNodeCmd} {
f := cmd.Flags()
Expand Down
41 changes: 41 additions & 0 deletions pkg/cli/flags_util.go
Original file line number Diff line number Diff line change
Expand Up @@ -267,6 +267,47 @@ func (s *nodeDecommissionWaitType) Set(value string) error {
return nil
}

type nodeDecommissionCheckMode int

const (
nodeDecommissionChecksSkip nodeDecommissionCheckMode = iota
nodeDecommissionChecksEnabled
nodeDecommissionChecksStrict
)

// Type implements the pflag.Value interface.
func (s *nodeDecommissionCheckMode) Type() string { return "string" }

// String implements the pflag.Value interface.
func (s *nodeDecommissionCheckMode) String() string {
switch *s {
case nodeDecommissionChecksSkip:
return "skip"
case nodeDecommissionChecksEnabled:
return "enabled"
case nodeDecommissionChecksStrict:
return "strict"
default:
panic("unexpected node decommission check mode (possible values: enabled, strict, skip)")
}
}

// Set implements the pflag.Value interface.
func (s *nodeDecommissionCheckMode) Set(value string) error {
switch value {
case "skip":
*s = nodeDecommissionChecksSkip
case "enabled":
*s = nodeDecommissionChecksEnabled
case "strict":
*s = nodeDecommissionChecksStrict
default:
return fmt.Errorf("invalid node decommission parameter: %s "+
"(possible values: enabled, strict, skip)", value)
}
return nil
}

// bytesOrPercentageValue is a flag that accepts an integer value, an integer
// plus a unit (e.g. 32GB or 32GiB) or a percentage (e.g. 32%). In all these
// cases, it transforms the string flag input into an int64 value.
Expand Down
2 changes: 1 addition & 1 deletion pkg/cli/interactive_tests/test_cluster_name.tcl
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ system "$argv init --insecure --host=localhost --cluster-name=foo"
end_test

start_test "Check that decommission works with a cluster-name provided"
send "$argv node decommission 1 --insecure --host=localhost --wait=none --cluster-name=foo\r"
send "$argv node decommission 1 --insecure --host=localhost --checks=skip --wait=none --cluster-name=foo\r"
end_test

start_test "Check that recommission works with cluster name verification disabled"
Expand Down
4 changes: 2 additions & 2 deletions pkg/cli/interactive_tests/test_multiple_nodes.tcl
Original file line number Diff line number Diff line change
Expand Up @@ -26,10 +26,10 @@ eexpect "warning: node 2 is not decommissioned"
eexpect eof

start_test "Check that a double decommission prints out a warning"
spawn $argv node decommission 2 --wait none
spawn $argv node decommission 2 --wait none --checks skip
eexpect eof

spawn $argv node decommission 2 --wait none
spawn $argv node decommission 2 --wait none --checks skip
eexpect "warning: node 2 is already decommissioning or decommissioned"
eexpect eof
end_test
Expand Down
Loading

0 comments on commit cefa614

Please sign in to comment.