Docs for decommissioning and removing nodes #1876

jseldess · 2017-09-01T20:27:13Z

Update docs on temporarily stopping a node.
Add docs on decommissioning and permanent removal of nodes, as well as recommissioning.
Update cockroach node docs.
Update command overview and sidenav.

Fixes #1496
Fixes #97

cockroach-teamcity · 2017-09-01T20:27:53Z

http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/b2a273faad958b18f796bc981133a0cac4432b93/

jseldess · 2017-09-01T20:28:28Z

@tschottdorf, @bdarnell, I still have a bit of work to do, but I'd like your early feedback on how I've documented decommissioning and removing nodes. Please take a look at the changes to stop-a-node.md and the new decommission-a-node.md file.

HTML versions:

tbg · 2017-09-03T15:27:33Z

Looks good! I don't fully understand where the docs sit in the greater scheme of things, but it seems that there's a bit of duplication that's likely to rot? Other than that, only two points:

- `./cockroach quit --decommission` is essentially `./cockroach node decommission <self> && ./cockroach quit`. That means you'd use it to decommission and then stop a node; it's not necessary to decommission it first.
- Discussing the case in which multiple nodes are decommissioned would be good. It's more efficient to do them all at once than one after another, to minimize data movement.

The diagrams are good!
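
To make the second point concrete, here is a minimal dry-run sketch of decommissioning several nodes in one call rather than one at a time. It assumes the subcommand accepts several node IDs in a single invocation. The node IDs (4 and 5), the host address, and the use of `--insecure` are illustrative assumptions, and the `decommission_nodes` helper is hypothetical; by default it only prints the command it would run.

```shell
# Hypothetical helper: decommission several nodes in one invocation.
# $1 is the address of a live node to send the request to; the rest are node IDs.
# Dry run by default: it echoes the command. Set COCKROACH=cockroach to execute it.
decommission_nodes() {
  host="$1"
  shift
  ${COCKROACH:-echo cockroach} node decommission "$@" --insecure --host="$host"
}

# Assumed example values: nodes 4 and 5 are leaving; 198.51.100.1 is a node that stays.
decommission_nodes 198.51.100.1 4 5
```

Passing all IDs together lets the cluster plan replica movement once, instead of possibly moving data onto a node that is about to be decommissioned as well.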

Reviewed 21 of 21 files at r1.
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

_includes/cli/decommission-a-node.html, line 8 at r1 (raw file):

1. Confirm that there are enough nodes to take over the replicas from the node you want to remove. See [Considerations](decommission-a-node.html#consideration) for some example scenarios.

2. [Install the `cockroach` binary](install-cockroachdb.html) on a machine separate from the node.

This isn't necessary, can do this from one of the machines itself.

v1.1/stop-a-node.md, line 9 at r1 (raw file):

<span class="version-tag">Changed in v1.1:</span> This page shows you how to use the `cockroach quit` [command](cockroach-commands.html) to either temporarily stop a node that you plan to restart or permanently remove a node that has already been [decommissioned](decommission-a-node.html).

Generally, you temporarily stop nodes during the process of [upgrading your cluster's version of CockroachDB](upgrade-cockroach-version.html), whereas you permanently remove nodes when downsizing a cluster.

or reacting to hardware failures.

Comments from Reviewable

tbg · 2017-09-03T15:29:40Z

Oh, and perhaps a Considerations section that removes a node that had a hardware failure would be interesting (i.e. use --wait=live since the node is already dead).

bdarnell · 2017-09-03T18:37:22Z

Reviewed 21 of 21 files at r1.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

_includes/cli/decommission-a-node.html, line 10 at r1 (raw file):

2. [Install the `cockroach` binary](install-cockroachdb.html) on a machine separate from the node.

3. Run the [`cockroach node status`](view-node-details.html) command and identify the ID of the node you want to remove:

If the node is up, it's often easier to ask it for its ID than to scan the node status output in a large cluster: `cockroach sql --security-flags --host=<node-to-be-removed> -e 'show node_id'`. The node ID is also printed when the node starts up.

v1.1/decommission-a-node.md, line 13 at r1 (raw file):

<div id="toc"></div>

## Considerations

This should also discuss what it means to decommission a node that's already down (i.e. that this is what you'd do to remove permanently dead nodes from the UI).

v1.1/recommission-a-node.md, line 3 at r1 (raw file):

---
title: Recommission Nodes
summary: Learn why and how to temporarily stop a CockroachDB node.

Recommissioning is not about temporarily stopping a node, it's only for undoing a (mistaken) decommission. I'd include it on the decommission page instead of giving it its own page.

v1.1/remove-a-node.md, line 2 at r1 (raw file):

---
title: Remove a Node

"Removing" a node implies permanent (decommissioning) removal to me, whereas "stop" is very strongly associated with a temporary stop. I'd swap this doc with the stop-a-node one, so "stop" describes the temporary quit process and "remove a node" is the high-level guide about the two options.

v1.1/remove-a-node.md, line 7 at r1 (raw file):

---

To stop a CockroachDB node running in the background, run the `cockroach quit` [command](cockroach-commands.html) with appropriate flags. To stop a node running in the foreground, use **CTRL + C** or run `cockroach quit` from another shell.

Sending a signal to the process is also a valid option (for both foreground and background processes). This is the mechanism that most process managers would use.

v1.1/remove-a-node.md, line 9 at r1 (raw file):

To stop a CockroachDB node running in the background, run the `cockroach quit` [command](cockroach-commands.html) with appropriate flags. To stop a node running in the foreground, use **CTRL + C** or run `cockroach quit` from another shell.

The `quit` command allows in-flight requests to complete and then shuts down the node. Once a node has been offline for approximately 5 minutes, CockroachDB automatically rebalances replicas from the missing node, using unaffected replicas on other nodes as sources.

Not just the quit command - ctrl-c and signals also allow in-flight requests to complete.
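
As a rough illustration of the signal-based option, the sketch below stops a stand-in background process with SIGTERM and waits for it to exit. The `worker` function is a hypothetical placeholder, not a CockroachDB node; it only mimics a process that finishes its current work before shutting down.

```shell
# Stand-in for a long-running server process (NOT cockroach itself).
# On SIGTERM it reports that it is finishing up, then exits cleanly.
worker() {
  trap 'echo "finishing in-flight work"; exit 0' TERM
  while :; do sleep 1; done
}

worker &              # start the stand-in in the background
pid=$!
sleep 1               # give it a moment to install its signal handler

kill -TERM "$pid"     # what a process manager would typically send
wait "$pid"           # block until the process has actually exited
status=$?
echo "exit status: $status"
```

The same `kill -TERM` followed by `wait` pattern applies whether the process was started in the foreground of another shell or in the background.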

tbg · 2017-09-05T17:29:04Z

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

_includes/cli/decommission-a-node.html, line 10 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If the node is up, it's often easier to ask it for its ID than to scan the node status output in a large cluster: `cockroach sql --security-flags --host=<node-to-be-removed> -e 'show node_id'`. The node ID is also printed when the node starts up.

... or it's printed in the admin ui, if what you know is the host it's running on.

Note that the node may be dead, in which case they shouldn't try to talk to the node.

jseldess · 2017-09-05T18:33:05Z

TFTR, @tschottdorf and @bdarnell. Will rework soon.

jseldess · 2017-09-05T18:36:50Z

_includes/cli/decommission-a-node.html, line 8 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

This isn't necessary, can do this from one of the machines itself.

Hmm, you can ssh onto the node, it's true, but I think I've been told by @mberhault or @bdarnell that it's best to recommend running client commands from elsewhere?

jseldess · 2017-09-05T18:38:32Z

v1.1/decommission-a-node.md, line 13 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

This should also discuss what it means to decommission a node that's already down (i.e. that this is what you'd do to remove permanently dead nodes from the UI).

In that case, do you just run the cockroach node decommission command and the UI will catch on?

jseldess · 2017-09-05T18:39:00Z

v1.1/recommission-a-node.md, line 3 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Recommissioning is not about temporarily stopping a node, it's only for undoing a (mistaken) decommission. I'd include it on the decommission page instead of giving it its own page.

Sorry. This is just a stub page with incorrect copy. I'll remove it and add this content to the decommission page, as you suggest.

jseldess · 2017-09-05T18:39:59Z

v1.1/remove-a-node.md, line 2 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

"Removing" a node implies permanent (decommissioning) removal to me, whereas "stop" is very strongly associated with a temporary stop. I'd swap this doc with the stop-a-node one, so "stop" describes the temporary quit process and "remove a node" is the high-level guide about the two options.

Again, sorry. This is just a stub I left in place accidentally. I think I'll try to have one page, Stop or Remove a Node, cover both cases.

bdarnell · 2017-09-05T18:43:26Z

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

v1.1/decommission-a-node.md, line 13 at r1 (raw file):

Previously, jseldess wrote…

In that case, do you just run the cockroach node decommission command and the UI will catch on?

Yes. (Just don't use --wait=all, or it won't finish)
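
A dry-run sketch of that case, for reference: decommissioning a node that is already down by sending the request to a live node and passing `--wait=live` rather than `--wait=all`. Node ID 4, the host address, `--insecure`, and the `decommission_dead_node` helper are assumptions for illustration; the helper only prints the command unless `COCKROACH` is set.

```shell
# Hypothetical helper: decommission a node that is already dead.
# $1 is the address of a live node (the dead node cannot answer); $2 is the dead node's ID.
# Dry run by default: it echoes the command. Set COCKROACH=cockroach to execute it.
decommission_dead_node() {
  host="$1"
  node_id="$2"
  ${COCKROACH:-echo cockroach} node decommission "$node_id" --wait=live --insecure --host="$host"
}

# Assumed example values: node 4 is dead; 198.51.100.1 is still running.
decommission_dead_node 198.51.100.1 4
```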

tbg · 2017-09-05T19:29:10Z

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

_includes/cli/decommission-a-node.html, line 8 at r1 (raw file):

Previously, jseldess wrote…

Hmm, you can ssh onto the node, it's true, but I think I've been told by @mberhault or @bdarnell that it's best to recommend running client commands from elsewhere?

Serious deployments would likely have a controller host, but generally I don't think it's necessary. @mberhault and @bdarnell are definitely the authority on what we want to recommend though.

bdarnell · 2017-09-06T04:20:21Z

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.

_includes/cli/decommission-a-node.html, line 8 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Serious deployments would likely have a controller host, but generally I don't think it's necessary. @mberhault and @bdarnell are definitely the authority on what we want to recommend though.

In general, I think it's fine for our instructions to demonstrate running the command on a node; we don't need to be didactic about this every time. However, because decommission is a command that is sometimes intended to be used on a downed node, it's probably a good idea to demonstrate this command on a node other than the one to be decommissioned.

cockroach-teamcity · 2017-09-07T19:36:01Z

http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/66fd90c516da38cc230c3fcfa1178cdcb24e6d19/

jseldess · 2017-09-07T19:36:56Z

@tschottdorf and @bdarnell, please take another look.

stop-a-node.md now focuses on temporary stopping.
remove-a-node.md now focuses on decommissioning and node removal.
I expanded view-node-details.md to cover the decommission and recommission subcommands and flags. In a follow-up PR, I'll add more details about the response fields for those commands.

jseldess · 2017-09-07T19:38:20Z

Review status: 12 of 28 files reviewed at latest revision, 8 unresolved discussions, some commit checks pending.

v1.1/stop-a-node.md, line 9 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

or reacting to hardware failures.

Done.

_includes/cli/decommission-a-node.html, line 8 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

In general, I think it's fine for our instructions to demonstrate running the command on a node; we don't need to be didactic about this every time. However, because decommission is a command that is sometimes intended to be used on a downed node, it's probably a good idea to demonstrate this command on a node other than the one to be decommissioned.

Done.

_includes/cli/decommission-a-node.html, line 10 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

... or it's printed in the admin ui, if what you know is the host it's running on.

Note that the node may be dead, in which case they shouldn't try to talk to the node.

Using both of these methods now, in different places.

v1.1/decommission-a-node.md, line 13 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Yes. (Just don't use --wait=all, or it won't finish)

Done.

cockroach-teamcity · 2017-09-07T20:27:48Z

http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/f9e354220631e477ae8cfd3c8aae40822c2cb5a8/

bdarnell · 2017-09-07T22:29:07Z

Reviewed 14 of 21 files at r2, 2 of 2 files at r3.
Review status: all files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.

jseldess · 2017-09-08T16:38:04Z

Decided to add descriptions for fields in cockroach node subcommand responses.

cockroach-teamcity · 2017-09-08T16:38:09Z

http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/5ccb38e5482e125cc12f12b81f3e6d6c33f2d8cc/

cockroach-teamcity · 2017-09-11T02:21:45Z

http://cockroach-docs-review.s3-website-us-east-1.amazonaws.com/f7bce20d880a1ac5ef36200f409ae02b3b504f5d/

jseldess requested a review from tbg September 1, 2017 20:27

jseldess force-pushed the decommission-nodes branch from b2a273f to 66fd90c Compare September 7, 2017 19:34

jseldess changed the title ~~[WIP] Docs for decommissioning and removing nodes~~ Docs for decommissioning and removing nodes Sep 7, 2017

Jesse Seldess added 2 commits September 10, 2017 22:17

Docs for decommissioning and removing nodes

2717641

Updating cockroach node response fields

f7bce20

jseldess force-pushed the decommission-nodes branch from 5ccb38e to f7bce20 Compare September 11, 2017 02:17

jseldess merged commit 84dd0b5 into master Sep 11, 2017

jseldess deleted the decommission-nodes branch September 11, 2017 02:52
