cli: add command or flag to actively remove node from cluster #6198

Closed
cuongdo opened this issue Apr 21, 2016 · 23 comments · Fixed by #6780
Comments

@cuongdo
Contributor

cuongdo commented Apr 21, 2016

We need some kind of command or flag to actively remove a node from the cluster when there's no intention to bring that node back up.

At a minimum, this would:

  • Ensure that the node does not show up in the admin UI in the future
  • Migrate replicas off the stores on that node in a clean way
  • Prevent the stream of log entries that currently occurs when you kill a node. For example:
I160421 11:32:01.362135 /Users/cdo/go/src/google.golang.org/grpc/clientconn.go:317  Failed to dial cdo-cockroach.local:26259: grpc: timed out trying to connect; please retry.
W160421 11:32:10.651793 ts/db.go:98  error writing time series data: failed to send RPC: too many errors encountered (1 of 1 total): rpc error: code = 4 desc = context deadline exceeded
  • Shut down the CockroachDB instance on the node.
  • Probably log an entry in the event log
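
For illustration, here is a minimal sketch of what such a flow could look like on the command line, using the command names that were eventually added later in this thread (`node decommission` and `quit --decommission`); the node ID, addresses, and exact flags are placeholders and may differ by version:

```
# Mark node 4 for permanent removal and block until its replicas have been
# migrated to other nodes.
cockroach node decommission --wait=all --insecure --host=localhost:26257 4

# Then drain and shut down the CockroachDB process on the node being removed.
cockroach quit --decommission --insecure --host=<node4-address>
```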
@cuongdo cuongdo added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 21, 2016
@tbg
Member

tbg commented Apr 21, 2016

This somewhat decomposes into a few more granular features:

  • drain a node (i.e. let all leader leases expire and don't obtain new
    ones, disable all queues, maybe prevent new SQL connections - but keep
    making Raft progress) and wait until the slate is clean
    Used for clean shutdown (and pretty much always part of any type of
    shutdown)
  • drain data off a node (i.e. have all or some subset of ranges migrate
    away, and wait until that has happened)
    Used for decommissioning, but could also be used to change hard-drives
    cleanly (i.e. drain data off a store).
  • drain for update/migration (see the {F,Unf}reeze proposal in [RFC] Proposer-evaluated KV #6166), and finally
  • decommission (permanently remove), which is "most of the above".
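
As a rough, hedged mapping of these pieces onto the CLI as it eventually evolved later in this thread (addresses and IDs are placeholders):

```
# Temporary drain: gracefully stop a node that is expected to come back; its
# replicas stay in place and catch up via Raft once it rejoins.
cockroach quit --insecure --host=<node-address>

# Permanent removal (decommission): migrate all replicas off the node first,
# then stop it.
cockroach node decommission --wait=all --insecure <nodeID>
```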


tbg added a commit to tbg/cockroach that referenced this issue May 16, 2016
Whether we ultimately want a command like that is not determined (see cockroachdb#6197),
but the code isn't currently working correctly (cockroachdb#6196) and it's set up in the
legacy way, which hinders the upcoming migration of the `quit` command.
A re-implementation of the code would likely want to erase the data on the
server side (and not from the cli).

cc @BramGruneir
cc @jseldess for documentation

Closes cockroachdb#6197, closes cockroachdb#6198.
tbg added a commit to tbg/cockroach that referenced this issue May 18, 2016
@tbg tbg closed this as completed in #6780 May 18, 2016
@kwando

kwando commented Oct 10, 2016

So... there is no way to retire a node in a cluster?

@tbg
Member

tbg commented Oct 10, 2016

There is (or at least should be), but this issue was closed prematurely.

@tbg tbg reopened this Oct 10, 2016
@kwando

kwando commented Oct 10, 2016

That's what I figured =)

@tbg tbg added this to the Q3 milestone Oct 10, 2016
@tbg
Member

tbg commented Oct 10, 2016

Yep, thanks for pointing this out.

More precisely, here are the intermediate steps for draining a node out of the cluster:

  • use ZoneConfigs to initiate moving the node's data elsewhere
  • wait until that has happened
  • turn off the node.

The first two steps are currently too manual, especially the waiting. Once we have them figured out, it might be worth packaging the result into a CLI command.
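
A sketch of that manual workflow, assuming the node being retired was started with a distinguishing attribute (`--attrs=retiring` here is purely illustrative; constraints match attributes or localities, not node IDs) and using the zone config CLI of this era, whose exact flags may differ between versions:

```
# Add a prohibited constraint to the default zone so the allocator rebalances
# replicas off any node carrying the hypothetical "retiring" attribute.
echo 'constraints: [-retiring]' | cockroach zone set .default -f -

# Wait until the admin UI shows the node holding no more replicas, then stop it.
cockroach quit --insecure --host=<retiring-node-address>
```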

@petermattis
Collaborator

Currently ZoneConfigs are the only way to drain a node, but I think that is an inappropriate use. We really want a simpler mechanism to mark a node as draining.

@tbg
Member

tbg commented Oct 10, 2016

Why is that an inappropriate use? ZoneConfigs describe what data lives where and in how many copies, and when I want all of a node's data to move somewhere else, it seems like exactly the appropriate use - building a second mechanism for a special case of that seems unwise.

Where value can be added is in making that process simple and intuitive, but nothing is required there that we don't want anyway. For example, we certainly want a tool that can tell you whether a given replication zone has all of its constraints satisfied (i.e. is everything replicated in the right way); the high-level version of that gives you the cluster's overall status. Such a tool would tell you exactly when it's safe to take down the node you're trying to get rid of.
Also, an explicit remove command can announce to the cluster that the node isn't coming back, which can avoid triggering certain alerts.

I agree that it's not clear how the ZoneConfig would in general support the removal of nodes (it seems easy enough to do in any standard situation, but harder if the zone config is completely trivial or if there are a gazillion zones). What else did you have in mind?

@petermattis
Collaborator

ZoneConfigs are settable by the user and there might be dozens or hundreds. Changing all of them in order to drain a node doesn't seem reasonable. Conceptually, what we'd want is something where there is an implicit -draining attribute on every ZoneConfig and marking a node as draining adds that attribute. We'd still go through the standard allocator/rebalancing logic. Re-reading this, perhaps we're saying the same thing.

@tbg
Member

tbg commented Oct 10, 2016

Ok, sure sounds like it. Glad we're on the same page.

@kaeverens

any progress on this? I'd like to cleanly remove a server from my cluster.

in the meantime, is it safe to simply turn one of the servers off? would the others eventually give up on it and stop trying to talk to it?

@kaeverens

kaeverens commented Dec 7, 2016

As a note, I see some work on an RFC for this:
https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/drain_modes.md

@bdarnell
Contributor

bdarnell commented Dec 8, 2016

Yes, it's safe to simply turn one of the servers off. The data will be re-replicated onto other nodes.

@dianasaur323 dianasaur323 modified the milestones: 1.0, Q3 Feb 9, 2017
@spencerkimball spencerkimball modified the milestones: Later, 1.0 Mar 30, 2017
@spencerkimball
Member

I'm moving this off of 1.0.

@tbg tbg assigned tbg and unassigned asubiotto May 31, 2017
@petermattis petermattis modified the milestones: 1.1, Later Jun 1, 2017
tbg added a commit to tbg/cockroach that referenced this issue Aug 1, 2017
Ignore the first commit -- that's cockroachdb#16968.

See cockroachdb#6198. This implements the "management portion" of that issue.
It leans heavily on @neeral's WIP PR cockroachdb#17157.

- add `node decommission [--wait=all|live|none] [nodeID1 nodeID2 ...]`
- add `cockroach quit --decommission`
- add a comprehensive acceptance test that puts all of these to good use.

It works surprisingly well, but as you'd expect there are kinks. Specifically,
in the acceptance test, the invocation `quit --decommission` tends to hang for
extended periods of time, sometimes "forever". In the most recent run, this was
traced to the fact that the lease holder for a replica remaining on
a decommissioning node had *no* ZoneConfig in gossip, which effectively
disables its leaseholder replication checks. It is not clear whether this is
related to decommissioning in the first place, though the leaseholder node was
itself decommissioned, recommissioned, and restarted when this occurred.

The acceptance test also requires at least four nodes to work, and it takes
around 10 minutes, so we may want to run only a reduced version during regular
CI, with the long one running nightly.

The invocation for the failing acceptance test is:

```
make acceptance TESTS=Decom TESTFLAGS='-v -show-logs -nodes=4' TESTTIMEOUT=20m
```

(If the test runs and fails with the localcluster shim complaining about
unexpected events, that's because I haven't yet had a chance to tell it about
the node we're intentionally `quit`ting; or rather, to test what I did to tell
it about that.)

cc @a-robinson

Closes cockroachdb#17157.
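
For reference, a hedged usage sketch of the commands added here, under my reading of the `--wait` modes (`all` blocks until the node has shed its replicas; `none` marks it as decommissioning and returns immediately); the node ID is a placeholder:

```
# Decommission node 4 and block until its data has been moved elsewhere.
cockroach node decommission --wait=all --insecure 4

# Or mark node 4 as decommissioning and return immediately; re-running the
# command later reports progress.
cockroach node decommission --wait=none --insecure 4
```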
tbg added a commit to tbg/cockroach that referenced this issue Aug 3, 2017
tbg added a commit to tbg/cockroach that referenced this issue Aug 4, 2017
tbg added a commit to tbg/cockroach that referenced this issue Aug 4, 2017
@EamonZhang
Contributor

mark

@tbg
Member

tbg commented Aug 7, 2017

Remaining here is the ui work (hiding nodes that are successfully decommissioned in the ui, at least after a while) and adding an event (that shows up in the ui) when a node is decommissioning/decommissioned.

See the last commit in #17419 for beginnings of this.

We'll also want another pass over the cli commands (#17419) before writing the documentation.

cc @jseldess

@jseldess
Contributor

jseldess commented Aug 7, 2017

Thanks for the update, @tschottdorf.

@cuongdo
Contributor Author

cuongdo commented Aug 7, 2017

@couchand can you help @tschottdorf get up to speed with the UI changes?

@tschottdorf we're spreading out UI work because we don't have nearly enough engineers conversant with the front end to centralize admin UI development for new features. I think this should be a gentle intro, given the minimal admin UI change needed for this.

@tbg
Member

tbg commented Aug 7, 2017

@cuongdo sgtm, but note that I still have version migrations on my plate and so I won't get to this for perhaps another two weeks.

@couchand
Contributor

couchand commented Aug 7, 2017

@tschottdorf @cuongdo I'm happy to take a few minutes this week to talk through what will be needed here (but it has to be this week, I'm out for a month after Thursday...).

Moving forward, let's work to get the implementation of UI pieces of major features started well in advance of feature freeze, to be sure that there's plenty of time for the design to bake. And ideally let's try to include some implementation details in RFCs where possible.

@cuongdo
Contributor Author

cuongdo commented Aug 8, 2017

@benesch has agreed to work on the UI component for this

@tbg
Member

tbg commented Aug 8, 2017

Thanks @benesch, appreciate it! You'll want to look at the ui-related commit in https://github.com/cockroachdb/cockroach/pull/17157/commits.

@tbg
Member

tbg commented Aug 31, 2017

I'll call this done, since the only remaining item is tracked in #17677, though the test is flaky (#17995).

@tbg tbg closed this as completed Aug 31, 2017