
server,rpc: block decommissioned node from interacting with the cluster #55286

Merged
merged 3 commits from decom-bit-rpc-step2 into cockroachdb:master on Oct 14, 2020

Conversation


@tbg tbg commented Oct 7, 2020

This PR completes the work started in #54936 by introducing a component that
keeps track of decommissioned nodes and hooks it up to the RPC layer, where it
is used to prevent RPC-layer heartbeats between the cluster and decommissioned
nodes. Since these heartbeats are a prerequisite for opening a connection in
either direction, by failing these heartbeats we are effectively preventing
decommissioned nodes from interacting with the cluster.

First commits from #54936. New here:

  • keys: introduce NodeTombstoneKey
  • server: introduce node tombstone storage

Release note: None
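
For intuition, a hedged sketch of the mechanism described above (the names below are illustrative and do not mirror the PR's actual code): the heartbeat consults a set of known-decommissioned node IDs and fails fast; since heartbeats gate connections in both directions, the decommissioned node is effectively cut off.

```go
package main

import (
	"errors"
	"fmt"
)

// decommissionedChecker reports whether a node ID is known to be permanently
// removed from the cluster (e.g. backed by the node tombstone storage).
type decommissionedChecker func(nodeID int) bool

var errDecommissioned = errors.New("peer is decommissioned; refusing connection")

// heartbeat stands in for the RPC-layer heartbeat that gates every connection
// attempt in either direction; failing it keeps the connection from being
// established at all.
func heartbeat(peerNodeID int, isDecommissioned decommissionedChecker) error {
	if isDecommissioned(peerNodeID) {
		return errDecommissioned
	}
	// ... version check, clock-offset measurement, etc. would follow here.
	return nil
}

func main() {
	tombstones := map[int]bool{3: true} // n3 has been decommissioned
	check := func(nodeID int) bool { return tombstones[nodeID] }

	fmt.Println(heartbeat(2, check)) // <nil>: n2 may connect
	fmt.Println(heartbeat(3, check)) // n3 is refused
}
```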

@tbg tbg requested a review from irfansharif October 7, 2020 14:37
@cockroach-teamcity

This change is Reviewable

@tbg tbg force-pushed the decom-bit-rpc-step2 branch from dca9b45 to 168c490 Compare October 8, 2020 10:27

tbg commented Oct 13, 2020

Friendly ping @irfansharif

@irfansharif irfansharif left a comment

:lgtm:, but while I have you here, I have a few questions. Is the following hazard possible? I think it is (though I'm less clear on whether we care about it, and if so, why).

  • Cluster comprised of n1, n2, n3
  • User decommissions n3 through n2
  • n2 updates n3's liveness record in KV, marking it as fully decommissioned
  • n2 crashes before invoking OnNodeDecommissioned, so it doesn't place the n3 marker on disk
  • n2 restarts, and not finding anything on disk to suggest otherwise, connects to n3 (it still hasn't learned through gossip that n3 is decommissioned)
  • ???

What does this mean for long-running migrations? If the orchestrator process (say, on n1) is looking to push out feature gates to n2, and n2 still has an open connection to n3 (admittedly the window where this is possible is very short), doesn't that break exactly the kind of guarantee we want? I'm actually unclear on what happens if the connection to n3 is maintained.

Given that the authoritative source of "all decommissioned nodes" is still KV (as opposed to store-local keys): if the open connection to n3 is unacceptable (again, I'm unclear on the implications), would we need the orchestrator to inform all live nodes about the current set of decommissioned nodes, and then wait for each one to confirm that it has persisted this store-local key and disconnected any relevant open connections?

Reviewed 4 of 4 files at r8, 4 of 5 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)


pkg/keys/printer.go, line 209 at r8 (raw file):

		return fmt.Sprintf("<invalid: %s>", err)
	}
	return fmt.Sprint(nodeID)

I'm not sure what's custom here in pkg/keys, but what about "n"?


pkg/rpc/context.go, line 1207 at r9 (raw file):

					checkVersion(goCtx, ctx.Settings, response.ServerVersion),
					"version compatibility check failed on ping response")
				if err != nil {

This is always true.


pkg/server/connectivity_test.go, line 363 at r9 (raw file):

	defer tc.Stopper().Stop(ctx)

	//tc.StopServer(2) // stop n3

Stray line?


pkg/server/node_tombstone_storage.go, line 118 at r9 (raw file):

	// We've populated the cache, now write through to disk.

[nit] Remove empty lines here and right below.

@tbg tbg requested a review from irfansharif October 14, 2020 09:23

@tbg tbg left a comment

TFTR!

re: your comment, yes, the blocklist is best-effort. We discussed this somewhere on the RFC PR (I believe around here), and the outcome was that the operator is ultimately still in charge of not bringing back nodes that aren't allowed to come back, though we make it really hard for them to break something that way (because the decommissioned status usually blocks the nodes). So the question is whether we're "done" after this PR or whether we ought to do more. One simple safety hatch is that the orchestrator, upon looking at the liveness table, makes sure any DECOMMISSIONING nodes are not live. We could also push out decommission bits proactively, actively force a pass through the open connections in the RPC layer, etc., but that seems more trouble than it's worth.
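
For concreteness, a minimal sketch of that safety hatch under stated assumptions (the types and function names below are hypothetical, not CockroachDB's actual liveness API): the orchestrator scans the liveness records and refuses to kick off migrations while any DECOMMISSIONING or DECOMMISSIONED node still appears live.

```go
package main

import "fmt"

// MembershipStatus loosely mirrors the liveness membership states.
type MembershipStatus string

const (
	Active          MembershipStatus = "active"
	Decommissioning MembershipStatus = "decommissioning"
	Decommissioned  MembershipStatus = "decommissioned"
)

// LivenessRecord is a simplified stand-in for a node's liveness entry.
type LivenessRecord struct {
	NodeID     int
	Membership MembershipStatus
	Live       bool // whether the liveness record is still being heartbeated
}

// checkNoDecommissioningNodesAreLive is the "simple safety hatch": before
// pushing out migrations, refuse to proceed while any node that has begun
// (or finished) decommissioning still appears live. The orchestrator would
// retry until this check passes.
func checkNoDecommissioningNodesAreLive(records []LivenessRecord) error {
	for _, r := range records {
		if r.Membership != Active && r.Live {
			return fmt.Errorf("n%d is %s but still live; not running migrations yet", r.NodeID, r.Membership)
		}
	}
	return nil
}

func main() {
	recs := []LivenessRecord{
		{NodeID: 1, Membership: Active, Live: true},
		{NodeID: 3, Membership: Decommissioned, Live: true}, // the hazard above
	}
	if err := checkNoDecommissioningNodesAreLive(recs); err != nil {
		fmt.Println("orchestrator backs off:", err)
	}
}
```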

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/keys/printer.go, line 209 at r8 (raw file):

Previously, irfansharif (irfan sharif) wrote…

I'm not sure what's custom here in pkg/keys, but what about "n"?

I'm not sure what you're suggesting, can you clarify?


pkg/rpc/context.go, line 1207 at r9 (raw file):

Previously, irfansharif (irfan sharif) wrote…

This is always true.

No, errors.Wrap(nil, _) is nil.
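
A small standalone illustration of that behavior (using github.com/cockroachdb/errors, which follows the pkg/errors convention of returning nil when wrapping a nil error; the checkVersion stub below is hypothetical):

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/errors"
)

// checkVersion stands in for the real version-compatibility check.
func checkVersion(ok bool) error {
	if !ok {
		return errors.New("version compatibility check failed")
	}
	return nil
}

func main() {
	// Wrapping a nil error yields nil, so the `if err != nil` branch is not
	// always taken.
	err := errors.Wrap(checkVersion(true), "version compatibility check failed on ping response")
	fmt.Println(err == nil) // true

	// Wrapping a real error yields a non-nil, prefixed error.
	err = errors.Wrap(checkVersion(false), "version compatibility check failed on ping response")
	fmt.Println(err) // version compatibility check failed on ping response: version compatibility check failed
}
```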


pkg/server/connectivity_test.go, line 363 at r9 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Stray line?

Done.


pkg/server/node_tombstone_storage.go, line 118 at r9 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[nit] Remove empty lines here and right below.

Done.

tbg added 3 commits October 14, 2020 12:38
These keys live in the store-local keyspace and contain nodes known to
have been marked as decommissioned (=permanently removed from the
cluster). These keys will be used to prevent tombstoned nodes from
establishing RPC connections to the cluster (and vice versa).

Release note: None
This introduces the component that provides a cached view of the
locally persisted node tombstones.

This component is populated via NodeLiveness (which in turn listens
to Gossip) and queried in the RPC layer, meaning that in effect,
after the gossip info has propagated out, nodes will, within a
heartbeat timeout (a few seconds), cease communication with the
decommissioned node.

I verified manually that the refusal message makes it to the logs
on the decommissioned node.

Some changes to the RPC connection pool were necessary to get the
desired behavior. Essentially, the pool holds on to and hands out
existing connections by default, meaning that the injected errors
would be ignored if they started occurring on an already active
connection (as in `TestDecommissionedNodeCannotConnect`). We need
to make sure the heartbeat loop actually *returns* these injected
errors, which is now the case.

Release note: None
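
For orientation, a hedged sketch of the write-through shape described in this commit message (hypothetical names and types; not the actual node_tombstone_storage.go code): an in-memory cache of decommissioned node IDs that is persisted to store-local keys and consulted by the RPC layer.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a stand-in for the engine holding store-local keys.
type store interface {
	PutTombstone(nodeID int) error
}

type memStore struct{ keys map[int]bool }

func (m *memStore) PutTombstone(nodeID int) error {
	if m.keys == nil {
		m.keys = map[int]bool{}
	}
	m.keys[nodeID] = true
	return nil
}

// nodeTombstoneStorage caches decommissioned node IDs in memory and writes
// them through to the store-local keyspace, so the RPC layer can consult the
// cache cheaply on every heartbeat.
type nodeTombstoneStorage struct {
	mu    sync.RWMutex
	cache map[int]bool
	disk  store
}

// SetDecommissioned records nodeID as decommissioned: populate the cache,
// then write through to disk so the fact survives restarts.
func (s *nodeTombstoneStorage) SetDecommissioned(nodeID int) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.cache == nil {
		s.cache = map[int]bool{}
	}
	if s.cache[nodeID] {
		return nil // already known, nothing to do
	}
	s.cache[nodeID] = true
	return s.disk.PutTombstone(nodeID)
}

// IsDecommissioned is what an RPC heartbeat check would consult.
func (s *nodeTombstoneStorage) IsDecommissioned(nodeID int) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cache[nodeID]
}

func main() {
	s := &nodeTombstoneStorage{disk: &memStore{}}
	_ = s.SetDecommissioned(3)
	fmt.Println(s.IsDecommissioned(3)) // true
	fmt.Println(s.IsDecommissioned(2)) // false
}
```
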
@tbg tbg force-pushed the decom-bit-rpc-step2 branch from 168c490 to 40cdd5a Compare October 14, 2020 10:38

tbg commented Oct 14, 2020

bors r=irfansharif


craig bot commented Oct 14, 2020

Build succeeded:

@craig craig bot merged commit 654a061 into cockroachdb:master Oct 14, 2020
@irfansharif irfansharif commented

Previously, tbg wrote…

I'm not sure what you're suggesting, can you clarify?

Just meant pretty printing the node ID as "n123" instead of just 123. Eh, wtvs.
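
In other words (illustrative only):

```go
package main

import "fmt"

func main() {
	nodeID := 123
	fmt.Println(fmt.Sprint(nodeID))         // "123" (current behavior)
	fmt.Println(fmt.Sprintf("n%d", nodeID)) // "n123" (the suggested pretty-printing)
}
```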


tbg commented Oct 14, 2020

👍 opened new PR for that.

@tbg tbg deleted the decom-bit-rpc-step2 branch October 14, 2020 15:41
tbg added a commit to tbg/cockroach that referenced this pull request Oct 27, 2020
The test could end up using fully decommissioned nodes for cli commands,
which does not work as of cockroachdb#55286.
Additionally, decommissioned nodes now become non-live after a short
while, so various cli output checks had to be adjusted.

Fixes cockroachdb#55581.

Release note: None
craig bot pushed a commit that referenced this pull request Oct 27, 2020
55560: backupccl: avoid div-by-zero crash on failed node count r=dt a=dt

We've seen a report of a node that crashed due to a divide-by-zero
hit during metrics collection, specifically when computing the
throughput-per-node by dividing the backup size by node count.

Since the node count is now only used for that metric, make a failure to count
nodes a warning only in release builds (falling back to 1), and make any error
while counting, or a count that is not greater than zero, a returned error in
non-release builds.

Release note (bug fix): avoid crashing when BACKUP is unable to count the total nodes in the cluster.
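
A hedged sketch of the guard described above (hypothetical names; not the actual backupccl code):

```go
package main

import (
	"errors"
	"fmt"
)

// perNodeThroughput divides total backup bytes by node count. If the count
// could not be determined, release builds fall back to 1 node (the metric is
// best-effort), while non-release builds surface the problem as an error.
func perNodeThroughput(backupBytes int64, nodeCount int, countErr error, releaseBuild bool) (int64, error) {
	if countErr != nil || nodeCount <= 0 {
		if !releaseBuild {
			if countErr == nil {
				countErr = fmt.Errorf("invalid node count %d", nodeCount)
			}
			return 0, countErr
		}
		// Release build: continue with a safe fallback instead of crashing
		// with a divide-by-zero.
		nodeCount = 1
	}
	return backupBytes / int64(nodeCount), nil
}

func main() {
	tp, err := perNodeThroughput(1<<30, 0, errors.New("node count unavailable"), true /* releaseBuild */)
	fmt.Println(tp, err) // falls back to a single node, no error
}
```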

55809: roachtest: fix decommission/randomize r=irfansharif a=tbg

The test could end up using fully decommissioned nodes for cli commands,
which does not work as of #55286.

Fixes #55581.

Release note: None


56019: lexbase: pin `reserved_keywords.go` within Bazel r=irfansharif a=irfansharif

It's an auto-generated file that bazel doesn't yet know how to construct
within the sandbox. Before this PR `make bazel-generate` would show
spurious diffs on a clean checkout without this file present. Now it
no longer will.

Unfortunately, it also means that successful bazel builds require
`reserved_keywords.go` to be generated ahead of time (it's not
checked into the repo). Once Bazel is taught to generate this file
however, this will no longer be the case. It was just something that I
missed in #55687.

Release note: None

Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
tbg added a commit to tbg/cockroach that referenced this pull request Nov 4, 2020
The test could end up using fully decommissioned nodes for cli commands,
which does not work as of cockroachdb#55286.
Additionally, decommissioned nodes now become non-live after a short
while, so various cli output checks had to be adjusted.

Fixes cockroachdb#55581.

Release note: None