server: declare node ready while decommissioning #43889

andreimatei · 2020-01-10T23:43:39Z

Before this patch, a decommissioning or draining, but otherwise healthy,
node would return an error from the /health?ready=1 endpoint, thus
declaring itself unready and signaling load balancers to send traffic
away. This patch makes it so that the decommissioning status no longer
matters for the readyness determination. Draining nodes continue to
declare themselves unready.
Note that decommissioning nodes typically go through draining at the
end of the process.

The justification is that a node can be decommissioning for an arbitrary
amount of time. During that time it can continue to hold leases, etc. So
trying to avoid traffic is not particularly desirable. In fact, some
people even want to keep a node in a decommissioning state indefinitely
(by restarting a decommissioning node without recommissioning it).
Also, the server.shutdown.drain_wait cluster setting is there to give
load balancers ample time to find out about a draining node.

Also, tactically, the code is simplified.

Release note (general change): A node no longer declares itself to not
be ready through the /health/ready=1 endpoint while it's in the process
of decommissioning. It continues to declare itself unready while
draining.

cockroach-teamcity · 2020-01-10T23:43:46Z

This change is

andreimatei · 2020-01-10T23:44:28Z

started a discussion about the change here
https://groups.google.com/forum/#!topic/cockroach-db/P3v85IJIycE

andreimatei · 2020-01-14T00:35:12Z

I got a nod from @asubiotto on the mailing list, so I think this is good to go

Before this patch, a decommissioning or draining, but otherwise healthy, node would return an error from the /health?ready=1 endpoint, thus declaring itself unready and signaling load balancers to send traffic away. This patch makes it so that the decommissioning status no longer matters for the readyness determination. Draining nodes continue to declare themselves unready. Note that decommissioning nodes typically go through draining at the end of the process. The justification is that a node can be decommissioning for an arbitrary amount of time. During that time it can continue to hold leases, etc. So trying to avoid traffic is not particularly desirable. In fact, some people even want to keep a node in a decommissioning state indefinitely (by restarting a decommissioning node without recommissioning it). Also, the server.shutdown.drain_wait cluster setting is there to give load balancers ample time to find out about a draining node. Also, tactically, the code is simplified. Release note (general change): A node no longer declares itself to not be ready through the /health/ready=1 endpoint while it's in the process of decommissioning. It continues to declare itself unready while draining.

ajwerner

Reviewed 1 of 1 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)

ajwerner

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

andreimatei

bors r+

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

craig · 2020-01-14T20:28:45Z

Build failed

GitHub CI (Cockroach)

andreimatei

flaked on #43957

bors r+

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

43417: sql: remove restriction of foreign key on multiple outbound columns r=lucy-zhang a=otan Resolves #38850. Removing the restriction of foreign keys on multiple outbound columns. There is one minor issue we need to address which is done here -- if the constraint name already exists, append a number next to it as postgres does as well. We can make this fancier if we wish later. This PR comes with a test suite of corner cases that we suspect may be hinted at being wrong. It uses both legacy and the new optimizer settings for foreign key checks. Release note (sql change): We previously had a restriction that foreign keys can only reference one outbound column at any given point in time, e.g. in `CREATE TABLE test(a int)`, we cannot have two foreign keys on column a. This PR removes this restriction. 43889: server: declare node ready while decommissioning r=andreimatei a=andreimatei Before this patch, a decommissioning or draining, but otherwise healthy, node would return an error from the /health?ready=1 endpoint, thus declaring itself unready and signaling load balancers to send traffic away. This patch makes it so that the decommissioning status no longer matters for the readyness determination. Draining nodes continue to declare themselves unready. Note that decommissioning nodes typically go through draining at the end of the process. The justification is that a node can be decommissioning for an arbitrary amount of time. During that time it can continue to hold leases, etc. So trying to avoid traffic is not particularly desirable. In fact, some people even want to keep a node in a decommissioning state indefinitely (by restarting a decommissioning node without recommissioning it). Also, the server.shutdown.drain_wait cluster setting is there to give load balancers ample time to find out about a draining node. Also, tactically, the code is simplified. Release note (general change): A node no longer declares itself to not be ready through the /health/ready=1 endpoint while it's in the process of decommissioning. It continues to declare itself unready while draining. Co-authored-by: Oliver Tan <[email protected]> Co-authored-by: Andrei Matei <[email protected]>

craig · 2020-01-14T21:14:24Z

Build succeeded

GitHub CI (Cockroach)

andreimatei requested a review from ajwerner January 10, 2020 23:43

andreimatei requested a review from asubiotto January 14, 2020 00:34

andreimatei force-pushed the liveness.ready-on-decommissioning branch from 59f874a to 139bd21 Compare January 14, 2020 00:46

ajwerner approved these changes Jan 14, 2020

View reviewed changes

andreimatei commented Jan 14, 2020

View reviewed changes

craig bot merged commit 139bd21 into cockroachdb:master Jan 14, 2020

andreimatei deleted the liveness.ready-on-decommissioning branch January 15, 2020 01:09

jseldess mentioned this pull request Feb 19, 2020

server: declare node ready while decommissioning cockroachdb/docs#6589

Closed

andreimatei mentioned this pull request Feb 19, 2020

server,kv: IsLive() does not check the Draining flag #45123

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

server: declare node ready while decommissioning #43889

server: declare node ready while decommissioning #43889

andreimatei commented Jan 10, 2020

cockroach-teamcity commented Jan 10, 2020

andreimatei commented Jan 10, 2020

andreimatei commented Jan 14, 2020

ajwerner left a comment

ajwerner left a comment

andreimatei left a comment

craig bot commented Jan 14, 2020

andreimatei left a comment

craig bot commented Jan 14, 2020

server: declare node ready while decommissioning #43889

server: declare node ready while decommissioning #43889

Conversation

andreimatei commented Jan 10, 2020

cockroach-teamcity commented Jan 10, 2020

andreimatei commented Jan 10, 2020

andreimatei commented Jan 14, 2020

ajwerner left a comment

Choose a reason for hiding this comment

ajwerner left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

craig bot commented Jan 14, 2020

Build failed

andreimatei left a comment

Choose a reason for hiding this comment

craig bot commented Jan 14, 2020

Build succeeded