Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] server: ensure settings are up-to-date #50271

Closed
wants to merge 1 commit into from

Conversation

tbg
Copy link
Member

@tbg tbg commented Jun 16, 2020

In an ideal world a KV node would not declare itself as ready until
it has received the current cluster settings.

However, we cannot indiscriminately block on that because of the
chicken-and-egg problem that comes up when the node currently starting
is needed for quorum on the system config range. For example, if a three
node cluster is completely down, at least two of the nodes must be
online before current settings are received.

Instead do the following:

  1. persist the settings locally on the first store whenever they are
    received, so that restarting nodes can come up with settings that are no
    staler than the ones they had when they went down.
  2. if a new node joins the cluster, we can wait for the settings to show
    up (since the node is just joining, it is not required for any quorum).

Fixes #48005.

Release note: None

In an ideal world a KV node would not declare itself as ready until
it has received the current cluster settings.

However, we cannot indiscriminately block on that because of the
chicken-and-egg problem that comes up when the node currently starting
is needed for quorum on the system config range. For example, if a three
node cluster is completely down, at least two of the nodes must be
online before current settings are received.

Instead do the following:

1. persist the settings locally on the first store whenever they are
received, so that restarting nodes can come up with settings that are no
staler than the ones they had when they went down.
2. if a new node joins the cluster, we can wait for the settings to show
up (since the node is just joining, it is not required for any quorum).

Fixes cockroachdb#48005.

Release note: None
@tbg tbg requested a review from ajwerner June 16, 2020 13:26
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also write the settings values to the store during bootstrap?

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/server/init.go, line 238 at r1 (raw file):

		var initialSettingsKVs []roachpb.KeyValue
		select {
		case <-time.After(5 * time.Second):

Should we log in this case?

@nvanbenschoten
Copy link
Member

I like the approach proposed here. The guarantee that a node's settings will never regress across restarts is a big improvement.

tbg added a commit to tbg/cockroach that referenced this pull request Aug 27, 2020
Pulling the gossip info via HTTP can fail if the freshly restarted nodes
didn't gossip the actual cluster settings yet. Fixing the initialization
is tracked via cockroachdb#50271.

Closes cockroachdb#48005.

Release justification: testing
Release note: None
craig bot pushed a commit that referenced this pull request Nov 18, 2020
55166: server: ensure settings are up-to-date. r=tbg a=vrongmeal

[WIP]

Context: #50271

Fixes #48005.

Release note: None

Signed-off-by: Vaibhav <[email protected]>

Co-authored-by: Vaibhav <[email protected]>
@tbg
Copy link
Member Author

tbg commented Dec 2, 2020

@vrongmeal took care of this! #55166

@tbg tbg closed this Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: acceptance/gossip/peerings failed
4 participants