roachtest: acceptance/gossip/peerings failed #48005

cockroach-teamcity · 2020-04-24T07:42:41Z

(roachtest).acceptance/gossip/peerings failed on master@0e16cc15f139b816b8e46fe6571691a8ec0e6937:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/gossip/peerings/run_1
	gossip.go:259,acceptance.go:91,test_runner.go:753: status: 403 Forbidden, content-type: application/json, body: {
		  "error": "not allowed (due to the 'server.remote_debugging.mode' setting)",
		  "message": "not allowed (due to the 'server.remote_debugging.mode' setting)",
		  "code": 7,
		  "details": [
		  ]
		}, error: <nil>
		github.com/cockroachdb/cockroach/pkg/util/httputil.doJSONRequest
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/httputil/http.go:116
		github.com/cockroachdb/cockroach/pkg/util/httputil.GetJSON
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/httputil/http.go:55
		main.(*gossipUtil).check.func1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:157
		github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:188
		main.(*gossipUtil).check
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:153
		main.runGossipPeerings
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:258
		main.registerAcceptance.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/acceptance.go:91
		main.(*testRunner).runTest.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1357
		failed to get gossip status from node 1
		main.(*gossipUtil).check.func1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:158
		github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:188
		main.(*gossipUtil).check
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:153
		main.runGossipPeerings
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:258
		main.registerAcceptance.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/acceptance.go:91
		main.(*testRunner).runTest.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1357

Artifacts: /acceptance/gossip/peerings

See this test on roachdash
powered by pkg/cmd/internal/issues

nvanbenschoten · 2020-04-28T15:58:01Z

07:42:13 test.go:325: test failure: 	gossip.go:259,acceptance.go:91,test_runner.go:753: status: 403 Forbidden, content-type: application/json, body: {
		  "error": "not allowed (due to the 'server.remote_debugging.mode' setting)",
		  "message": "not allowed (due to the 'server.remote_debugging.mode' setting)",
		  "code": 7,
		  "details": [
		  ]
		}, error: <nil>
		github.com/cockroachdb/cockroach/pkg/util/httputil.doJSONRequest
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/httputil/http.go:116
		github.com/cockroachdb/cockroach/pkg/util/httputil.GetJSON
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/httputil/http.go:55
		main.(*gossipUtil).check.func1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:157
		github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:188
		main.(*gossipUtil).check
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:153
		main.runGossipPeerings
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:258
		main.registerAcceptance.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/acceptance.go:91
		main.(*testRunner).runTest.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1357

Looks like we're trying to hit the admin UI (/_status/gossip/local) shortly after the cluster starts up, and we hit this error. This indicates that the server has not yet received an updated version of the cluster settings; otherwise it would be aware that server.remote_debugging.mode was set to 'any' here.

nvanbenschoten · 2020-04-28T17:09:21Z

One thing we see in the logs from this node is that:

W200424 07:42:12.970613 143 server/node.go:670  [n1] [n1,s1]: unable to compute metrics: [n1,s1]: system config not yet available

fires a few times after:

I200424 07:41:42.969732 24 server/server.go:1419  [n1] starting http server at [::]:26258 (use: 10.128.0.76:26258)
I200424 07:41:49.029579 99 gossip/gossip.go:1538  [n1] node has connected to cluster via gossip

tbg · 2020-04-29T09:12:33Z

Yes, that sounds reasonable. The settings are not persisted on the node, so we're always running with the default settings for a little bit of time even after signaling readiness.

Something we could do here is to wait until a system config has been ingested before returning from .Start (in the non-init case). But we're also phasing out this use of Gossip, so this will rot quickly, and the new system will use higher-level primitives. For now, it seems best to add a retry or sleep to the test.

cockroach-teamcity · 2020-05-01T07:31:10Z

(roachtest).acceptance/gossip/peerings failed on master@61f18db7dd9a054d9a4648f67546202f760b5000:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/gossip/peerings/run_1
	gossip.go:259,acceptance.go:91,test_runner.go:753: status: 403 Forbidden, content-type: application/json, body: {
		  "error": "not allowed (due to the 'server.remote_debugging.mode' setting)",
		  "message": "not allowed (due to the 'server.remote_debugging.mode' setting)",
		  "code": 7,
		  "details": [
		  ]
		}, error: <nil>
		github.com/cockroachdb/cockroach/pkg/util/httputil.doJSONRequest
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/httputil/http.go:116
		github.com/cockroachdb/cockroach/pkg/util/httputil.GetJSON
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/httputil/http.go:55
		main.(*gossipUtil).check.func1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:157
		github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:188
		main.(*gossipUtil).check
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:153
		main.runGossipPeerings
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:258
		main.registerAcceptance.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/acceptance.go:91
		main.(*testRunner).runTest.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1357
		failed to get gossip status from node 1
		main.(*gossipUtil).check.func1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:158
		github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:188
		main.(*gossipUtil).check
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:153
		main.runGossipPeerings
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:258
		main.registerAcceptance.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/acceptance.go:91
		main.(*testRunner).runTest.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1357

Artifacts: /acceptance/gossip/peerings

knz · 2020-05-01T12:04:29Z

> Yes, that sounds reasonable. The settings are not persisted on the node, so we're always running with the default settings for a little bit of time even after signaling readiness.

omg that explains so much about many other test failures I've investigated in the past. It also explains why certain SQL clients which should get some defaults initialized by settings don't get them when they connect immediately after a node starts.

I'm going to file this as a separate issue under the "rolling restarts" project.

cockroach-teamcity · 2020-05-07T07:26:56Z

(roachtest).acceptance/gossip/peerings failed on master@20916a30cf9356683c973f8653e8b69613a75fe4:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/gossip/peerings/run_1
	gossip.go:259,acceptance.go:94,test_runner.go:753: status: 403 Forbidden, content-type: application/json, body: {
		  "error": "not allowed (due to the 'server.remote_debugging.mode' setting)",
		  "message": "not allowed (due to the 'server.remote_debugging.mode' setting)",
		  "code": 7,
		  "details": [
		  ]
		}, error: <nil>
		github.com/cockroachdb/cockroach/pkg/util/httputil.doJSONRequest
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/httputil/http.go:116
		github.com/cockroachdb/cockroach/pkg/util/httputil.GetJSON
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/httputil/http.go:55
		main.(*gossipUtil).check.func1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:157
		github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:188
		main.(*gossipUtil).check
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:153
		main.runGossipPeerings
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:258
		main.registerAcceptance.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/acceptance.go:94
		main.(*testRunner).runTest.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1357
		failed to get gossip status from node 1
		main.(*gossipUtil).check.func1
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:158
		github.com/cockroachdb/cockroach/pkg/util/retry.ForDuration
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/util/retry/retry.go:188
		main.(*gossipUtil).check
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:153
		main.runGossipPeerings
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/gossip.go:258
		main.registerAcceptance.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/acceptance.go:94
		main.(*testRunner).runTest.func2
			/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:753
		runtime.goexit
			/usr/local/go/src/runtime/asm_amd64.s:1357


Artifacts: /acceptance/gossip/peerings

tbg · 2020-06-16T13:14:03Z

We cannot indiscriminately block on receiving the settings before signaling ready because of the chicken-and-egg problem that comes up when the node currently starting is needed for quorum on the system config range. For example, if a three-node cluster is completely down, at least two of the nodes must be online before current settings are received.

I think we should do two things here:

1. Persist the settings locally on the first store, so that restarting nodes can come up with settings that are no staler than the ones they had when they went down.
2. If a new node joins the cluster, wait until it has gotten the settings from gossip (and applied them) before continuing to ready status.

In an ideal world a KV node would not declare itself as ready until it has received the current cluster settings. However, we cannot indiscriminately block on that because of the chicken-and-egg problem that comes up when the node currently starting is needed for quorum on the system config range. For example, if a three node cluster is completely down, at least two of the nodes must be online before current settings are received.
Instead do the following:
1. persist the settings locally on the first store whenever they are received, so that restarting nodes can come up with settings that are no staler than the ones they had when they went down.
2. if a new node joins the cluster, we can wait for the settings to show up (since the node is just joining, it is not required for any quorum).
Fixes cockroachdb#48005.
Release note: None

Add functions to persist settings key values with the local store prefix so restarting nodes can come up with settings that are no staler than the ones they had when they went down.
Fixes cockroachdb#48005.
Release note: None
Signed-off-by: Vaibhav <[email protected]>

In an ideal world a KV node would not declare itself as ready until it has received the current cluster settings. However, we cannot indiscriminately block on that because of the chicken-and-egg problem that comes up when the node currently starting is needed for quorum on the system config range. For example, if a three node cluster is completely down, at least two of the nodes must be online before current settings are received.
Instead do the following:
1. persist the settings locally on the first store whenever they are received, so that restarting nodes can come up with settings that are no staler than the ones they had when they went down.
2. if a new node joins the cluster, we can wait for the settings to show up (since the node is just joining, it is not required for any quorum).
Fixes cockroachdb#48005.
Release note: None
Signed-off-by: Vaibhav <[email protected]>

55166: server: ensure settings are up-to-date. r=tbg a=vrongmeal [WIP] Context: #50271 Fixes #48005. Release note: None Signed-off-by: Vaibhav <[email protected]> Co-authored-by: Vaibhav <[email protected]>

cockroach-teamcity added this to the 20.1 milestone Apr 24, 2020

cockroach-teamcity assigned andreimatei Apr 24, 2020

andreimatei removed the release-blocker label Apr 28, 2020

nvanbenschoten self-assigned this Apr 28, 2020

knz mentioned this issue May 1, 2020

server: don't signal readiness before settings are initialized #48273

Closed

tbg assigned tbg and unassigned andreimatei and nvanbenschoten May 7, 2020

cockroach-teamcity mentioned this issue May 13, 2020

roachtest: acceptance/gossip/peerings failed #48829

Closed

cockroach-teamcity mentioned this issue May 26, 2020

roachtest: acceptance/gossip/peerings failed #49536

Closed

tbg mentioned this issue Jun 16, 2020

[wip] server: ensure settings are up-to-date #50271

Closed

craig bot closed this as completed in 5343d56 Aug 27, 2020

vrongmeal mentioned this issue Oct 2, 2020

server: ensure settings are up-to-date. #55166

Merged

craig bot pushed a commit that referenced this issue Nov 18, 2020

Merge #55166

2eadbfd

55166: server: ensure settings are up-to-date. r=tbg a=vrongmeal [WIP] Context: #50271 Fixes #48005. Release note: None Signed-off-by: Vaibhav <[email protected]> Co-authored-by: Vaibhav <[email protected]>
