roachtest: acceptance/gossip/peerings failed #48005
Looks like we're trying to hit the admin UI (…)
One thing we see in the logs from this node is that:

(…)

fires a few times after:

(…)
Yes, that sounds reasonable. The settings are not persisted on the node, so we're always running with the default settings for a little while even after signaling readiness. Something we could do here is to wait until a system config has been ingested before returning from the server's start sequence.
(roachtest).acceptance/gossip/peerings failed on master@61f18db7dd9a054d9a4648f67546202f760b5000:

Artifacts: /acceptance/gossip/peerings
See this test on roachdash
omg, that explains so much about many other test failures I've investigated in the past. It also explains why certain SQL clients that should get some defaults initialized by settings don't get them when they connect immediately after a node starts. I'm going to file this as a separate issue under the "rolling restarts" project.
(roachtest).acceptance/gossip/peerings failed on master@20916a30cf9356683c973f8653e8b69613a75fe4:

Artifacts: /acceptance/gossip/peerings
See this test on roachdash
We cannot indiscriminately block on receiving the settings before signaling ready because of the chicken-and-egg problem that comes up when the node currently starting is needed for quorum on the system config range. For example, if a three-node cluster is completely down, at least two of the nodes must be online before current settings are received. I think we should do two things here (sketched below):

1. Persist the settings locally on the first store whenever they are received, so that restarting nodes come up with settings that are no staler than the ones they had when they went down.
2. Have a new node joining the cluster wait for the settings to show up; since the node is just joining, it is not required for any quorum.
In an ideal world a KV node would not declare itself as ready until it has received the current cluster settings. However, we cannot indiscriminately block on that because of the chicken-and-egg problem that comes up when the node currently starting is needed for quorum on the system config range. For example, if a three-node cluster is completely down, at least two of the nodes must be online before current settings are received. Instead, do the following:

1. Persist the settings locally on the first store whenever they are received, so that restarting nodes can come up with settings that are no staler than the ones they had when they went down.
2. If a new node joins the cluster, we can wait for the settings to show up (since the node is just joining, it is not required for any quorum).

Fixes cockroachdb#48005.

Release note: None
Add functions to persist settings key-values with the local store prefix so that restarting nodes can come up with settings that are no staler than the ones they had when they went down.

Fixes cockroachdb#48005.

Release note: None

Signed-off-by: Vaibhav <[email protected]>
55166: server: ensure settings are up-to-date r=tbg a=vrongmeal

[WIP] Context: #50271

Fixes #48005.

Release note: None

Signed-off-by: Vaibhav <[email protected]>
Co-authored-by: Vaibhav <[email protected]>
(roachtest).acceptance/gossip/peerings failed on master@0e16cc15f139b816b8e46fe6571691a8ec0e6937:

Artifacts: /acceptance/gossip/peerings
See this test on roachdash
powered by pkg/cmd/internal/issues