[Upgrade Assistant] Warn if cluster's node attributes and data tiers may not match #83800

Closed
VimCommando opened this issue Feb 1, 2022 · 21 comments · Fixed by #84050
Labels
>enhancement Feature:Upgrade Assistant Team:Data Management Meta label for data/management team Team:Deployment Management Meta label for Management Experience - Deployment Management team

Comments

@VimCommando
Contributor

Describe the feature:

If a node has node.data: true defined and includes any node.attributes.data value, list a warning the configuration may not be valid for data tiers in 8.0.

The presence of a node.attribute.data value strongly indicates a hot/warm or tiered architecture. It is completely valid to run all data tiers on the same nodes if the cluster is not used for timeseries data.

Describe a specific use case for the feature:

Historically Elasticsearch has recommended using node.attributes.data to identify hot, warm or cold nodes.

In 7.9 we introduced data tiers: https://www.elastic.co/guide/en/elasticsearch/reference/7.16/data-tiers.html

When upgrading to 8.0 the legacy node.data is no longer allowed: #66409

In 7.10 through 7.17 any node with node.data: true is assigned all 5 data tiers: data_content, data_hot, data_warm, data_cold and data_frozen. This can conflict with the preexisting node attribute, leading to shards being assigned to unexpected nodes. This includes system indices which could end up on cold/frozen tiers and cause unexpected cluster behavior.

In 8.0 we rely on _tier_preference so it is critical these are accurate in the nodes' elasticsearch.yml files: #76147
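The conflict described above can be sketched in a few lines. This is only an illustration, not the Elasticsearch implementation: the helper names and the flat settings dictionary are hypothetical, and the idea that a `node.attr.data` value of `warm` "expects" the `data_warm` role is an assumption taken from the hot/warm convention discussed in this issue.

```python
# Hypothetical sketch; these helpers do not exist in Elasticsearch.
ALL_TIER_ROLES = ["data_content", "data_hot", "data_warm", "data_cold", "data_frozen"]


def effective_tier_roles(settings):
    """Return the data tier roles implied by a flat settings dict (7.10-7.17 behaviour)."""
    roles = settings.get("node.roles")
    if roles is not None:
        # The generic "data" role also carries every tier.
        if "data" in roles:
            return list(ALL_TIER_ROLES)
        return [r for r in roles if r in ALL_TIER_ROLES]
    # Legacy setting: node.data: true is assigned all five tiers.
    if settings.get("node.data") is True:
        return list(ALL_TIER_ROLES)
    return []


def conflicting_tiers(settings):
    """Tier roles held beyond the one the node.attr.data value suggests (assumed mapping)."""
    attr = settings.get("node.attr.data")
    if attr is None:
        return []
    expected = "data_" + attr  # assumption: "warm" corresponds to "data_warm"
    return [r for r in effective_tier_roles(settings) if r != expected]


warm_node = {"node.data": True, "node.attr.data": "warm"}
print(conflicting_tiers(warm_node))
# ['data_content', 'data_hot', 'data_cold', 'data_frozen']
```

A node configured this way is labelled warm by its attribute yet remains eligible for hot, cold and frozen shards, which is the situation the proposed warning is meant to surface.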

@VimCommando VimCommando added enhancement Team:Deployment Management Meta label for Management Experience - Deployment Management team Feature:Upgrade Assistant labels Feb 1, 2022
@elasticmachine
Collaborator

Pinging @elastic/platform-deployment-management (Team:Deployment Management)

@cjcenizal
Contributor

@jakelandis Do you think we could add this warning to the Deprecation Info API?

@dakrone
Member

dakrone commented Feb 4, 2022

Should we be making value judgements for on-prem users based on an arbitrary node attribute, given that the attribute name could be anything (using data is only our Cloud convention, after all)?

@cjcenizal
Contributor

@dakrone It sounds like you're concerned about providing guidance based on erroneous assumptions. Is that right? The way I read Ryan's suggestion, it sounded less about making assumptions, and more about pointing out possibilities and making suggestions. Bits that stood out to me in bold:

If a node has node.data: true defined and includes any node.attributes.data value, list a warning the configuration may not be valid for data tiers in 8.0.

If we can craft a message that explains what ES has observed in the configuration and clearly explains how the user can determine whether it truly is a problem or not, then I think we can help some users and also reduce the risk of confusing others.

@alisonelizabeth
Contributor

@dakrone / @jakelandis thoughts on CJ's latest comment?

@VimCommando
Contributor Author

Should we be making value judgements for on-prem users based on an arbitrary node attribute, given that the attribute name could be anything (using data is only our Cloud convention, after all)?

It is not just a cloud convention, this is in numerous examples we've been advocating for since we introduced ILM:
https://www.elastic.co/blog/implementing-hot-warm-cold-in-elasticsearch-with-index-lifecycle-management

Yes it is absolutely possible a customer uses something other than data, but the most likely use-case is for a hot/warm architecture which is at high risk of issues if they haven't reviewed their cluster configuration.

@dakrone
Member

dakrone commented Feb 9, 2022

Okay, I think this makes sense then, however, I would suggest a change:

list a warning the configuration may not be valid for data tiers in 8.0.

+1 on it being a warning, but the validity is kind of confusing, given that it is still valid, just not recommended (attribute functionality is not going to be removed any time soon). So perhaps we can list a warning that the configuration is not recommended instead?

@cjcenizal
Contributor

OK, here's my proposed message:

Title: Your cluster might not be properly configured for data tiers

Body: One or more of your nodes is configured with the node.data: true and node.attributes.data settings. This is typically used to create a hot/warm or tiered architecture, based on legacy guidelines. Please see the docs at for more information on how to determine whether this is a problem for your deployment, and what you can do to address it.

@VimCommando Does this contain all of the information the user needs? Do we already have a docs page we can link the user to?

@dakrone Do you know if this is pertinent to Cloud users or is this something Cloud will address automatically?

@dakrone
Member

dakrone commented Feb 9, 2022

@dakrone Do you know if this is pertinent to Cloud users or is this something Cloud will address automatically?

This should be handled automatically by Cloud, which is part of my concern about it, since if a Cloud user were to see this, there is literally nothing they can do about it (short of clicking the "migrate to data tiers" button on their deployment configuration I think), which means it can be frustrating to have a warning they can't get rid of.

For a regular warning, I think we should also do something like:

Node [xyz] has the [data_warm] node role assigned, but is using the node.attributes.data: hot attribute. This mismatch is not recommended.
(but probably nicer and less technical and linking to documentation)

Which is more what I was thinking when I read "warn if cluster's node attributes and data tiers may not match". For the node.data: true and node.attributes.data case, my preference is something a little softer like:

One or more of your nodes is configured with the node.data: true and node.attributes.data settings. This is typically used to create a hot/warm or tiered architecture, based on legacy guidelines. Data tiers are a recommended replacement for tiered architecture clusters. Please see the docs at <place> for more information about data tiers.

But that's just my preference, so maybe there is a better way!
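As a rough sketch of the per-node mismatch warning suggested above, the check could compare each node's tier roles with its `data` attribute. Everything here is hypothetical: the function name, the input shape (node name mapped to `roles` and `attributes`), and the choice to compare only the hot, warm, cold and frozen roles while ignoring `data_content` are assumptions made for illustration.

```python
# Hypothetical sketch of a role/attribute mismatch check; not Elasticsearch code.
COMPARED_ROLES = ("data_hot", "data_warm", "data_cold", "data_frozen")


def tier_mismatch_warnings(nodes):
    """Build one warning per tier role that disagrees with the node's data attribute.

    nodes maps a node name to {"roles": [...], "attributes": {...}}.
    """
    warnings = []
    for name, info in sorted(nodes.items()):
        attr = info.get("attributes", {}).get("data")
        if attr is None:
            continue  # no legacy attribute, nothing to compare
        for role in info.get("roles", []):
            # Assumption: attribute value "hot" corresponds to role "data_hot".
            if role in COMPARED_ROLES and role != "data_" + attr:
                warnings.append(
                    f"Node [{name}] has the [{role}] node role assigned, but is using "
                    f"the node.attributes.data: {attr} attribute. "
                    "This mismatch is not recommended."
                )
    return warnings


example = {"xyz": {"roles": ["master", "data_warm"], "attributes": {"data": "hot"}}}
for message in tier_mismatch_warnings(example):
    print(message)
```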

@cjcenizal
Contributor

Didn't y'all implement a blocklist on Cloud for settings which will be excluded from the Deprecation Info API output? I think that would take care of this case, right?

Thanks for taking a pass at the copy. I think you and Ryan are probably the best folks to work out the details that need to go into the message, and then one of the writers can help with the phrasing once you're both in agreement.

@jakelandis
Contributor

We already have CRITICAL level warnings for node.data, node.master, etc. since they are not supported in 8.0. We could provide better guidance for data tiers in the fly out, but what is there is correct. [1]

We don't currently provide any warnings for custom node attributes. I agree there is some value in a warning for custom node.attributes.data settings, but I have some reservations about a warning for a mismatch between a convention (node.attributes.data) and its expected data tier equivalent. I think @dakrone's suggestion helps (but no need to include 'node.data: true').

One or more of your nodes is configured with node.attributes.data settings. This is typically used to create a hot/warm or tiered architecture, based on legacy guidelines. Data tiers are a recommended replacement for tiered architecture clusters. Please see the docs at <place> for more information about data tiers.

Also, I wonder if we should just expand the fly out documentation for the CRITICAL warning (since people will actually read that!) with data tier reference(s). (But that only impacts non-default clusters where a user explicitly configured node.data, node.master, etc.)

Didn't y'all implement a blocklist on Cloud for settings which will be excluded from the Deprecation Info API output? I think that would take care of this case, right?

I think that would work to hide from Cloud.

[1]
(two images, not preserved)

@VimCommando
Contributor Author

One or more of your nodes is configured with the node.data: true and node.attributes.data settings. This is typically used to create a hot/warm or tiered architecture, based on legacy guidelines. Data tiers are a recommended replacement for tiered architecture clusters. Please see the docs at <place> for more information about data tiers.

This sounds like a good warning to me, as long as the documentation it links to is clear. Right now we don't even mention this issue in the 8.0 migration guide.

https://www.elastic.co/guide/en/elasticsearch/reference/8.0/migrating-8.0.html

The biggest problem with suggesting you remove node.data and add node.roles: ["data"] is this: it is exactly how you get all 5 tiers assigned without regard to the node.attribute.data value.

@alisonelizabeth
Contributor

@dakrone / @jakelandis is it OK if I transfer this issue to Elasticsearch? If I'm understanding correctly, the proposed change will be made in the deprecation info API and there isn't any additional work needed on the UA side.

@dakrone
Member

dakrone commented Feb 10, 2022

I think that's okay, yes the change would be on the deprecation API side.

@alisonelizabeth alisonelizabeth transferred this issue from elastic/kibana Feb 10, 2022
@alisonelizabeth alisonelizabeth added the Team:Data Management Meta label for data/management team label Feb 10, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@masseyke
Member

I am going to take this one, but I'm a little unclear on where we settled. Are we just updating the message for the existing deprecation we already have if node.data is set? Or are we also adding a separate warning deprecation if node.attributes.data is set?

@jakelandis
Contributor

Are we just updating the message for the existing deprecation we already have if node.data is set?

We should probably update it to mention data tier node roles such as data_hot, data_content, etc., but that is a bit outside the scope of the immediate ask.

Or are we also adding a separate warning deprecation if node.attributes.data is set?

Yes. More technically, if node.attr.data is set. You can view the node attributes via GET _nodes?filter_path=nodes.*.attributes [1] or _cat/nodeattrs [2], and they are set via elasticsearch.yml or an environment variable (see here). node.attr.data is really just a common convention; you can achieve the same effect with node.attr.foo (see here), so the check is not perfect and is a softly worded warning to steer people to data tiers. Since this is a deployment-level concern, users on Cloud don't have the ability to change this (and when upgrading, Cloud will do the right thing), so we don't want to show this message to users on Cloud.

[1]

{
  "nodes" : {
    "ckqKId2fRfy3CsZFME689w" : {
      "attributes" : {
        "logical_availability_zone" : "zone-0",
        "server_name" : "instance-0000000001.b074bd77617f4ad48be1676cdc1e58ea",
        "availability_zone" : "us-east4-a",
        "xpack.installed" : "true",
        "data" : "warm",
        "instance_configuration" : "gcp.es.datawarm.n2.68x10x190",
        "transform.node" : "false",
        "region" : "unknown-region"
      }
    },
    "ajMYjuF-TGCF0AqMNfQNWA" : {
      "attributes" : {
        "logical_availability_zone" : "zone-0",
        "server_name" : "instance-0000000000.b074bd77617f4ad48be1676cdc1e58ea",
        "availability_zone" : "us-east4-a",
        "xpack.installed" : "true",
        "data" : "hot",
        "instance_configuration" : "gcp.es.datahot.n2.68x10x45",
        "transform.node" : "true",
        "region" : "unknown-region"
      }
    },
    "5FwW_aurQ6S6lbXa7nI5Wg" : {
      "attributes" : {
        "logical_availability_zone" : "zone-0",
        "server_name" : "instance-0000000002.b074bd77617f4ad48be1676cdc1e58ea",
        "availability_zone" : "us-east4-a",
        "xpack.installed" : "true",
        "data" : "cold",
        "instance_configuration" : "gcp.es.datacold.n2.68x10x190",
        "transform.node" : "false",
        "region" : "unknown-region"
      }
    }
  }
}
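A response shaped like [1] can be scanned for the `data` attribute with a short script. This is a minimal sketch that assumes the response body has already been fetched as text; the function name is hypothetical and the sample is a trimmed copy of the output above.

```python
import json


def nodes_with_data_attribute(response_text):
    """Map node ID to its `data` attribute for every node that sets one."""
    body = json.loads(response_text)
    found = {}
    for node_id, node in body.get("nodes", {}).items():
        value = node.get("attributes", {}).get("data")
        if value is not None:
            found[node_id] = value
    return found


# Trimmed copy of the sample response, keeping only the `data` attribute.
sample = """
{
  "nodes": {
    "ckqKId2fRfy3CsZFME689w": {"attributes": {"data": "warm"}},
    "ajMYjuF-TGCF0AqMNfQNWA": {"attributes": {"data": "hot"}},
    "5FwW_aurQ6S6lbXa7nI5Wg": {"attributes": {"data": "cold"}}
  }
}
"""
print(nodes_with_data_attribute(sample))
# {'ckqKId2fRfy3CsZFME689w': 'warm', 'ajMYjuF-TGCF0AqMNfQNWA': 'hot', '5FwW_aurQ6S6lbXa7nI5Wg': 'cold'}
```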

[2]

instance-0000000002 10.46.66.237 10.46.66.237 logical_availability_zone zone-0
instance-0000000002 10.46.66.237 10.46.66.237 server_name               instance-0000000002.b074bd77617f4ad48be1676cdc1e58ea
instance-0000000002 10.46.66.237 10.46.66.237 availability_zone         us-east4-a
instance-0000000002 10.46.66.237 10.46.66.237 xpack.installed           true
instance-0000000002 10.46.66.237 10.46.66.237 data                      cold
instance-0000000002 10.46.66.237 10.46.66.237 instance_configuration    gcp.es.datacold.n2.68x10x190
instance-0000000002 10.46.66.237 10.46.66.237 transform.node            false
instance-0000000002 10.46.66.237 10.46.66.237 region                    unknown-region
instance-0000000000 10.46.65.12  10.46.65.12  logical_availability_zone zone-0
instance-0000000000 10.46.65.12  10.46.65.12  server_name               instance-0000000000.b074bd77617f4ad48be1676cdc1e58ea
instance-0000000000 10.46.65.12  10.46.65.12  availability_zone         us-east4-a
instance-0000000000 10.46.65.12  10.46.65.12  xpack.installed           true
instance-0000000000 10.46.65.12  10.46.65.12  data                      hot
instance-0000000000 10.46.65.12  10.46.65.12  instance_configuration    gcp.es.datahot.n2.68x10x45
instance-0000000000 10.46.65.12  10.46.65.12  transform.node            true
instance-0000000000 10.46.65.12  10.46.65.12  region                    unknown-region
instance-0000000001 10.46.65.27  10.46.65.27  logical_availability_zone zone-0
instance-0000000001 10.46.65.27  10.46.65.27  server_name               instance-0000000001.b074bd77617f4ad48be1676cdc1e58ea
instance-0000000001 10.46.65.27  10.46.65.27  availability_zone         us-east4-a
instance-0000000001 10.46.65.27  10.46.65.27  xpack.installed           true
instance-0000000001 10.46.65.27  10.46.65.27  data                      warm
instance-0000000001 10.46.65.27  10.46.65.27  instance_configuration    gcp.es.datawarm.n2.68x10x190
instance-0000000001 10.46.65.27  10.46.65.27  transform.node            false
instance-0000000001 10.46.65.27  10.46.65.27  region                    unknown-region
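The tabular output in [2] can be filtered the same way. The sketch below assumes the five whitespace-separated columns seen above (node, host, ip, attribute, value) and skips any line that does not split into exactly five fields, so attribute values containing spaces would be missed. The function name is hypothetical.

```python
def data_attrs_from_cat(text):
    """Map node name to its `data` attribute from _cat/nodeattrs style text."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # blank line or a value containing whitespace
        node, _host, _ip, attr, value = parts
        if attr == "data":
            result[node] = value
    return result


# A few rows copied from the sample output.
sample_cat = """\
instance-0000000002 10.46.66.237 10.46.66.237 xpack.installed           true
instance-0000000002 10.46.66.237 10.46.66.237 data                      cold
instance-0000000000 10.46.65.12  10.46.65.12  data                      hot
instance-0000000001 10.46.65.27  10.46.65.27  data                      warm
"""
print(data_attrs_from_cat(sample_cat))
# {'instance-0000000002': 'cold', 'instance-0000000000': 'hot', 'instance-0000000001': 'warm'}
```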

@masseyke
Member

We ought to have this on the 8.x line as well right?

@jakelandis
Contributor

We ought to have this on the 8.x line as well right?

yeah, that probably makes sense to include in 8.x

masseyke added a commit that referenced this issue Feb 16, 2022
This adds a warning-level deprecation if a user has set the node.attr.data setting, since it is a sign that they are
trying to create a hot/warm setup in the way that is no longer supported.
Closes #83800
masseyke added a commit to masseyke/elasticsearch that referenced this issue Feb 16, 2022
masseyke added a commit to masseyke/elasticsearch that referenced this issue Feb 16, 2022
elasticsearchmachine pushed a commit that referenced this issue Feb 16, 2022
elasticsearchmachine pushed a commit that referenced this issue Feb 16, 2022
probakowski pushed a commit to probakowski/elasticsearch that referenced this issue Feb 23, 2022
@jeacott1

Shouldn't the docs perhaps be updated here: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-advanced-node-scheduling.html#k8s-hot-warm-topologies
It still uses "node.attr.data: hot" etc. in the 2.5 version docs, which is quite confusing.

@abdonpijpelink
Contributor

Thanks for reporting @jeacott1 ! I've opened a new issue to fix the documentation here: elastic/cloud-on-k8s#6196


10 participants