
Adding a deprecation info API check for too many shards #85967

Merged

Conversation

masseyke (Member)

This commit makes sure that there is enough room in a cluster to add a small number of shards during an upgrade. The
information is exposed in the deprecation info API as a cluster configuration check.
Closes #85702
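The general shape of such a check can be sketched as follows. This is a hypothetical illustration, not the actual implementation: the class name, method names, and the size of the shard buffer are all assumptions made for the example.

```java
// Hypothetical sketch of the kind of check this PR describes: before an
// upgrade, verify the cluster has headroom under cluster.max_shards_per_node
// for a handful of new shards (e.g. small indices created during the upgrade).
public class ShardLimitHeadroomCheck {

    // Illustrative buffer size; the real number of shards needed is not
    // specified here.
    static final int SHARDS_NEEDED_FOR_UPGRADE = 5;

    static boolean hasRoomForUpgrade(long currentOpenShards, int maxShardsPerNode, int dataNodeCount) {
        // The shard limit is enforced cluster-wide: setting value * data node count.
        long clusterShardLimit = (long) maxShardsPerNode * dataNodeCount;
        return currentOpenShards + SHARDS_NEEDED_FOR_UPGRADE <= clusterShardLimit;
    }

    public static void main(String[] args) {
        // 2 data nodes at the default limit of 1000 shards per node => 2000 total.
        System.out.println(hasRoomForUpgrade(1990, 1000, 2)); // true:  1995 <= 2000
        System.out.println(hasRoomForUpgrade(1998, 1000, 2)); // false: 2003 >  2000
    }
}
```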

@elasticmachine (Collaborator)

Pinging @elastic/es-data-management (Team:Data Management)

@dakrone dakrone self-requested a review April 28, 2022 15:44
@joegallo (Contributor)

joegallo commented Apr 28, 2022

Do we actually need room for the replicas? That is, does something in the process truly require these indices to be green or is (merely) yellow sufficient?

As an edge case, I'm not sure your API is correctly shaped. Imagine a three data node cluster where two of the three nodes are already at cluster.max_shards_per_node and the last node is 20 shards under that. I think the math you're doing internally would indicate that "yes, I can totally allocate 5 new shards + 1 replica each", but that isn't actually so. I think checkShardLimit is closer to the right API already.

edit: a little additional commentary: personally, I think it's fine if the system gets the above case wrong; I'd just rather the wrong logic live in ClusterDeprecationChecks (maybe with a "this is a bit of a hack, but c'est la vie"-type comment) than in ShardLimitValidator -- the latter is much closer to the core of the system, so it's more important that it be technically correct.
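The edge case above can be illustrated numerically. This sketch is hypothetical (the method names and numbers are assumptions, not Elasticsearch code): a cluster-wide total check can pass even though per-node limits make the allocation impossible.

```java
// Illustration of the edge case: three data nodes, two already at
// cluster.max_shards_per_node, the third 20 shards under it.
public class PerNodeShardRoomExample {

    // Cluster-wide math: total shards + needed vs. maxShardsPerNode * nodeCount.
    static boolean clusterWideCheckPasses(int[] shardsPerNode, int maxShardsPerNode, int needed) {
        long total = 0;
        for (int s : shardsPerNode) {
            total += s;
        }
        return total + needed <= (long) maxShardsPerNode * shardsPerNode.length;
    }

    // Per-node view: the most headroom available on any single node.
    static int roomOnLeastLoadedNode(int[] shardsPerNode, int maxShardsPerNode) {
        int room = 0;
        for (int s : shardsPerNode) {
            room = Math.max(room, maxShardsPerNode - s);
        }
        return room;
    }

    public static void main(String[] args) {
        int max = 1000;
        int[] nodes = {1000, 1000, 980}; // two nodes already at the limit

        // 5 new shards + 1 replica each = 10 shards needed.
        // Cluster-wide math says there is room: 2980 + 10 <= 3000.
        System.out.println(clusterWideCheckPasses(nodes, max, 10)); // true

        // But a primary and its replica cannot share a node, and only one
        // node is under the limit, so the replicas would stay unassigned.
        System.out.println(roomOnLeastLoadedNode(nodes, max)); // 20
    }
}
```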

@masseyke (Member, Author)

Yeah, my initial thought was to just check that we're under the limit. But it sounds like the problem is that during an upgrade Kibana creates more indices (see #85702 (comment) for a little more), so you need some extra room. My thinking (and maybe it's not great) was that a tiny cluster like you describe wouldn't be on cloud, and you wouldn't let something like the deprecation API stop you from upgrading. But I don't know how we handle supporting upgrades on both cloud and tiny self-managed clusters, unless we kick this check over to cloud.

@dakrone (Member) left a comment

> But I don't know how we handle both supporting upgrades on cloud and tiny self-managed clusters, unless we kick this check over to cloud to do.

Since this is a dynamic setting, this is not the end of the world to support. For example, a tiny self-managed cluster could bump the limit up prior to its upgrade.
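Concretely, the dynamic setting mentioned above can be raised with the cluster update settings API before upgrading. A sketch in console form (the value 1500 is illustrative, not a recommendation):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 1500
  }
}
```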

Comment on lines 22 to 24:

```java
"The cluster has too many shards to be able to upgrade",
"https://ela.st/es-deprecation-7-shard-limit",
"Delete indices to clear up space",
```

Is this message from another deprecation error? Otherwise maybe we can be more specific and say something like "there is not enough room for N more shards, increase the cluster.max_shards_per_node or cluster.max_shards_per_node.frozen settings or remove indices to clear up resources."

(using "space" is hard here, because it makes it sound like it's disk-related, when really it's just our arbitrary limits)

masseyke (Member, Author)

It might be used in another error message, but in this case I think it was just me trying to walk the line between being too technical and not technical enough to be useful. I'll reword it.

masseyke (Member, Author)

I intentionally didn't include cluster.max_shards_per_node.frozen in the reworded version because we're not checking that setting (the assumption is that the new small index would not be a frozen index).

@masseyke masseyke requested a review from dakrone May 3, 2022 22:59
@masseyke (Member, Author)

masseyke commented May 4, 2022

@elasticmachine update branch

@dakrone (Member) left a comment

LGTM, sorry I took so long for review on this.

@masseyke masseyke merged commit 192b4ca into elastic:master May 6, 2022
@masseyke masseyke deleted the feature/deprecation-info-shard-check branch May 6, 2022 13:03
masseyke added a commit to masseyke/elasticsearch that referenced this pull request May 6, 2022
This commit makes sure that there is enough room in a cluster to add a small number of shards during an upgrade. The information is exposed in the deprecation info API as a cluster configuration check.
Closes elastic#85702
@elasticsearchmachine (Collaborator)

💔 Backport failed

| Status | Branch | Result |
| --- | --- | --- |
| | 7.17 | Commit could not be cherry-picked due to conflicts |
| | 8.2 | |

You can use sqren/backport to backport manually by running `backport --upstream elastic/elasticsearch --pr 85967`.

elasticsearchmachine pushed a commit that referenced this pull request May 6, 2022
Adding a deprecation info API check for too many shards (#85967) (#86518)

* Adding a deprecation info API check for too many shards (#85967)

This commit makes sure that there is enough room in a cluster to add a small number of shards during an upgrade. The information is exposed in the deprecation info API as a cluster configuration check.
Closes #85702

* fixing unit test
elasticsearchmachine pushed a commit that referenced this pull request May 6, 2022
ShardLimitValidatorTests was accidentally broken as part of #85967
because the new assertions in that class were not taking the type of
nodes into account ("normal" or "frozen"). This is a simple change to
take that into account.

Successfully merging this pull request may close these issues.

Emit critical deprecation warning if maximum open shards exceeded