(Re)Starting a Data Node Holding a Large Number of Indices can Take Minutes #83203

Open
Tracked by #77466
original-brownbear opened this issue Jan 27, 2022 · 3 comments
Labels
>bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@original-brownbear
Member

Restarting a data node that holds a large number of indices takes unexpectedly long.
For a node holding ~10k indices with Beats mappings, the restart takes ~10 minutes.

Effectively all of that time is spent validating mappings on a single hot thread:

image

We should find a way to speed this up. Adding a 10-minute window to every restart of a large data node makes operations complicated.

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor

I think we could skip this if it's not an upgrade? Absent awful bugs, this should be a no-op if the starting node is the same version as the one that last wrote the metadata. We include the writing node's version in the commit metadata, so we could plumb it through via PersistedClusterStateService#OnDiskState.
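
To make the suggestion concrete, here is a minimal sketch of the version check, not actual Elasticsearch code. The class `StartupMappingValidation`, the `OnDiskMetadata` holder, and both method names are hypothetical stand-ins for illustration only. The real plumbing would go through `PersistedClusterStateService#OnDiskState`, whose fields are not shown in this thread.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch; none of these types are Elasticsearch classes. */
final class StartupMappingValidation {

    /** Stand-in for the on-disk state: the version that last wrote the metadata, plus the index names. */
    static final class OnDiskMetadata {
        final String lastWrittenByVersion;
        final List<String> indexNames;

        OnDiskMetadata(String lastWrittenByVersion, List<String> indexNames) {
            this.lastWrittenByVersion = lastWrittenByVersion;
            this.indexNames = indexNames;
        }
    }

    /** Validation is only needed when the starting node's version differs from the writer's version. */
    static boolean needsValidation(String startingNodeVersion, OnDiskMetadata metadata) {
        return !startingNodeVersion.equals(metadata.lastWrittenByVersion);
    }

    /**
     * Runs the (assumed expensive) per-index validator only when the versions differ.
     * Returns the number of indices that were validated.
     */
    static int validateIfNeeded(String startingNodeVersion, OnDiskMetadata metadata, Consumer<String> validator) {
        if (!needsValidation(startingNodeVersion, metadata)) {
            return 0; // same version wrote the metadata: skip the whole pass
        }
        for (String indexName : metadata.indexNames) {
            validator.accept(indexName);
        }
        return metadata.indexNames.size();
    }
}
```

With this shape, a plain restart on an unchanged version never touches the per-index validator, while a start on a different version still validates every index as before.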

@DaveCTurner
Contributor

Given #84148, we don't really think validating these mappings at startup is useful anyway, so we could just always skip this validation. Alternatively, we could do nothing and wait for #84148 to be addressed.
