(Re)Starting a Data Node Holding a Large Number of Indices can Take Minutes #83203

original-brownbear · 2022-01-27T11:49:57Z

Restarting a data node that holds many indices takes an unexpectedly long time.
For a node holding ~10k indices with Beats mappings, a restart takes ~10 minutes.

Effectively all the time is spent on validating mappings on a single hot thread:

We should find a way to speed this up. Adding a 10-minute window to every restart of a large data node complicates operations.

elasticmachine · 2022-01-27T11:50:00Z

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner · 2022-01-27T11:57:20Z

I think we could skip this if it's not an upgrade? Absent awful bugs, this should be a no-op if the starting node is the same version as the one that last wrote the metadata. We include the writing node's version in the commit metadata, so we could plumb it through via PersistedClusterStateService#OnDiskState.
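
A rough sketch of the version check suggested above, for illustration only. Every name here (StartupMappingValidation, validateOnStartup, the integer version ids, the validator callback) is hypothetical and not part of the Elasticsearch codebase. It assumes the version of the node that last wrote the metadata has already been read from the on-disk state and is passed in by the caller.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names are real Elasticsearch APIs.
final class StartupMappingValidation {

    private StartupMappingValidation() {}

    // Validation is only assumed to be useful when the node starting up is a
    // different version from the node that last wrote the metadata.
    static boolean needsValidation(int lastWrittenByVersionId, int currentVersionId) {
        return lastWrittenByVersionId != currentVersionId;
    }

    // Runs the (expensive) validator once per index unless the versions match.
    // Returns the number of indices that were actually validated.
    static int validateOnStartup(
        int lastWrittenByVersionId,
        int currentVersionId,
        List<String> indexNames,
        Consumer<String> validator
    ) {
        if (!needsValidation(lastWrittenByVersionId, currentVersionId)) {
            return 0; // same version wrote the metadata: skip the whole loop
        }
        int validated = 0;
        for (String indexName : indexNames) {
            validator.accept(indexName);
            validated++;
        }
        return validated;
    }
}
```

With a check like this, a plain restart on the same version would skip the per-index work entirely, while an upgrade would still pay the full validation cost once.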

DaveCTurner · 2022-08-18T09:36:05Z

Given #84148, we don't really think validating these mappings at startup is useful anyway and therefore we could just skip this validation always. Or do nothing and wait for #84148 to be addressed.

original-brownbear added the >bug and :Distributed Coordination/Cluster Coordination labels on Jan 27, 2022

elasticmachine added the Team:Distributed (Obsolete) label on Jan 27, 2022

original-brownbear mentioned this issue on Jan 27, 2022 in Fix Large Shard Count Scalability Issues #77466 (open, 97 tasks)
