
Fix Large Shard Count Scalability Issues #77466

Open
73 of 97 tasks
original-brownbear opened this issue Sep 9, 2021 · 3 comments
Labels
>bug · Meta · release highlight · :Data Management/ILM+SLM (Index and Snapshot lifecycle management) · :Distributed Indexing/Distributed (catch-all for the Distributed Area) · :Security/Authorization (Roles, Privileges, DLS/FLS, RBAC/ABAC) · Team:Data Management · Team:Distributed (Obsolete; replaced by Distributed Indexing/Coordination) · Team:Security

Comments


original-brownbear commented Sep 9, 2021

This meta issue tracks known issues with scaling clusters to large numbers of shards.

@original-brownbear original-brownbear added >bug Meta :Distributed Indexing/Distributed A catch all label for anything in the Distributed Area. Please avoid if you can. :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC :Data Management/Other labels Sep 9, 2021
@elasticmachine elasticmachine added Team:Security Meta label for security team Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. Team:Data Management Meta label for data/management team labels Sep 9, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticmachine
Collaborator

Pinging @elastic/es-security (Team:Security)

@elasticmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Sep 16, 2021
This change improves the parsing of LifecycleExecutionState from IndexMetadata custom data
by avoiding a containsKey(...) call and, when there is no custom data, returning a blank
LifecycleExecutionState instance.

Relates to elastic#77466
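The shape of that optimization can be sketched as follows. All class and field names here are illustrative stand-ins, not the actual Elasticsearch code: a single `get(...)` replaces the `containsKey(...)`-then-`get(...)` pair, and a shared blank instance is returned when there is no custom data at all.

```java
import java.util.Map;

// Illustrative sketch of the "one lookup, shared empty instance" pattern.
final class LifecycleState {
    static final LifecycleState EMPTY = new LifecycleState("");

    final String phase;

    private LifecycleState(String phase) {
        this.phase = phase;
    }

    // Before: customData.containsKey(KEY) followed by customData.get(KEY)
    // performed two hash lookups per call. After: one get(), and the shared
    // EMPTY instance when there is no custom data.
    static LifecycleState fromCustomData(Map<String, String> customData) {
        if (customData == null || customData.isEmpty()) {
            return EMPTY; // no allocation, no parsing
        }
        String phase = customData.get("phase");
        return phase == null ? EMPTY : new LifecycleState(phase);
    }
}
```

Returning the shared `EMPTY` avoids allocating a fresh object for every index without lifecycle state, which adds up when iterating tens of thousands of indices.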
martijnvg added a commit that referenced this issue Sep 16, 2021
This change improves the parsing of LifecycleExecutionState from IndexMetadata custom data
by avoiding a containsKey(...) call and, when there is no custom data, returning a blank
LifecycleExecutionState instance.

Relates to #77466
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Sep 16, 2021
This change improves the parsing of LifecycleExecutionState from IndexMetadata custom data
by avoiding a containsKey(...) call and, when there is no custom data, returning a blank
LifecycleExecutionState instance.

Relates to elastic#77466
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Sep 16, 2021
…hen running a policy.

Sometimes the parsing done by the `getCurrentStep()` method is unnecessary, because the
method calling `getCurrentStep()` has already parsed a `LifecycleExecutionState`
instance and can just provide that.

Relates to elastic#77466
elasticsearchmachine pushed a commit that referenced this issue Sep 16, 2021
This change improves the parsing of LifecycleExecutionState from IndexMetadata custom data
by avoiding a containsKey(...) call and, when there is no custom data, returning a blank
LifecycleExecutionState instance.

Relates to #77466
martijnvg added a commit that referenced this issue Sep 16, 2021
…hen running a policy. (#77863)

Sometimes the parsing done by the `getCurrentStep()` method is unnecessary, because the
method calling `getCurrentStep()` has already parsed a `LifecycleExecutionState`
instance and can just provide that.

Relates to #77466
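The "reuse the already-parsed state" idea can be sketched like this; all names below are hypothetical, not the real ILM API:

```java
import java.util.Map;

// Sketch of avoiding redundant parsing: callers that already hold a parsed
// state pass it down instead of letting the callee re-parse the raw map.
final class StepResolver {
    record ExecutionState(String stepName) {
        static ExecutionState parse(Map<String, String> raw) {
            return new ExecutionState(raw.getOrDefault("step", "init"));
        }
    }

    // Before: every call re-parsed the raw custom data.
    static String currentStep(Map<String, String> rawCustomData) {
        return currentStep(ExecutionState.parse(rawCustomData));
    }

    // After: an overload accepts the already-parsed state, so a caller that
    // parsed it once can reuse it across many calls.
    static String currentStep(ExecutionState state) {
        return state.stepName();
    }
}
```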
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Sep 16, 2021
…hen running a policy. (elastic#77863)

Sometimes the parsing done by the `getCurrentStep()` method is unnecessary, because the
method calling `getCurrentStep()` has already parsed a `LifecycleExecutionState`
instance and can just provide that.

Relates to elastic#77466
elasticsearchmachine pushed a commit that referenced this issue Sep 16, 2021
…hen running a policy. (#77863) (#77882)

Sometimes the parsing done by the `getCurrentStep()` method is unnecessary, because the
method calling `getCurrentStep()` has already parsed a `LifecycleExecutionState`
instance and can just provide that.

Relates to #77466
original-brownbear added a commit that referenced this issue Sep 29, 2021
Prevent duplicate ILM tasks from being enqueued to fix the most immediate issues around #78246. The ILM logic should be further improved though. I did not include `MoveToErrorStepUpdateTask` in this change yet as I wasn't entirely sure how valid/safe hashing/comparing arbitrary `Exception`s would be. That could be looked into in a follow-up as well.

Relates #77466 

Closes #78246
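The de-duplication idea can be sketched as follows, assuming tasks implement value-based equals/hashCode — which is exactly why tasks wrapping arbitrary `Exception`s were left out. Names are illustrative, not the actual ILM task classes:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a task is only enqueued if an equal task is not already pending.
final class DedupTaskQueue {
    record StepTask(String index, String step) {}

    private final Set<StepTask> pending = ConcurrentHashMap.newKeySet();
    private final Queue<StepTask> queue = new ArrayDeque<>();

    /** @return true if enqueued, false if an equal task was already pending */
    synchronized boolean submit(StepTask task) {
        if (pending.add(task) == false) {
            return false; // identical task already waiting; skip it
        }
        queue.add(task);
        return true;
    }

    synchronized StepTask poll() {
        StepTask task = queue.poll();
        if (task != null) {
            pending.remove(task); // allow re-submission once it has run
        }
        return task;
    }
}
```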
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Sep 29, 2021
…ic#78390)

Prevent duplicate ILM tasks from being enqueued to fix the most immediate issues around elastic#78246. The ILM logic should be further improved though. I did not include `MoveToErrorStepUpdateTask` in this change yet as I wasn't entirely sure how valid/safe hashing/comparing arbitrary `Exception`s would be. That could be looked into in a follow-up as well.

Relates elastic#77466 

Closes elastic#78246
original-brownbear added a commit that referenced this issue Oct 5, 2022
…outing (#90556)

Saw these strings use up tens of MB on a large cluster's master node during snapshotting
and responding to indices stats requests.
Also, this speeds up node id comparisons in the snapshot shards service and snapshots
allocation decider.

relates #77466
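A minimal sketch of the string-deduplication idea (illustrative only, not the actual Elasticsearch implementation): once every copy of a node id maps to one canonical object, the heap cost of thousands of routing entries shrinks and the `a == b` fast path in string comparison succeeds.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of deduplicating repeated strings such as node ids in routing tables.
final class StringDeduplicator {
    private final Map<String, String> canonical = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, registering it on first sight.
    String dedup(String s) {
        String existing = canonical.putIfAbsent(s, s);
        return existing == null ? s : existing;
    }
}
```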
original-brownbear added a commit that referenced this issue Oct 5, 2022
This can take O(10s) for tens of thousands of shards, so we have to fork it.
relates #77466
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Oct 5, 2022
…ic#90651)

This can take O(10s) for tens of thousands of shards, so we have to fork it.
relates elastic#77466
elasticsearchmachine pushed a commit that referenced this issue Oct 5, 2022
… (#90664)

This can take O(10s) for tens of thousands of shards, so we have to fork it.
relates #77466
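The general shape of "fork expensive work off the transport thread" can be sketched as follows; the pool, names, and per-shard work are illustrative stand-ins:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: instead of computing an expensive per-shard result inline on the
// calling (transport) thread, the work is forked to a dedicated pool and the
// caller is notified asynchronously.
final class ForkedComputation {
    private final ExecutorService management = Executors.newFixedThreadPool(1);

    CompletableFuture<Long> computeShardStats(int shardCount) {
        // Before: this loop ran inline, stalling the transport thread for
        // O(10s) at tens of thousands of shards.
        return CompletableFuture.supplyAsync(() -> {
            long total = 0;
            for (int i = 0; i < shardCount; i++) {
                total += i; // stand-in for per-shard work
            }
            return total;
        }, management);
    }

    void shutdown() {
        management.shutdown();
    }
}
```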
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jan 6, 2023
BroadcastReplicationAction derivatives (`POST /<indices>/_refresh` and
`POST /<indices>/_flush`) are pretty inefficient when targeting high
shard counts due to how `TransportBroadcastReplicationAction` works:

- It computes the list of all target shards up-front on the calling
  (transport) thread.

- It eagerly sends one request for every target shard in a tight loop on
  the calling (transport) thread.

- It accumulates responses in a `CopyOnWriteArrayList` which takes
  quadratic work to populate, even though nothing reads this list until
  it's fully populated.

- It then mostly discards the accumulated responses, keeping only the
  total number of shards, the number of successful shards, and a list of
  any failures.

- Each failure is wrapped up in a `ReplicationResponse.ShardInfo.Failure`
  but then unwrapped at the end to be re-wrapped in a
  `DefaultShardOperationFailedException`.

This commit fixes all this:

- It avoids allocating a list of all target shards, instead iterating
  over the target indices and generating shard IDs on the fly.

- The computation of the list of shards, and the sending of the
  per-shard requests, now happens on the relevant threadpool (`REFRESH`
  or `FLUSH`) rather than a transport thread.

- The per-shard requests are now throttled, with a meaningful yet fairly
  generous concurrency limit of `#(data nodes) * 10`.

- Rather than accumulating the full responses for later processing we
  track the counts and failures directly.

- The failures are tracked in a regular `ArrayList`, avoiding the
  accidentally-quadratic complexity.

- The failures are tracked in their final form, skipping the
  unwrap-and-rewrap step at the end.

Relates elastic#77466
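The "accidentally quadratic" accumulator point can be made concrete with a small sketch: each `CopyOnWriteArrayList.add()` copies the whole backing array, so n appends copy O(n²) elements in total, whereas `ArrayList` appends in amortized constant time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Why CopyOnWriteArrayList is the wrong accumulator for n sequential adds.
final class AccumulatorCost {
    // add() number k copies the k-1 existing elements before appending.
    static long elementsCopied(int n) {
        long copied = 0;
        for (int k = 1; k <= n; k++) {
            copied += k - 1;
        }
        return copied; // n*(n-1)/2, i.e. quadratic in n
    }

    public static void main(String[] args) {
        List<Integer> slow = new CopyOnWriteArrayList<>();
        List<Integer> fast = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            slow.add(i); // copies the whole array every time
            fast.add(i); // amortized O(1)
        }
        System.out.println(elementsCopied(1_000)); // prints 499500
    }
}
```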
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Jan 13, 2023
BroadcastReplicationAction derivatives (`POST /<indices>/_refresh` and
`POST /<indices>/_flush`) are pretty inefficient when targeting high
shard counts due to how `TransportBroadcastReplicationAction` works:

- It computes the list of all target shards up-front on the calling
  (transport) thread.

- It accumulates responses in a `CopyOnWriteArrayList` which takes
  quadratic work to populate, even though nothing reads this list until
  it's fully populated.

- It then mostly discards the accumulated responses, keeping only the
  total number of shards, the number of successful shards, and a list of
  any failures.

- Each failure is wrapped up in a
  `ReplicationResponse.ShardInfo.Failure` but then unwrapped at the end
  to be re-wrapped in a `DefaultShardOperationFailedException`.

This commit fixes all this:

- The computation of the list of shards, and the sending of the
  per-shard requests, now happens on the relevant threadpool (`REFRESH`
  or `FLUSH`) rather than a transport thread.

- The failures are tracked in a regular `ArrayList`, avoiding the
  accidentally-quadratic complexity.

- Rather than accumulating the full responses for later processing we
  track the counts and failures directly.

- The failures are tracked in their final form, skipping the
  unwrap-and-rewrap step at the end.

Relates elastic#77466
Relates elastic#92729
elasticsearchmachine pushed a commit that referenced this issue Jan 13, 2023
BroadcastReplicationAction derivatives (`POST /<indices>/_refresh` and
`POST /<indices>/_flush`) are pretty inefficient when targeting high
shard counts due to how `TransportBroadcastReplicationAction` works:

- It computes the list of all target shards up-front on the calling (transport) thread.

- It accumulates responses in a `CopyOnWriteArrayList` which takes quadratic work to populate, even though nothing reads this list until it's fully populated.

- It then mostly discards the accumulated responses, keeping only the total number of shards, the number of successful shards, and a list of any failures.

- Each failure is wrapped up in a `ReplicationResponse.ShardInfo.Failure` but then unwrapped at the end to be re-wrapped in a `DefaultShardOperationFailedException`.

This commit fixes all this:

- The computation of the list of shards, and the sending of the per-shard requests, now happens on the relevant threadpool (`REFRESH` or `FLUSH`) rather than a transport thread.

- The failures are tracked in a regular `ArrayList`, avoiding the accidentally-quadratic complexity.

- Rather than accumulating the full responses for later processing we track the counts and failures directly.

- The failures are tracked in their final form, skipping the unwrap-and-rewrap step at the end.

Relates #77466 Relates #92729
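The throttling described above can be sketched with a semaphore. This is illustrative only: the real `TransportBroadcastReplicationAction` uses Elasticsearch's asynchronous action infrastructure rather than a blocking acquire, but the `#(data nodes) * 10` limit is the one from the commit message.

```java
import java.util.concurrent.Semaphore;

// Sketch of capping in-flight per-shard requests at dataNodes * 10.
final class ThrottledBroadcaster {
    private final Semaphore permits;

    ThrottledBroadcaster(int dataNodes) {
        // "meaningful yet fairly generous" concurrency limit
        this.permits = new Semaphore(dataNodes * 10);
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    void sendShardRequest(Runnable request) {
        try {
            permits.acquire(); // blocks once dataNodes * 10 are in flight
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        try {
            request.run(); // stand-in for the async per-shard replication call
        } finally {
            permits.release();
        }
    }
}
```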
@dakrone dakrone added :Data Management/ILM+SLM Index and Snapshot lifecycle management and removed :Data Management/Other labels Nov 16, 2023