Updating settings on a large number of indices can take minutes #87120

Closed
original-brownbear opened this issue May 25, 2022 · 2 comments
Labels
>bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@original-brownbear
Member

Updating index settings on a large number of indices can take many minutes, and the resulting cluster state update might fail to publish cleanly, without any warning, because of the time it takes to persist to disk.

The disk part is obvious: we write one Lucene document per index, resulting in a massive write to Lucene.

[2022-05-25T13:06:48,324][WARN ][o.e.g.PersistedClusterStateService] [elasticsearch-5] writing full cluster state took [25208ms] which is above the warn threshold of [10s]; wrote global metadata and metadata for [50007] indices

The bigger reason a large settings update is slow, though, is the index metadata validation that runs for each index. This validation deserializes and reserializes the mapping of every updated index, which can take many minutes when large mappings are combined with a large number of updated indices.

  100.0% [cpu=98.6%, other=1.4%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-5][masterService#updateTask][T#1]'
     10/10 snapshots sharing following 26 elements
       app/[email protected]/org.elasticsearch.index.mapper.ObjectMapper$Builder.buildMappers(ObjectMapper.java:150)
       app/[email protected]/org.elasticsearch.index.mapper.ObjectMapper$Builder.build(ObjectMapper.java:171)
       app/[email protected]/org.elasticsearch.index.mapper.ObjectMapper$Builder.build(ObjectMapper.java:64)
       app/[email protected]/org.elasticsearch.index.mapper.ObjectMapper$Builder.buildMappers(ObjectMapper.java:150)
       app/[email protected]/org.elasticsearch.index.mapper.RootObjectMapper$Builder.build(RootObjectMapper.java:110)
       app/[email protected]/org.elasticsearch.index.mapper.MappingParser.parse(MappingParser.java:99)
       app/[email protected]/org.elasticsearch.index.mapper.MappingParser.parse(MappingParser.java:94)
       app/[email protected]/org.elasticsearch.index.mapper.MapperService.parseMapping(MapperService.java:370)
       app/[email protected]/org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:347)
       app/[email protected]/org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:337)
       app/[email protected]/org.elasticsearch.indices.IndicesService.verifyIndexMetadata(IndicesService.java:810)
       app/[email protected]/org.elasticsearch.cluster.metadata.MetadataUpdateSettingsService$1.execute(MetadataUpdateSettingsService.java:247)
       app/[email protected]/org.elasticsearch.cluster.metadata.MetadataUpdateSettingsService.lambda$new$0(MetadataUpdateSettingsService.java:79)
       app/[email protected]/org.elasticsearch.cluster.metadata.MetadataUpdateSettingsService$$Lambda$3405/0x00000008014c4400.execute(Unknown Source)
       app/[email protected]/org.elasticsearch.cluster.service.MasterService.innerExecuteTasks(MasterService.java:908)
       app/[email protected]/org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:878)
       app/[email protected]/org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:248)
       app/[email protected]/org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:156)
       app/[email protected]/org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:110)
       app/[email protected]/org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:148)
       app/[email protected]/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:709)
       app/[email protected]/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:260)
       app/[email protected]/org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:223)
       [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       [email protected]/java.lang.Thread.run(Thread.java:833)

We should find a way to skip unnecessary mapping validation when nothing about the mappings has changed.
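
To illustrate the idea of skipping validation for unchanged mappings, here is a minimal, self-contained sketch. It is not Elasticsearch code: `IndexMeta`, `verifyChanged`, and the validator callback are hypothetical stand-ins, and the sketch assumes that the settings being changed cannot affect whether a mapping is valid (which is not true for every setting, so a real fix would need to account for that).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class SkipUnchangedMappingValidation {

    /** Hypothetical, simplified stand-in for per-index metadata. */
    static final class IndexMeta {
        final String name;
        final String mappingSource;  // serialized mapping, e.g. JSON
        final int numberOfReplicas;  // example of a setting that may change

        IndexMeta(String name, String mappingSource, int numberOfReplicas) {
            this.name = name;
            this.mappingSource = mappingSource;
            this.numberOfReplicas = numberOfReplicas;
        }
    }

    /**
     * Runs the expensive validator only for indices that are new or whose
     * serialized mapping differs from the previous state. Returns the names
     * of the indices that were actually validated.
     */
    static List<String> verifyChanged(Map<String, IndexMeta> before,
                                      Map<String, IndexMeta> after,
                                      Consumer<IndexMeta> expensiveValidator) {
        List<String> validated = new ArrayList<>();
        for (IndexMeta updated : after.values()) {
            IndexMeta previous = before.get(updated.name);
            boolean mappingUnchanged = previous != null
                && previous.mappingSource.equals(updated.mappingSource);
            if (mappingUnchanged) {
                continue; // assumption: a settings-only change needs no mapping re-parse
            }
            expensiveValidator.accept(updated);
            validated.add(updated.name);
        }
        return validated;
    }

    public static void main(String[] args) {
        String mapping = "{\"properties\":{\"field\":{\"type\":\"keyword\"}}}";
        // Settings-only update: replicas change from 1 to 2, mappings stay the same.
        Map<String, IndexMeta> before = Map.of(
            "logs-1", new IndexMeta("logs-1", mapping, 1),
            "logs-2", new IndexMeta("logs-2", mapping, 1));
        Map<String, IndexMeta> after = Map.of(
            "logs-1", new IndexMeta("logs-1", mapping, 2),
            "logs-2", new IndexMeta("logs-2", mapping, 2));
        List<String> validated = verifyChanged(before, after, meta -> { });
        System.out.println("indices re-validated: " + validated.size());
    }
}
```

With a check like this, a settings-only update pays the mapping parse cost only for indices whose mapping source actually differs, rather than once per updated index.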

relates #77466

@original-brownbear original-brownbear added >bug :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels May 25, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 25, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor

I extracted the bit about the mapping validation to #89309 since that's in the search team's domain. I think we've made improvements to the serialization speed since this issue was opened too, and nothing else is planned here for now, so I'm closing this.
