Performance Regression for every CS update from ILM's org.elasticsearch.cluster.metadata.Metadata#isIndexManagedByILM #98992
Pinging @elastic/es-data-management (Team:Data Management)
I had a look at the options that @original-brownbear suggested and tried to come up with some other options myself as well.
Maybe this helps: The many-shards project effectively concluded that ILM is somewhat fundamentally flawed in how it executes policies. Policies are optionally triggered on each cluster state update by inspecting each index individually. Thus the logic scales O(N) in the number of indices in the cluster, which makes it by far the most expensive CS listener in larger clusters.
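To make that scaling concrete, here is a minimal, hypothetical sketch (class and method names are illustrative, not the actual IndexLifecycleService code) of a cluster state listener that inspects every index on each update; because the loop runs once per index, the cost paid on every cluster state update grows linearly with the number of indices:

```java
import org.elasticsearch.cluster.ClusterChangedEvent;
import org.elasticsearch.cluster.ClusterStateListener;
import org.elasticsearch.cluster.metadata.IndexMetadata;
import org.elasticsearch.cluster.metadata.Metadata;

// Hypothetical illustration of the O(N) pattern described above, not the real ILM code:
// every cluster state update walks all indices in the cluster.
public class PerIndexPolicyListener implements ClusterStateListener {

    @Override
    public void clusterChanged(ClusterChangedEvent event) {
        Metadata metadata = event.state().metadata();
        // O(number_of_indices) work on every single cluster state update
        for (IndexMetadata index : metadata.indices().values()) {
            String policy = index.getSettings().get("index.lifecycle.name");
            if (policy != null) {
                maybeTriggerPolicyStep(index, policy);
            }
        }
    }

    private void maybeTriggerPolicyStep(IndexMetadata index, String policy) {
        // placeholder: decide whether this index is due for its next lifecycle step
    }
}
```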
Thanks, Armin, for reporting this and Niels for working on it. ++ on making this more efficient. TIL about the cost of the indices lookup. It's very surprising to me that reading the setting is this expensive. The condition guarding the setting read should seldom be true in stateful.
Going over the many shards benchmark bootstrapping I noticed it slowed down quite a bit recently.
Turns out a big contributor to this is org.elasticsearch.cluster.metadata.Metadata#isIndexManagedByILM, called from org.elasticsearch.xpack.ilm.IndexLifecycleService#triggerPolicies on every cluster state update and costing O(N) in the number of indices. This could be made more efficient in various ways.
At least we should:
- avoid calling Metadata.getIndicesLookup, this one is extremely expensive on the applier thread
- as a first quick fix, check if any data streams even use DLM and, if the answer is no, skip the whole logic (sketched below)

This currently introduces roughly 5% overhead into every CS update (relative to stuff like create index and shard allocation in the many-shards benchmark) at 25k indices in a cluster, and the overhead grows in O(number_of_indices).
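Below is a hedged sketch of that quick-fix idea, assuming the data stream lifecycle (DLM) configuration is reachable from the cluster state metadata via a getLifecycle()-style accessor; the class name and the anyDataStreamUsesDlm / resolveOwnershipViaIndicesLookup helpers are hypothetical and not the change that actually landed:

```java
import org.elasticsearch.cluster.metadata.DataStream;
import org.elasticsearch.cluster.metadata.IndexMetadata;
import org.elasticsearch.cluster.metadata.Metadata;

// Illustrative sketch of the quick-fix idea, not the actual fix.
final class IlmOwnershipCheckSketch {

    // True if any data stream in the cluster has a DLM lifecycle configured.
    static boolean anyDataStreamUsesDlm(Metadata metadata) {
        for (DataStream dataStream : metadata.dataStreams().values()) {
            if (dataStream.getLifecycle() != null) { // assumed accessor for the DLM config
                return true;
            }
        }
        return false;
    }

    static boolean isIndexManagedByIlm(Metadata metadata, IndexMetadata index) {
        if (index.getSettings().get("index.lifecycle.name") == null) {
            return false; // no ILM policy configured on this index at all
        }
        if (anyDataStreamUsesDlm(metadata) == false) {
            // Quick path: nothing in the cluster uses DLM, so every index with an
            // ILM policy is managed by ILM and the expensive lookup is skipped.
            return true;
        }
        // Slow path: resolve the parent data stream through Metadata.getIndicesLookup
        // (the expensive call flagged above) to decide whether ILM or DLM owns this
        // backing index.
        return resolveOwnershipViaIndicesLookup(metadata, index);
    }

    // Placeholder for the existing, expensive resolution logic.
    static boolean resolveOwnershipViaIndicesLookup(Metadata metadata, IndexMetadata index) {
        return true;
    }
}
```

In a real implementation the "does any data stream use DLM" answer would of course be computed once per cluster state update rather than once per index, so the quick path stays O(1) per index.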