Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent Duplicate ILM Cluster State Updates from Being Created #78390

Merged
merged 4 commits into from
Sep 29, 2021

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Sep 28, 2021

Prevent duplicate ILM tasks from being enqueued to fix the most immediate issues around #78246. The ILM logic should be further improved though. I did not include MoveToErrorStepUpdateTask in this change yet as I wasn't entirely sure how valid/safe hashing/comparing arbitrary Exceptions would be. That could be looked into in a follow-up as well.

Relates #77466

Closes #78246

@original-brownbear original-brownbear added WIP :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Sep 28, 2021
@original-brownbear original-brownbear marked this pull request as ready for review September 28, 2021 14:37
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Sep 28, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Contributor

@probakowski probakowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @original-brownbear, this change looks good. I left just one question and small suggestion.
I tested this locally and with creating 1000 indices at the same time and tasks count (_cat/pending_tasks) never went above 5000 comparing to 1000000 without this change.

}

private boolean registerTask(IndexLifecycleClusterStateUpdateTask task) {
synchronized (executingTasks) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for explicit synchronized block and not using Collections.synchronizedSet(new HashSet<>()) or ConcurrentHashMap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right ... not sure why I did it this used the sync set now and just inlined everything since it's all one liners now :)

Thanks!

Comment on lines 538 to 542
final boolean removed;
synchronized (executingTasks) {
removed = executingTasks.remove(task);
}
assert removed : "tried to unregister unknown task [" + task + "]";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
final boolean removed;
synchronized (executingTasks) {
removed = executingTasks.remove(task);
}
assert removed : "tried to unregister unknown task [" + task + "]";
synchronized (executingTasks) {
final boolean removed = executingTasks.remove(task);
assert removed : "tried to unregister unknown task [" + task + "]";
}

Copy link
Contributor

@probakowski probakowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @original-brownbear!

@original-brownbear
Copy link
Member Author

Thanks @probakowski !

@original-brownbear original-brownbear merged commit 990aa34 into elastic:master Sep 29, 2021
@original-brownbear original-brownbear deleted the dedup-ilm branch September 29, 2021 05:49
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Sep 29, 2021
…ic#78390)

Prevent duplicate ILM tasks from being enqueued to fix the most immediate issues around elastic#78246. The ILM logic should be further improved though. I did not include `MoveToErrorStepUpdateTask` in this change yet as I wasn't entirely sure how valid/safe hashing/comparing arbitrary `Exception`s would be. That could be looked into in a follow-up as well.

Relates elastic#77466 

Closes elastic#78246
original-brownbear added a commit that referenced this pull request Sep 29, 2021
… (#78427)

Prevent duplicate ILM tasks from being enqueued to fix the most immediate issues around #78246. The ILM logic should be further improved though. I did not include `MoveToErrorStepUpdateTask` in this change yet as I wasn't entirely sure how valid/safe hashing/comparing arbitrary `Exception`s would be. That could be looked into in a follow-up as well.

Relates #77466 

Closes #78246
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Sep 29, 2021
Follow up to elastic#78390. The `EmptyInfo` would not compare
correctly because it doesn't implement equals or hashcode,
breaking deduplication for `SetStepInfoUpdateTask`.

=> just making it a singleton to fix this and have a fast comp via
instance equality.
original-brownbear added a commit that referenced this pull request Sep 29, 2021
Follow up to #78390. The `EmptyInfo` would not compare
correctly because it doesn't implement equals or hashcode,
breaking deduplication for `SetStepInfoUpdateTask`.

=> just making it a singleton to fix this and have a fast comp via
instance equality.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Sep 29, 2021
Follow up to elastic#78390. The `EmptyInfo` would not compare
correctly because it doesn't implement equals or hashcode,
breaking deduplication for `SetStepInfoUpdateTask`.

=> just making it a singleton to fix this and have a fast comp via
instance equality.
original-brownbear added a commit that referenced this pull request Sep 29, 2021
Follow up to #78390. The `EmptyInfo` would not compare
correctly because it doesn't implement equals or hashcode,
breaking deduplication for `SetStepInfoUpdateTask`.

=> just making it a singleton to fix this and have a fast comp via
instance equality.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Sep 29, 2021
If the current combination of current-step and index has a running CS update task
enqueued there is no point in adding yet another task for this combination on the applier
and we can skip the expensive inspection for the index.

follow up to elastic#78390
original-brownbear added a commit that referenced this pull request Sep 30, 2021
If the current combination of current-step and index has a running CS update task
enqueued there is no point in adding yet another task for this combination on the applier
and we can skip the expensive inspection for the index.

follow up to #78390
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Sep 30, 2021
If the current combination of current-step and index has a running CS update task
enqueued there is no point in adding yet another task for this combination on the applier
and we can skip the expensive inspection for the index.

follow up to elastic#78390
original-brownbear added a commit that referenced this pull request Sep 30, 2021
If the current combination of current-step and index has a running CS update task
enqueued there is no point in adding yet another task for this combination on the applier
and we can skip the expensive inspection for the index.

follow up to #78390
@original-brownbear original-brownbear restored the dedup-ilm branch April 18, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Data Management/ILM+SLM Index and Snapshot lifecycle management Team:Data Management Meta label for data/management team v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ILM Can Create an Unlimited Number of Pending Clusterstate Updates on Slow Master Nodes
5 participants