-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid parallel reroutes in DiskThresholdMonitor #43381
Avoid parallel reroutes in DiskThresholdMonitor #43381
Conversation
Today the `DiskThresholdMonitor` limits the frequency with which it submits reroute tasks, but it might still submit these tasks faster than the master can process them if, for instance, each reroute takes over 60 seconds. This causes a problem since the reroute task runs with priority `IMMEDIATE` and is always scheduled when there is a node over the high watermark, so this can starve any other pending tasks on the master. This change avoids further updates from the monitor while its last task(s) are still in progress, and it measures the time of each update from the completion time of the reroute task rather than its start time, to allow a larger window for other tasks to run. Fixes elastic#40174
Pinging @elastic/es-distributed |
} | ||
|
||
final ImmutableOpenMap<String, DiskUsage> usages = info.getNodeLeastAvailableDiskUsages(); | ||
if (usages == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably best to look at this bit ignoring whitespace changes - I removed a level of indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good tip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not super happy that we are sending a task to be executed at priority IMMEDIATE. I would rather have this call RoutingService
. In that case, we could also avoid this whole business of tracking whether there is already a call in progress (that's taken care of by RoutingService). WDYT?
} | ||
|
||
final ImmutableOpenMap<String, DiskUsage> usages = info.getNodeLeastAvailableDiskUsages(); | ||
if (usages == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good tip
I agree on the priority thing, but the |
I think HIGH priority is ok for now. I wonder why we need the notification on completion. What does it keep the frequency low of? If we're batching calls, it's fine to have multiple pending attempts? |
Ok I have added to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left two small asks. Looking good o.w.
@@ -379,7 +379,7 @@ public void clusterStatePublished(ClusterChangedEvent clusterChangedEvent) { | |||
if (logger.isTraceEnabled()) { | |||
logger.trace("{}, scheduling a reroute", reason); | |||
} | |||
routingService.reroute(reason); | |||
routingService.reroute(reason, ActionListener.wrap(() -> logger.trace("{}, reroute completed", reason))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also logs the same line on an exception :/
I would prefer two different log lines, and the failure one with the exception (same for other places in this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in ce5946b
if (nodes.contains(node) == false) { | ||
nodeHasPassedWatermark.remove(node); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert that rerouteAction is set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in c1d6ee0
@elasticmachine please run elasticsearch-ci/2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…k-threshold-monitor
Today the `DiskThresholdMonitor` limits the frequency with which it submits reroute tasks, but it might still submit these tasks faster than the master can process them if, for instance, each reroute takes over 60 seconds. This causes a problem since the reroute task runs with priority `IMMEDIATE` and is always scheduled when there is a node over the high watermark, so this can starve any other pending tasks on the master. This change avoids further updates from the monitor while its last task(s) are still in progress, and it measures the time of each update from the completion time of the reroute task rather than its start time, to allow a larger window for other tasks to run. It also now makes use of the `RoutingService` to submit the reroute task, in order to batch this task with any other pending reroutes. It enhances the `RoutingService` to notify its listeners on completion. Fixes #40174 Relates #42559
Today the
DiskThresholdMonitor
limits the frequency with which it submitsreroute tasks, but it might still submit these tasks faster than the master can
process them if, for instance, each reroute takes over 60 seconds. This causes
a problem since the reroute task runs with priority
IMMEDIATE
and is alwaysscheduled when there is a node over the high watermark, so this can starve any
other pending tasks on the master.
This change avoids further updates from the monitor while its last task(s) are
still in progress, and it measures the time of each update from the completion
time of the reroute task rather than its start time, to allow a larger window
for other tasks to run.
Fixes #40174