[ML] Fix master node deadlock during ML daily maintenance #31836


dimitris-athanasiou (Contributor) commented Jul 5, 2018

This is the implementation for master and 6.x of #31691.
Native tests are changed to use multi-node clusters in #31757.

Relates #31683

elasticmachine (Collaborator)

Pinging @elastic/ml-core

droberts195 removed the v6.5.0 label Jul 6, 2018
@@ -79,7 +84,8 @@ public void remove(ActionListener<Boolean> listener) {

 SearchRequest searchRequest = new SearchRequest(RESULTS_INDEX_PATTERN);
 searchRequest.source(source);
-client.execute(SearchAction.INSTANCE, searchRequest, forecastStatsHandler);
+client.execute(SearchAction.INSTANCE, searchRequest, new ThreadedActionListener<>(LOGGER, threadPool,
+        MachineLearning.UTILITY_THREAD_POOL_NAME, forecastStatsHandler, false));
Contributor

I think the removeDataBefore() method in ExpiredModelSnapshotsRemover should also use a ThreadedActionListener in exactly the same way this class does. It also has the problem of doing a (potentially) large amount of parsing on the network thread.

Contributor Author

Good point. I pushed a commit to fix that too.
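The diff above wraps `forecastStatsHandler` in a `ThreadedActionListener`. As a result, the search response is handled on the ML utility thread pool (`MachineLearning.UTILITY_THREAD_POOL_NAME`) instead of on the network thread that delivered it. The self-contained Java sketch below illustrates that idea without any Elasticsearch dependencies, under several assumptions:

- `Listener`, `ForkingListener`, `search` and `blockingHandler` are hypothetical stand-ins, not Elasticsearch APIs.
- A single-thread executor plays the role of the network thread.
- The handler waits on a follow-up request as a simplified stand-in for expensive work.

It is a generic illustration of why occupying the response-delivering thread can stall progress. It does not reproduce the code path from #31683. A one-second timeout surfaces the stall instead of hanging forever.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ThreadedListenerSketch {

    // Hypothetical stand-in for an ActionListener-style callback interface.
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    // Hypothetical, simplified analogue of ThreadedActionListener: each callback is
    // forked to `executor` instead of running on the thread that delivered it.
    static final class ForkingListener<T> implements Listener<T> {
        private final ExecutorService executor;
        private final Listener<T> delegate;

        ForkingListener(ExecutorService executor, Listener<T> delegate) {
            this.executor = executor;
            this.delegate = delegate;
        }

        @Override
        public void onResponse(T response) { executor.execute(() -> delegate.onResponse(response)); }

        @Override
        public void onFailure(Exception e) { executor.execute(() -> delegate.onFailure(e)); }
    }

    // Simulated request whose response is always delivered on the single "network" thread.
    static void search(ExecutorService networkThread, String query, Listener<String> listener) {
        networkThread.execute(() -> listener.onResponse("hits for " + query));
    }

    // Handler that issues a follow-up request and waits for it (bounded by a timeout),
    // standing in for work that occupies whichever thread runs the handler.
    static Listener<String> blockingHandler(ExecutorService networkThread, CompletableFuture<String> done) {
        return new Listener<String>() {
            @Override
            public void onResponse(String response) {
                CompletableFuture<String> followUp = new CompletableFuture<>();
                search(networkThread, "follow-up", new Listener<String>() {
                    @Override
                    public void onResponse(String r) { followUp.complete(r); }
                    @Override
                    public void onFailure(Exception e) { followUp.completeExceptionally(e); }
                });
                try {
                    // On the network thread, the follow-up is queued behind this task, so this times out.
                    done.complete(followUp.get(1, TimeUnit.SECONDS));
                } catch (Exception e) {
                    done.completeExceptionally(e);
                }
            }

            @Override
            public void onFailure(Exception e) { done.completeExceptionally(e); }
        };
    }

    // Returns true if the handler finishes successfully within 5 seconds.
    public static boolean completes(boolean forkToUtilityPool) {
        ExecutorService networkThread = Executors.newSingleThreadExecutor();
        ExecutorService utilityPool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> done = new CompletableFuture<>();
            Listener<String> handler = blockingHandler(networkThread, done);
            if (forkToUtilityPool) {
                handler = new ForkingListener<>(utilityPool, handler);
            }
            search(networkThread, "expired forecasts", handler);
            done.get(5, TimeUnit.SECONDS);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        } finally {
            networkThread.shutdownNow();
            utilityPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("handler on network thread completes: " + completes(false));
        System.out.println("handler forked to utility pool completes: " + completes(true));
    }
}
```

When the handler runs directly on the single network thread, its follow-up response is queued behind it, so the wait times out and `completes(false)` reports `false`. When the same handler is forked to the utility pool, the network thread stays free to deliver the follow-up, and `completes(true)` reports `true`. Wrapping at the call site, as the diff does, leaves the handler's own logic unchanged. That is why the same change can be applied to `removeDataBefore()` in `ExpiredModelSnapshotsRemover`.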

droberts195 changed the title from "[ML] Ensure ML daily maintenance service does not block IO" to "[ML] Fix master node deadlock during ML daily maintenance" Jul 6, 2018
droberts195 (Contributor) left a comment

LGTM

I changed the PR title to be identical to #31691 so it's clearer to anyone browsing the PR list that it's fixing the same problem.

dimitris-athanasiou force-pushed the ensure-ml-daily-maintenance-service-does-not-block-io branch from d830877 to 4e61e8a July 6, 2018 15:13
dimitris-athanasiou merged commit 49ba271 into elastic:master Jul 7, 2018
dimitris-athanasiou deleted the ensure-ml-daily-maintenance-service-does-not-block-io branch July 7, 2018 08:43
dimitris-athanasiou added a commit that referenced this pull request Jul 7, 2018
dnhatn added a commit that referenced this pull request Jul 7, 2018
* 6.x:
  [ML] Fix master node deadlock during ML daily maintenance (#31836)
  Build: Switch integ-test-zip to OSS-only (#31866)
  Build: Fix detection of Eclipse Compiler Server (#31838)
  SQL: Remove restriction for single column grouping (#31818)
  Docs: Inconsistency between description and example (#31858)
  Fix and reenable TribeIntegrationTests
  QA: build improvements related to SQL projects (#31862)
  muted test
  [Docs] Add clarification to analysis example (#31826)
  Check timeZone() argument in AbstractSqlQueryRequest (#31822)
  Remove obsolete parameters from analyze rest spec (#31795)
  SQL: Fix incorrect HAVING equality (#31820)
  Smaller aesthetic fixes to InternalTestCluster (#31831)
  [Docs] Clarify accepted sort case (#31605)
  Do not return all indices if a specific alias is requested via get aliases api. (#29538)
  [Docs] Fix wrong link in Korean analyzer docs (#31815)
  Fix profiling of ordered terms aggs (#31814)
  Fix handling of points_only with term strategy in geo_shape (#31766)
  Docs: Explain _bulk?refresh shard targeting
  REST high-level client: add get index API (#31703)
dnhatn added a commit that referenced this pull request Jul 7, 2018
* master:
  [ML] Fix master node deadlock during ML daily maintenance (#31836)
  Build: Switch integ-test-zip to OSS-only (#31866)
  SQL: Remove restriction for single column grouping (#31818)
  Build: Fix detection of Eclipse Compiler Server (#31838)
  Docs: Inconsistency between description and example (#31858)
  Re-enable bwc tests now that #29538 has been backported and 6.x intake build succeeded.
  QA: build improvements related to SQL projects (#31862)
  [Docs] Add clarification to analysis example (#31826)
  Check timeZone() argument in AbstractSqlQueryRequest (#31822)
  SQL: Fix incorrect HAVING equality (#31820)
  Smaller aesthetic fixes to InternalTestCluster (#31831)
  [Docs] Clarify accepted sort case (#31605)
  Temporarily disable bwc test in order to backport #29538
  Remove obsolete parameters from analyze rest spec (#31795)
  [Docs] Fix wrong link in Korean analyzer docs (#31815)
  Fix profiling of ordered terms aggs (#31814)
  Properly mute test involving JDK11 closes #31739
  Do not return all indices if a specific alias is requested via get aliases api. (#29538)
  Get snapshot rest client cleanups (#31740)
  Docs: Explain _bulk?refresh shard targeting
  Fix handling of points_only with term strategy in geo_shape (#31766)