[ML] Reinstate ML daily maintenance actions #47103

droberts195 · 2019-09-25T10:57:32Z

A refactoring in 6.6 meant that the ML daily
maintenance actions have not been run at all
since then. This change installs the local
master listener that schedules the ML daily
maintenance, and also defends against some
subtle race conditions that could occur in the
future if a node flipped very quickly between
master and non-master.

Fixes #47003
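A minimal sketch of the scheduling pattern the description refers to. Every name here is hypothetical: the listener interface is a simplified stand-in for a mastership callback such as Elasticsearch's LocalNodeMasterListener (onMaster/offMaster), and a plain ScheduledExecutorService replaces Elasticsearch's thread pool. It is not the code from this PR's diff. It illustrates two defences against a node flipping quickly between master and non-master: a run that finishes after stop() must not reschedule itself, and a late run from an earlier schedule must not leave a second, parallel chain of runs behind.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a callback fired when the local node gains or loses mastership.
interface MasterListener {
    void onMaster();
    void offMaster();
}

// Hypothetical maintenance service: each run does its work and then schedules the next one.
class DailyMaintenance {
    private final ScheduledExecutorService scheduler;
    private final long delayMillis;
    final AtomicInteger runs = new AtomicInteger(); // counts completed runs
    private ScheduledFuture<?> future;              // guarded by this
    private boolean stopped = true;                 // guarded by this

    DailyMaintenance(ScheduledExecutorService scheduler, long delayMillis) {
        this.scheduler = scheduler;
        this.delayMillis = delayMillis;
    }

    synchronized void start() {
        stopped = false;
        scheduleNext();
    }

    synchronized void stop() {
        stopped = true;
        cancelPending();
    }

    synchronized boolean isStarted() {
        return stopped == false;
    }

    // Package-private so a late-finishing run can be simulated; normally only the scheduler calls it.
    void runOnce() {
        runs.incrementAndGet(); // the real maintenance work would go here
        scheduleNext();
    }

    private synchronized void scheduleNext() {
        // Defence 1: a run that finishes after stop() must not re-arm the timer,
        // otherwise a node that has lost mastership would keep doing maintenance.
        if (stopped) {
            return;
        }
        // Defence 2: replace rather than add to any pending run, so a quick
        // stop/start flip plus a late run from the old schedule cannot leave
        // two independent chains of runs behind.
        cancelPending();
        future = scheduler.schedule(this::runOnce, delayMillis, TimeUnit.MILLISECONDS);
    }

    private synchronized void cancelPending() {
        if (future != null) {
            future.cancel(false); // false: never interrupt a run that is already executing
            future = null;
        }
    }
}

// Maps mastership changes onto start/stop of the maintenance service.
class MaintenanceMasterListener implements MasterListener {
    private final DailyMaintenance service;

    MaintenanceMasterListener(DailyMaintenance service) {
        this.service = service;
    }

    @Override
    public void onMaster() {
        service.start();
    }

    @Override
    public void offMaster() {
        service.stop();
    }
}
```

The key design choice is that both the "am I stopped?" check and the rescheduling happen under the same lock as stop(), so there is no window in which a run can observe a stale state and re-arm the timer.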


elasticmachine · 2019-09-25T10:57:34Z

Pinging @elastic/ml-core

benwtrent · 2019-09-25T11:08:24Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MlDailyMaintenanceService.java

@@ -79,12 +79,12 @@ private static TimeValue delayToNextTime(ClusterName clusterName) {
        return TimeValue.timeValueMillis(next.toInstant().toEpochMilli() - now.toInstant().toEpochMilli());
    }

-    public void start() {
+    public synchronized void start() {


I suppose there is no harm in making this synchronized; it just seems unnecessary, as all it does is call scheduleNext, which is also synchronized.

droberts195 · 2019-09-25

Yes, it's redundant in terms of thread safety, but I thought it was worth making sure the logging order of "Starting ML daily maintenance service" and "Stopping ML daily maintenance service" matched the order in which the corresponding operations were performed. Without this synchronized, the log could say "Starting" followed by "Stopping" when the work of starting was actually done after the work of stopping.
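A small sketch of the reasoning in this reply, using hypothetical names and a list in place of the real logger: because the log message and the state change happen inside the same synchronized method, the sequence of messages always matches the sequence in which start and stop actually took effect. If only the inner scheduling call were synchronized, a thread could log "Starting" and then be overtaken by a concurrent stop before doing its work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical service illustrating why start()/stop() are synchronized as a whole.
class OrderedLifecycle {
    private final List<String> log = new ArrayList<>(); // stand-in for the real logger
    private boolean running;                            // guarded by this

    synchronized void start() {
        // Logging and the state change happen under one lock, so no stop()
        // can slip in between the "Starting" message and the actual start.
        log.add("Starting ML daily maintenance service");
        running = true;
    }

    synchronized void stop() {
        log.add("Stopping ML daily maintenance service");
        running = false;
    }

    synchronized boolean isRunning() {
        return running;
    }

    synchronized List<String> logSnapshot() {
        return new ArrayList<>(log);
    }
}
```

With this structure, however many threads call start() and stop() concurrently, the last log line always describes the final state.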

dimitris-athanasiou

LGTM

Due to elastic#47003 many clusters will have built up a large backlog of expired results. On upgrading to a version where that bug is fixed, users could find that the first ML daily maintenance task deletes a very large number of documents. This change introduces throttling to the delete-by-query that the ML daily maintenance uses to delete expired results:

- Average 200 documents per second
- Maximum of 10 million documents per day

(There is no throttling for state/forecast documents as these are expected to be lower volume.)

Relates elastic#47103
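A back-of-the-envelope sketch of how the two limits above interact, using the figures from the description. At 200 documents per second a full day would allow 200 × 86,400 = 17,280,000 deletions, so in this version the 10 million per day cap is the limit that binds. The class and method names are hypothetical, and the sketch assumes one maintenance run per day that deletes up to the daily cap.

```java
// Hypothetical helper: how many daily maintenance runs it would take to clear a
// backlog of expired results under the throttling limits described above.
final class ThrottleMath {
    static final long DOCS_PER_SECOND = 200;          // average delete rate
    static final long MAX_DOCS_PER_DAY = 10_000_000L;  // daily cap
    static final long SECONDS_PER_DAY = 86_400;

    private ThrottleMath() {}

    // Documents that can be deleted in one day: the smaller of rate x 24h and the cap.
    static long docsPerDay() {
        return Math.min(DOCS_PER_SECOND * SECONDS_PER_DAY, MAX_DOCS_PER_DAY);
    }

    // Days (one run per day) needed to delete the backlog, rounding up.
    static long daysToClear(long backlog) {
        long perDay = docsPerDay();
        return (backlog + perDay - 1) / perDay;
    }
}
```

For example, a hypothetical backlog of 25 million expired result documents gives daysToClear(25_000_000L) == 3.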


Due to #47003 many clusters will have built up a large backlog of expired results. On upgrading to a version where that bug is fixed, users could find that the first ML daily maintenance task deletes a very large number of documents. This change introduces throttling to the delete-by-query that the ML daily maintenance uses to delete expired results, limiting it to deleting an average of 200 documents per second. (There is no throttling for state/forecast documents as these are expected to be lower volume.) Additionally, a rough time limit of 8 hours is applied to the whole delete expired data action. (This is only rough because it won't stop part way through a single operation; it only checks the timeout between operations.)

Relates #47103
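A minimal sketch of what a time limit that "only checks the timeout between operations" means in practice. The operation list, the injectable clock and all names are hypothetical; the actual delete expired data action is not shown here. The deadline is tested only before each operation starts, so an operation that begins just before the deadline always runs to completion.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch of a "rough" time limit checked only between operations.
final class RoughTimeout {
    private RoughTimeout() {}

    // Runs operations in order and returns how many were started before the deadline.
    static int runWithRoughTimeout(List<Runnable> operations, LongSupplier nowMillis, Duration timeout) {
        long deadline = nowMillis.getAsLong() + timeout.toMillis();
        int started = 0;
        for (Runnable op : operations) {
            // The only timeout check: before starting the next operation.
            if (nowMillis.getAsLong() >= deadline) {
                break;
            }
            op.run(); // never interrupted part way through
            started++;
        }
        return started;
    }
}
```

With five operations that each take three hours and an 8 hour limit, three operations start and the whole action finishes after 9 hours, which is the sense in which the limit is only rough.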

droberts195 added >bug :ml Machine learning v8.0.0 v7.5.0 v6.8.4 v7.4.1 labels Sep 25, 2019

droberts195 requested a review from dimitris-athanasiou September 25, 2019 10:57

droberts195 mentioned this pull request Sep 25, 2019

[ML] Nightly maintenance is not triggered #47003

Closed

benwtrent approved these changes Sep 25, 2019


dimitris-athanasiou approved these changes Sep 25, 2019


droberts195 merged commit aeeba16 into elastic:master Sep 26, 2019

droberts195 deleted the fix_ml_daily_maintenance branch September 26, 2019 14:24

droberts195 added the backport pending label Sep 26, 2019

droberts195 mentioned this pull request Sep 26, 2019

[ML] Throttle the delete-by-query of expired results #47177

Merged

colings86 added v7.4.0 v7.4.1 and removed v7.4.1 v7.4.0 labels Sep 27, 2019

droberts195 removed the backport pending label Sep 30, 2019

codebrain mentioned this pull request Oct 25, 2019

7.4.1 meta ticket elastic/elasticsearch-net#4174

Closed

39 tasks

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
