
[ML] Only report complete writing_results progress after completion #49551

Conversation

@dimitris-athanasiou (Contributor) commented Nov 25, 2019

We depend on the number of data frame rows in order to report progress for the
writing of results, the last phase of a job run. However, results include other
objects besides the data frame rows (e.g. progress, inference model, etc.).

The problem this commit fixes is that when we receive the last data frame row result
we report that progress is complete even though we may still have more results to
process. If the job gets stopped for any reason at that point, we will not be able
to restart it properly because we will think it has already completed.

This commit addresses this by capping the progress we can report for the
writing_results phase at 98 until the results processor completes.
At the end, when the process is done, we set the progress to 100.
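A minimal sketch of the capping idea (illustrative class and member names, not the actual Elasticsearch data frame analytics code):

// Illustrative sketch only: caps row-based progress for the writing_results
// phase at 98 until the results processor has finished, then reports 100.
class WritingResultsProgress {

    private static final int MAX_PROGRESS_BEFORE_COMPLETION = 98;

    private final long totalRows;
    private long processedRows;

    WritingResultsProgress(long totalRows) {
        this.totalRows = totalRows;
    }

    /** Called for each data frame row result that gets written. */
    void onRowProcessed() {
        processedRows++;
    }

    /**
     * Progress reported while the results processor is still running.
     * Even when all rows have been written this never exceeds 98, because
     * non-row results (progress, inference model, etc.) may still follow.
     */
    int currentProgressPercent() {
        if (totalRows == 0) {
            return MAX_PROGRESS_BEFORE_COMPLETION;
        }
        int rowBasedProgress = (int) (processedRows * 100 / totalRows);
        return Math.min(rowBasedProgress, MAX_PROGRESS_BEFORE_COMPLETION);
    }

    /** Only once the results processor has completed do we report 100. */
    int completedProgressPercent() {
        return 100;
    }
}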

@elasticmachine (Collaborator)

Pinging @elastic/ml-core (:ml)

@@ -107,11 +114,22 @@ public void process(AnalyticsProcess<AnalyticsResult> process) {
                failure = "error parsing data frame analytics output: [" + e.getMessage() + "]";
            }
        } finally {
            if (isCancelled == false) {
Member

Should we take into account failure != null? It seems weird to me that we are "complete" even if we failed.

Contributor Author

Very good point. I also took the opportunity to improve failure reporting in that class in general.
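A hedged sketch of the guarded completion logic under discussion, building on the WritingResultsProgress sketch above; the actual AnalyticsResultProcessor code may differ:

// Illustrative only: report writing_results as 100% only when the processor
// finished normally, i.e. it was neither cancelled nor did it record a failure.
class ResultProcessingCompletion {

    void finishReporting(boolean isCancelled, String failure, WritingResultsProgress progress) {
        if (isCancelled == false && failure == null) {
            // Safe to claim completion: all results, not just the data frame rows, were processed.
            reportProgress(progress.completedProgressPercent()); // 100
        } else {
            // Leave progress at its capped value (<= 98) so a restart re-processes the results.
            reportProgress(progress.currentProgressPercent());
        }
    }

    private void reportProgress(int percent) {
        // Placeholder for however progress is actually persisted/reported.
        System.out.println("writing_results progress: " + percent + "%");
    }
}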

@benwtrent (Member) left a comment

:shipit:

@przemekwitek (Contributor) left a comment

LGTM

one minor comment in test code

assertThat(resultProcessor.getFailure(), equalTo("error processing results; some failure"));

ArgumentCaptor<String> auditCaptor = ArgumentCaptor.forClass(String.class);
verify(auditor).error(eq(JOB_ID), auditCaptor.capture());
Contributor

Is it possible to replace auditCaptor.capture() with containsString("Error processing results; some failure")?

Contributor Author

We need to call capture() on the captor for it to capture the argument to the mocked method.
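For context, a minimal Mockito/Hamcrest sketch of why capture() is needed: the captor only records the argument when capture() is passed to the verified call, and containsString can then be applied to the captured value afterwards. The Auditor interface and JOB_ID here are stand-ins for the real test fixtures:

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsString;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import org.mockito.ArgumentCaptor;

// Illustrative only; not the actual test class.
public class CaptorExample {

    interface Auditor {
        void error(String jobId, String message);
    }

    private static final String JOB_ID = "my-job";

    public static void main(String[] args) {
        Auditor auditor = mock(Auditor.class);
        auditor.error(JOB_ID, "Error processing results; some failure");

        // capture() must be passed to the verified call so the captor records the argument;
        // the Hamcrest matcher is then applied to the captured value.
        ArgumentCaptor<String> auditCaptor = ArgumentCaptor.forClass(String.class);
        verify(auditor).error(eq(JOB_ID), auditCaptor.capture());
        assertThat(auditCaptor.getValue(), containsString("Error processing results; some failure"));
    }
}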

dimitris-athanasiou merged commit 7f94302 into elastic:master Nov 26, 2019
dimitris-athanasiou deleted the only-report-complete-progress-after-completion branch November 26, 2019 08:38
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Nov 26, 2019
…ion (elastic#49551)

The commit also improves failure capturing and reporting in the results processor.

Backport of elastic#49551
dimitris-athanasiou added a commit that referenced this pull request Nov 26, 2019
…ion (#49551) (#49577)

The commit also improves failure capturing and reporting in the results processor.

Backport of #49551