[ML] Only report complete writing_results progress after completion #49551
Conversation
We depend on the number of data frame rows in order to report progress for the writing of results, the last phase of a job run. However, results include objects other than the data frame rows (e.g., progress, inference model, etc.). The problem this commit fixes is that once we receive the last data frame row result we report that progress is complete, even though we may still have more results to process. If the job gets stopped for any reason at this point, we will not be able to restart it properly because we will think it had already completed. This commit addresses this by capping the progress we can report for the writing_results phase at 98 until the results processor completes. At the end, when the process is done, we set the progress to 100.
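To make the mechanism concrete, here is a minimal sketch of capping the writing_results progress, assuming a row counter drives the percentage; the class and method names are illustrative and do not correspond to the actual Elasticsearch ML code.

```java
// Illustrative sketch only; names are hypothetical, not the real implementation.
class WritingResultsProgressSketch {

    // Progress for writing_results is capped below 100 while rows stream in,
    // because results other than data frame rows (progress, inference model, etc.)
    // may still follow the last row.
    private static final int MAX_PROGRESS_BEFORE_COMPLETION = 98;

    private final long totalRows;
    private long processedRows;
    private volatile int progressPercent;

    WritingResultsProgressSketch(long totalRows) {
        this.totalRows = totalRows;
    }

    // Called for each data frame row result that is written.
    void onRowProcessed() {
        processedRows++;
        int rawProgress = (int) (100 * processedRows / (double) totalRows);
        progressPercent = Math.min(rawProgress, MAX_PROGRESS_BEFORE_COMPLETION);
    }

    // Called once the results processor has consumed every result.
    void onResultsProcessorCompleted() {
        progressPercent = 100;
    }

    int progressPercent() {
        return progressPercent;
    }
}
```

With this shape, even if the last row arrives, a stopped job still shows at most 98 for writing_results, so a restart knows the phase was not finished.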
Pinging @elastic/ml-core (:ml)
@@ -107,11 +114,22 @@ public void process(AnalyticsProcess<AnalyticsResult> process) {
                failure = "error parsing data frame analytics output: [" + e.getMessage() + "]";
            }
        } finally {
            if (isCancelled == false) {
Should we take into account `failure != null`? It seems weird to me that we are "complete" even if we failed.
Very good point. I also took the opportunity to improve failure reporting in that class in general.
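For illustration, a hedged sketch of the completion check being discussed here; the names are purely hypothetical and do not match the actual AnalyticsResultProcessor fields.

```java
// Hypothetical sketch of the completion check discussed above.
final class CompletionCheckSketch {

    private volatile boolean isCancelled;
    private volatile String failure;   // null when no error occurred

    // Only report the phase as complete when the run neither failed nor was
    // cancelled; otherwise writing_results progress stays below 100 so the
    // job can be restarted cleanly.
    boolean shouldMarkComplete() {
        return isCancelled == false && failure == null;
    }
}
```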
LGTM
one minor comment in test code
assertThat(resultProcessor.getFailure(), equalTo("error processing results; some failure"));

ArgumentCaptor<String> auditCaptor = ArgumentCaptor.forClass(String.class);
verify(auditor).error(eq(JOB_ID), auditCaptor.capture());
Is it possible to replace `auditCaptor.capture()` with `containsString("Error processing results; some failure")`?
We need to call `capture()` on the captor for it to capture the argument to the mocked method.
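As a side note, here is a self-contained sketch of that Mockito pattern, assuming a hypothetical `Auditor` interface standing in for the real one: `capture()` is passed to `verify()` so Mockito records the actual argument, and the assertion (for example with `containsString`) happens on the captured value afterwards.

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsString;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import org.mockito.ArgumentCaptor;

public class AuditorCaptorSketch {

    // Hypothetical stand-in for the real auditor used in the test.
    interface Auditor {
        void error(String jobId, String message);
    }

    public static void main(String[] args) {
        String jobId = "job-id";
        Auditor auditor = mock(Auditor.class);

        // The code under test would normally trigger this call.
        auditor.error(jobId, "Error processing results; some failure");

        // capture() must be used inside verify() so the argument gets recorded;
        // the string assertion is then done on the captured value.
        ArgumentCaptor<String> auditCaptor = ArgumentCaptor.forClass(String.class);
        verify(auditor).error(eq(jobId), auditCaptor.capture());
        assertThat(auditCaptor.getValue(), containsString("Error processing results; some failure"));
    }
}
```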
…ion (#49551) (#49577) We depend on the number of data frame rows in order to report progress for the writing of results, the last phase of a job run. However, results include objects other than the data frame rows (e.g., progress, inference model, etc.). The problem this commit fixes is that once we receive the last data frame row result we report that progress is complete, even though we may still have more results to process. If the job gets stopped for any reason at this point, we will not be able to restart it properly because we will think it had already completed. This commit addresses this by capping the progress we can report for the writing_results phase at 98 until the results processor completes. At the end, when the process is done, we set the progress to 100. The commit also improves failure capturing and reporting in the results processor. Backport of #49551