-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clear Job#finished_time when it is opened (#32605) #32755
Clear Job#finished_time when it is opened (#32605) #32755
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Left a few comments.
// Step 6. Clear job finished time once the job is started and respond | ||
ActionListener<OpenJobAction.Response> clearJobFinishTime = ActionListener.wrap( | ||
response -> { | ||
if(response.isAcknowledged()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: space after if
@@ -574,6 +592,35 @@ public void onTimeout(TimeValue timeout) { | |||
}); | |||
} | |||
|
|||
private void clearJobFinishedTime(String jobId, ActionListener<OpenJobAction.Response> listener) { | |||
clusterService.submitStateUpdateTask("clearing_job_finish_time [" + jobId + "]", new ClusterStateUpdateTask() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it seems that in most places we use hyphens to name the cluster state update tasks. It would be more consistent to rename this to clear-finished-time-for-job-{job_id}
.
clusterService.submitStateUpdateTask("clearing_job_finish_time [" + jobId + "]", new ClusterStateUpdateTask() { | ||
@Override | ||
public ClusterState execute(ClusterState currentState) { | ||
XPackPlugin.checkReadyForXPackCustomMetadata(currentState); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is unnecessary as if a job is being opened we are certain there is ML Metadata already installed.
|
||
@Override | ||
public void onFailure(String source, Exception e) { | ||
listener.onFailure(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am tempted to handle failure differently here. Even if the cluster state update failed, the job has successfully opened. It might be more misleading to return an error to the user, as they may think the job has not been opened. I wonder if logging the error should be enough. Any thoughts @droberts195 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too considered just logging the issue and still letting the end user know that it succeeded. Definitely a contentious decision that needs discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just log the error. As @dimitris-athanasiou said, if it fails here the job will have a status of open
. So anyone who reacted to the failure to open by trying again would be bemused to find out that they couldn't open the job the second time because it was already open.
@@ -0,0 +1,57 @@ | |||
package org.elasticsearch.xpack.ml.integration; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing license header
import static org.hamcrest.CoreMatchers.nullValue; | ||
import static org.hamcrest.Matchers.is; | ||
|
||
public class ReopenJobResetsFinishedTimeIT extends MlNativeAutodetectIntegTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! :-)
The build failed with checkstyle
If you want to run this check locally use |
@benwtrent Can you please add labels to this PR, including version, area and change type labels |
Pinging @elastic/ml-core |
@@ -611,7 +607,8 @@ public ClusterState execute(ClusterState currentState) { | |||
|
|||
@Override | |||
public void onFailure(String source, Exception e) { | |||
listener.onFailure(e); | |||
logger.error(source + "Failed to clear finished_time due to [" + e.getMessage() + "]", e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tend to start the messages with the job_id. So, this could become:
logger.error("[" + jobId + "] Failed to clear finished_time; source [" + source + "]", e);
@@ -593,10 +590,9 @@ public void onTimeout(TimeValue timeout) { | |||
} | |||
|
|||
private void clearJobFinishedTime(String jobId, ActionListener<OpenJobAction.Response> listener) { | |||
clusterService.submitStateUpdateTask("clearing_job_finish_time [" + jobId + "]", new ClusterStateUpdateTask() { | |||
clusterService.submitStateUpdateTask("clearing-job-finish-time [" + jobId + "]", new ClusterStateUpdateTask() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for insisting on that but so that task names are consistent (should make it easier to search through them) could you change to "clearing-finish-time-for-job-" + jobId
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…listeners * elastic/master: (58 commits) [ML] Partition-wise maximum scores (elastic#32748) [DOCS] XContentBuilder#bytes method removed, using BytesReference.bytes(docBuilder) (elastic#32771) HLRC: migration get assistance API (elastic#32744) Add a task to run forbiddenapis using cli (elastic#32076) [Kerberos] Add debug log statement for exceptions (elastic#32663) Make x-pack core pull transport-nio (elastic#32757) Painless: Clean Up Whitelist Names (elastic#32791) Cat apis: Fix index creation time to use strict date format (elastic#32510) Clear Job#finished_time when it is opened (elastic#32605) (elastic#32755) Test: Only sniff host metadata for node_selectors (elastic#32750) Update scripted metric docs to use `state` variable (elastic#32695) Painless: Clean up PainlessCast (elastic#32754) [TEST] Certificate NONE not allowed in FIPS JVM (elastic#32753) [ML] Refactor ProcessCtrl into Autodetect and Normalizer builders (elastic#32720) Access build tools resources (elastic#32201) Tests: Disable rolling upgrade tests with system key on fips JVM (elastic#32775) HLRC: Ban LoggingDeprecationHandler (elastic#32756) Fix test reproducability in AbstractBuilderTestCase setup (elastic#32403) Only require java<version>_home env var if needed Tests: Muted ScriptDocValuesDatesTests.testJodaTimeBwc ...
When a job is opened, a new action is taken now where the cluster state is updated, such that the job that was opened has its
finished_time
set tonull
.Verified this works locally and through new integration test.