-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Restore analytics state if available #47128
[ML] Restore analytics state if available #47128
Conversation
Pinging @elastic/ml-core |
} | ||
|
||
public TaskParams(StreamInput in) throws IOException { | ||
this.id = in.readString(); | ||
this.version = Version.readVersion(in); | ||
if (in.getVersion().onOrAfter(Version.V_7_5_0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there BWC tests out there yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet. I'll add them right after merging this one in to save the trouble with flipping the versions here.
|
||
static { | ||
PARSER.declareString(ConstructingObjectParser.constructorArg(), DataFrameAnalyticsConfig.ID); | ||
PARSER.declareString(ConstructingObjectParser.constructorArg(), DataFrameAnalyticsConfig.VERSION); | ||
PARSER.declareObjectArray(ConstructingObjectParser.constructorArg(), PhaseProgress.PARSER, PROGRESS_ON_START); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this required or optional? There are no null
checks in the ctor, so it seems like it is optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. This is optional.
this.id = Objects.requireNonNull(id); | ||
this.version = Objects.requireNonNull(version); | ||
this.progressOnStart = progressOnStart; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.progressOnStart = progressOnStart; | |
this.progressOnStart = progressOnStart == null ? null : Collections.unmodifiableList(progressOnStart); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the constructor used by the parser to pass an empty list when progress_on_start
is missing. The main constructor is now calling Collections.unmodifiableList
which throws if the list is null
.
} | ||
|
||
public TaskParams(StreamInput in) throws IOException { | ||
this.id = in.readString(); | ||
this.version = Version.readVersion(in); | ||
if (in.getVersion().onOrAfter(Version.V_7_5_0)) { | ||
progressOnStart = in.readList(PhaseProgress::new); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this nullable or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On version 7.5.0 onwards it will never be null. I'm addressing the null issues in total.
break; | ||
case RESUMING_REINDEXING: | ||
case RESUMING_ANALYZING: | ||
toValidateMappingsListener.onResponse(startContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to validate the mappings can be merged if we are already analyzing and have already reindexed all the data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are in resuming_reindexing
we have not necessarily reindexed all the data. In general, I'd rather we perform a redundant check here and keep the path the same rather than skipping it and trying to reason about whether there are some cases that the mappings check would be necessary.
this.config = config; | ||
this.progressOnStart = progressOnStart; | ||
for (PhaseProgress phase : progressOnStart) { | ||
LOGGER.info("[{}] Progress [{}] is [{}]", config.getId(), phase.getPhase(), phase.getProgressPercent()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like a debug
level to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, that was a leftover. We don't really need it at all.
run elasticsearch-ci/2 |
run elasticsearch-ci/bwc |
@elasticmachine update branch |
This commit restores the model state if available in data frame analytics jobs. In addition, this changes the start API so that a stopped job can be restarted. As we now store the progress in the state index when the task is stopped, we can use it to determine what state the job was in when it got stopped.
In order to be able to distinguish between a job that runs for the first time and another that is restarting, we ensure reindexing progress is reported to be at least 1 for a running task. Also fixes an issue in the tests.
4f1eddc
to
689ee27
Compare
This commit restores the model state if available in data frame analytics jobs. In addition, this changes the start API so that a stopped job can be restarted. As we now store the progress in the state index when the task is stopped, we can use it to determine what state the job was in when it got stopped. Note that in order to be able to distinguish between a job that runs for the first time and another that is restarting, we ensure reindexing progress is reported to be at least 1 for a running task.
This commit restores the model state if available in data frame analytics jobs. In addition, this changes the start API so that a stopped job can be restarted. As we now store the progress in the state index when the task is stopped, we can use it to determine what state the job was in when it got stopped. Note that in order to be able to distinguish between a job that runs for the first time and another that is restarting, we ensure reindexing progress is reported to be at least 1 for a running task.
This commit restores the model state if available in data
frame analytics jobs.
In addition, this changes the start API so that a stopped job
can be restarted. As we now store the progress in the state index
when the task is stopped, we can use it to determine what state
the job was in when it got stopped.