[ML] Data frame GET _stats response is confusing #43767
Comments
Pinging @elastic/ml-core
We discussed the format and came up with the following new design:
elastic/kibana#40378 Note that if we want to show progress to the user in the UI, then |
This change adjusts the data frame stats endpoint to return the format discussed in elastic#43767. Relates elastic#43767
This turns out to be quite a far-reaching change. For checkpoints that are processed quickly it means that progress is only available for a very short time before that "next" checkpoint is completed and moves to "last". One effect is that integration tests cannot really assert on progress any more, because depending on OS scheduling there might not be any progress field in the stats by the time the test asks for it - search for |
In terms of test assertions this also suffers from the problem mentioned in the previous comment of:
In the 10th commit of #44350 I had to remove all the YAML test assertions on |
This change adjusts the data frame transforms stats endpoint to return a structure that is easier to understand. This is a breaking change for clients of the data frame transforms stats endpoint, but the feature is in beta so stability is not guaranteed. Closes #43767
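The race described in the comments above (progress for a quickly processed checkpoint disappears once the "next" checkpoint completes and becomes "last") means that clients and tests have to treat progress as optional. Below is a minimal Python sketch of that idea. The exact field names of the new design are not shown in this thread, so the layout used here (`checkpointing.next.checkpoint_progress.percent_complete`) and the helper name `percent_complete` are assumptions for illustration only.

```python
from typing import Any, Dict, Optional


def percent_complete(stats: Dict[str, Any]) -> Optional[float]:
    """Return progress of the in-flight checkpoint, or None if there is none.

    Every key used here is an assumed name, not a confirmed part of the API.
    """
    next_checkpoint = stats.get("checkpointing", {}).get("next")
    if not next_checkpoint:
        # The checkpoint has already completed and moved to "last".
        return None
    progress = next_checkpoint.get("checkpoint_progress")
    if not progress:
        return None
    return progress.get("percent_complete")


# Hypothetical responses: one captured mid-checkpoint, one after completion.
running = {
    "checkpointing": {
        "last": {"checkpoint": 1},
        "next": {"checkpoint": 2, "checkpoint_progress": {"percent_complete": 40.0}},
    }
}
finished = {"checkpointing": {"last": {"checkpoint": 2}}}

print(percent_complete(running))   # 40.0
print(percent_complete(finished))  # None
```

A test written along these lines can assert that progress is either absent or within the range 0 to 100, rather than requiring the field to be present at the moment the stats are fetched.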
Found in 7.3.0-SNAPSHOT

Continuous data frame `GET _stats` returns the following. Hopefully we can make this response a little less confusing:

- `indexer_state` - a value of `started` means it is idle whereas `indexing` means it is either searching or indexing. This is not precise, but is inherited from rollups so might be best to leave as is. Also having two states is confusing.
- `checkpoint` - this is the last known completed (current) checkpoint. This could be confused with the currently underway checkpoint.
- `progress` - this is the progress of the currently underway checkpoint. By calling it `progress` and leaving it at the top level, it gives a false impression that it is somehow indicative of the whole transform. `percent_complete` indicates the progress of a single checkpoint. With batch data frames there is only ever 1 checkpoint, so the other values make sense. However, for continuous the `total_docs` and `docs_remaining` should ideally be reset. `progress` could be renamed to `checkpoint_progress` or combined with the `checkpointing` info to keep them together.
- `current_position` - this is the position of the cursor for the composite agg and will only be visible whilst the composite agg search is scrolling. This is context for the currently underway checkpoint. If a composite agg is not in progress, then this entire object is missing. A small nit, but its sporadic existence is weird.
- `checkpointing`
  - `timestamp_millis` and `time_upper_bound_millis`, where the latter is `timestamp_millis - sync.delay`. Do we need both?
  - `current` refers to the current completed checkpoint whereas `in_progress` refers to the currently underway checkpoint. This is confusing in conjunction with `progress`. Perhaps we could just keep upper and lower bound?
  - `in_progress` sometimes does not exist.
  - Should `progress`, `current_position` and maybe `indexer_state` sit here?
  - `checkpoint: 101` - this is not clear

To summarise the priority points,

- `progress.total_docs` and `progress.docs_remaining` are incorrect for continuous. This is `checkpoint_progress` for the next checkpoint.
- `current*` and `*progress` are both used, which may lead to confusion when trying to operationally manage and/or troubleshoot.
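
To make two of the points above concrete, here is a small Python sketch of how a client has to interpret the response described in this issue. The state meanings and the relation `time_upper_bound_millis = timestamp_millis - sync.delay` are taken from the description above. The helper names, the sample values, the "unknown" fallback, and the assumption that the delay is already expressed in milliseconds are illustrative only.

```python
# Meaning of indexer_state values as described in this issue.
INDEXER_STATE_MEANING = {
    "started": "idle",
    "indexing": "searching or indexing",
}


def describe_indexer_state(indexer_state: str) -> str:
    """Translate indexer_state into what the transform is actually doing."""
    # States not discussed in this issue map to "unknown" (an assumption).
    return INDEXER_STATE_MEANING.get(indexer_state, "unknown")


def time_upper_bound_millis(timestamp_millis: int, sync_delay_millis: int) -> int:
    """Derive the upper bound from the checkpoint timestamp and sync.delay."""
    return timestamp_millis - sync_delay_millis


# Hypothetical sample values: a checkpoint timestamp and a 60 second delay.
sample_timestamp = 1_561_740_000_000
sample_delay = 60_000

print(describe_indexer_state("started"))
print(time_upper_bound_millis(sample_timestamp, sample_delay))  # 1561739940000
```

Since the second timestamp can be derived from the first and the configured delay, the sketch shows why the issue asks whether both fields are needed.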