[ML] Data frame progress is not reporting for continuous #44557
Pinging @elastic/ml-core |
Solutions (!?¡¿)
Option 0: Don’t do regular progress for continuous
Once it is past the initial batch loading, checkpoints will be processed pretty quickly. It seems much more valuable to have "current checkpoint, when it started, and total processed documents" type of information. This, in conjunction with "average checkpoint processing time" and "average processed documents", makes the most sense to me. A progress bar that fills quickly without context does not help much with continuous.
Option 1: Calculate progress total documents incrementally alongside the PARTIAL_RUN_IDENTIFY_CHANGES run state
Positives
Negatives
Note: We don’t know the total number of changed terms, the total number of pages required to gather the changed terms, nor the number of pages to process the aggs
implementation
Option 2: Calculate all the changed terms and use them to gather total docs
Positives
Negatives
implementation
For each composite aggregation page:
The sum of the total_count values from all of the queries gives the number of docs that will eventually be queried in that checkpoint.
Option 3: Change how we do the intermittent bucket gathering so that ALL changed terms are gathered before we start querying through and indexing data
Positives
Negatives
implementation (unsure if this would work)
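For Option 2, the summation over per-page queries can be sketched as follows. This is only a rough illustration: the function name, the dictionary shape of each response, and the sample numbers are all hypothetical, and the only thing taken from the discussion is that each query reports a `total_count` value.

```python
from typing import Iterable, Mapping


def total_docs_for_checkpoint(page_responses: Iterable[Mapping[str, int]]) -> int:
    """Sum the total_count reported by the query issued for each page.

    Assumption: each response is a mapping with a "total_count" key.
    """
    return sum(int(resp["total_count"]) for resp in page_responses)


# Hypothetical responses, one per composite aggregation page.
pages = [{"total_count": 1200}, {"total_count": 800}, {"total_count": 500}]
total_docs = total_docs_for_checkpoint(pages)
print(total_docs)
```

The resulting total would only be known after every page of changed terms has been visited, which is the extra up-front cost this option implies.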
|
The more I think about it, the more I like Option 0. I'm just not convinced that reporting progress for the current checkpoint is needed in continuous mode: I don't see people sitting and watching the progress, so why go to the trouble of building an accurate progress monitor? The key thing you might want to know is whether the current update is taking much longer than usual. But, as suggested, we can provide that information much more easily by gathering some simple statistics for checkpoints. |
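To make the "simple statistics" idea concrete, here is a minimal sketch of what could be tracked per checkpoint. Everything in it is an assumption for illustration: the `CheckpointStats` class, its method names, and the `factor` threshold of 3x the average are hypothetical and not part of any existing API.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class CheckpointStats:
    """Hypothetical running statistics over completed checkpoints."""

    durations_s: List[float] = field(default_factory=list)
    docs_processed: List[int] = field(default_factory=list)

    def record(self, duration_s: float, docs: int) -> None:
        # Called once per completed checkpoint.
        self.durations_s.append(duration_s)
        self.docs_processed.append(docs)

    def average_duration_s(self) -> float:
        return mean(self.durations_s) if self.durations_s else 0.0

    def average_docs(self) -> float:
        return mean(self.docs_processed) if self.docs_processed else 0.0

    def is_slow(self, elapsed_s: float, factor: float = 3.0) -> bool:
        # Flag the current checkpoint if it has run much longer than average.
        avg = self.average_duration_s()
        return avg > 0 and elapsed_s > factor * avg


stats = CheckpointStats()
stats.record(duration_s=2.0, docs=100)
stats.record(duration_s=4.0, docs=300)
print(stats.average_duration_s(), stats.average_docs(), stats.is_slow(10.0))
```

With numbers like these, a user would see the averages alongside the current checkpoint's start time and could tell at a glance whether the update is unusually slow.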
Found in 7.3.0-SNAPSHOT
{ "build" : { "hash" : "a57a5c5", "date" : "2019-07-16T14:52:05.956252Z" },
Checkpoint progress is not being updated for checkpoints 1 and above, presumably since the introduction of page-by-page processing to avoid terms explosion. Progress is reported and updated for checkpoint 0.
This is related to #43767. We may wish to revisit how we can show indicative checkpoint progress for continuous data frames. For example, 20% progress at page 2 of 10 could be a good enough indicator.
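The page-based indicator mentioned above is simple arithmetic. The sketch below assumes a total page count (or an estimate of it) is available, which may not hold in practice; the function name and parameters are hypothetical.

```python
def page_progress_percent(pages_done: int, total_pages: int) -> float:
    """Indicative progress from page counts, capped at 100%.

    Assumption: total_pages is known or estimated before processing starts.
    """
    if total_pages <= 0:
        return 0.0
    return min(100.0, 100.0 * pages_done / total_pages)


progress = page_progress_percent(pages_done=2, total_pages=10)
print(progress)
```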
In debug logging you can see the following