-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve experiment stats #1038
Improve experiment stats #1038
Conversation
Do not display time info around status bar in experiments nav bar Show gray animated stripped bar when loading Show red bar on error
- Add a new property Trial.duration to compute trial duration - Compute and return only trials count per status and total trials count - Use Experiment.stats.duration as current_execution_time - Use Experiment.stats.duration to compute ETA - Compute whole clock time directly into Experiment.stats Dashboard: - Use a specific color for each trial status in progress bar - Add tooltips for bar info
Store info into Experiment.stats
- Clean-up database before new insertion - Add a start time for all non-new trials - set experiment metadata datetime - Make each completed trial duration to 2 minutes
When we click on a part of the bar, corresponding trials are filtered in trials table When we click again on same part, it is deselected, and all trials are displayed again in trials table Selected part is displayed with stripped bar
…_EXPERIMENT from experiment 2-dim-shape-exp.1 read from db_dashboard_full.pkl
- change "duration" to "elapsed time" - change "current execution time for all completed trials" to "Time elapsed since the beginning of the HPO execution"
…al execution - Rename ExperimentStats.to_dict() to to_json() - Compute experiment duration using all trials that have an execution interval - Compute ETA using only completed trials, and also in corner cases - Set max_trials to 200 for testing experiment `uncompleted_experiment`
…ted end date. We need to add `eta_milliseconds` in ExperimentStats and send it to dashboard to compute estimated end date.
…ials - Display correct ETA for each corner case - Generate an ExperimentStats object even if there are no completed trials
Remove unused file Display progress as unknown if null Add TODO
…nstead of just nb_trials If max_trials is infinite, display empty bar with label `N/A`
…ment changes. Previously, a same component was used and reloaded each time an experiment is selected. So, if we select an experiment, then another one just after without waiting for the former to load, thus the former may replace the later in the component when loading request is completed. To prevent this, we make sure a fully new component is recreated each time an experiment is selected.
PS: It seems `/experiments/:name` endpint has similar purpose to `/experiments/status/:name`. Should we merge both ?
…:experiment_name/:trial_id`
…e it work with Python 3.7
docs/src/user/web_api.rst
Outdated
"duration": "2 days, 5:11:24.006755", | ||
"whole_clock_time": "8 days, 23:15:15.594405", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to normalize these names with the labels in the frontend before we release it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done ! duration
-> elapsed_time
and whole_clock_time
-> sum_of_trials_time
. I note that duration
was already used elsewhere in the code (e.g. in module format_terminal
). I updated everywhere.
src/orion/core/worker/experiment.py
Outdated
@@ -1,4 +1,4 @@ | |||
# pylint:disable=protected-access,too-many-public-methods,too-many-lines | |||
# pylint:disable=protected-access,too-many-public-methods,too-many-lines,too-many-branches |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this exception specific to stats
property? If yes it should be added there specifically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok!
src/orion/core/worker/experiment.py
Outdated
# If max_trials is None, 0 or infinite, we cannot compute ETA | ||
eta = None | ||
elif len(completed_trials) > self.max_trials: | ||
# If there are more completed trials than max trials, then ETA should be 0 (?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# If there are more completed trials than max trials, then ETA should be 0 (?) | |
# If there are more completed trials than max trials, then ETA should be 0 |
Yep!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok!
NUM_COMPLETED = 3 | ||
statuses = (["completed"] * NUM_COMPLETED) + (["reserved"] * 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NUM_COMPLETED = 3 | |
statuses = (["completed"] * NUM_COMPLETED) + (["reserved"] * 2) | |
NUM_COMPLETED = 3 | |
NUM_RESERVED = 2 | |
statuses = (["completed"] * NUM_COMPLETED) + (["reserved"] * NUM_RESERVED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing for tests above, setting number of reserved trials using a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok !
assert stats.duration == datetime.timedelta(seconds=3) | ||
assert stats.whole_clock_time == datetime.timedelta(seconds=3) | ||
assert stats.nb_trials == NUM_COMPLETED + 2 | ||
assert stats.trial_status_count == {"completed": NUM_COMPLETED, "reserved": 2} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert stats.trial_status_count == {"completed": NUM_COMPLETED, "reserved": 2} | |
assert stats.trial_status_count == {"completed": NUM_COMPLETED, "reserved": NUM_RESERVED} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok !
assert stats.duration == stats.finish_time - stats.start_time | ||
assert stats.duration == datetime.timedelta(seconds=3) | ||
assert stats.whole_clock_time == datetime.timedelta(seconds=3) | ||
assert stats.nb_trials == NUM_COMPLETED + 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert stats.nb_trials == NUM_COMPLETED + 2 | |
assert stats.nb_trials == NUM_COMPLETED + NUM_RESERVED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK !
assert stats.trial_status_count == {"completed": NUM_COMPLETED, "reserved": 2} | ||
# If max trials < completed trials, then ETA is 0, and progress is relative to nb trials | ||
assert stats.max_trials == 2 | ||
assert stats.progress == 0.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should include new, broken and interrupted trials as well. The progress should not take into account the broken trials, because they will not be executed anymore, but the others yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok !
Fix comment. Add a web API test to get experiment stats when max_trials is infinite. Test that trial.execution_interval uses heartbeat if end_time is None. In test_experiment, use a variable NUM_RESERVED to set number of reserved trials.
…ime -> sum_of_trials_time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks! :)
Description
Hi @bouthilx ! This is a PR to extend experiment stats. Latest commit discards dashboard changes, so that this PR only contains backend/Python modifications.
Changes
Checklist
Tests
$ tox -e py38
; replace38
by your Python version if necessary)Documentation
Quality
$ tox -e lint
)Further comments
NB: After checking the doc, it seems API endpoint
experiments/:name
also returns various experiment info, but not all that are in stats. I wonder if we may just merge both entries in a future PR (ie./experiments/:name
and newly added/experiments/status/:name
).