Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve experiment stats #1038

Merged
merged 43 commits into from
Dec 19, 2022

Conversation

notoraptor
Copy link
Collaborator

Description

Hi @bouthilx ! This is a PR to extend experiment stats. Latest commit discards dashboard changes, so that this PR only contains backend/Python modifications.

Changes

  • Add new entries to experiment stats
  • Add new entry to web API to get experiment stats.
  • Return trial status in trial web API endpoint

Checklist

Tests

  • I added corresponding tests for bug fixes and new features. If possible, the tests fail without the changes
  • All new and existing tests are passing ($ tox -e py38; replace 38 by your Python version if necessary)

Documentation

  • I have updated the relevant documentation related to my changes

Quality

  • I have read the CONTRIBUTING doc
  • My commits messages follow this format
  • My code follows the style guidelines ($ tox -e lint)

Further comments

NB: After checking the doc, it seems API endpoint experiments/:name also returns various experiment info, but not all that are in stats. I wonder if we may just merge both entries in a future PR (ie. /experiments/:name and newly added /experiments/status/:name).

Do not display time info around status bar in experiments nav bar
Show gray animated stripped bar when loading
Show red bar on error
- Add a new property Trial.duration to compute trial duration
- Compute and return only trials count per status and total trials count
- Use Experiment.stats.duration as current_execution_time
- Use Experiment.stats.duration to compute ETA
- Compute whole clock time directly into Experiment.stats
Dashboard:
- Use a specific color for each trial status in progress bar
- Add tooltips for bar info
Store info into Experiment.stats
- Clean-up database before new insertion
- Add a start time for all non-new trials
- set experiment metadata datetime
- Make each completed trial duration to 2 minutes
When we click on a part of the bar, corresponding trials are filtered in trials table
When we click again on same part, it is deselected, and all trials are displayed again in trials table
Selected part is displayed with stripped bar
…_EXPERIMENT from experiment 2-dim-shape-exp.1 read from db_dashboard_full.pkl
- change "duration" to "elapsed time"
- change "current execution time for all completed trials" to "Time elapsed since the beginning of the HPO execution"
…al execution

- Rename ExperimentStats.to_dict() to to_json()
- Compute experiment duration using all trials that have an execution interval
- Compute ETA using only completed trials, and also in corner cases
- Set max_trials to 200 for testing experiment `uncompleted_experiment`
…ted end date.

We need to add `eta_milliseconds` in ExperimentStats and send it to dashboard to compute estimated end date.
…ials

- Display correct ETA for each corner case
- Generate an ExperimentStats object even if there are no completed trials
Remove unused file
Display progress as unknown if null
Add TODO
…nstead of just nb_trials

If max_trials is infinite, display empty bar with label `N/A`
…ment changes.

Previously, a same component was used and reloaded each time an experiment is selected.
So, if we select an experiment, then another one just after without waiting for the former to load, thus the former may replace the later in the component when loading request is completed.

To prevent this, we make sure a fully new component is recreated each time an experiment is selected.
PS: It seems `/experiments/:name` endpint has similar purpose to `/experiments/status/:name`. Should we merge both ?
Comment on lines 189 to 190
"duration": "2 days, 5:11:24.006755",
"whole_clock_time": "8 days, 23:15:15.594405",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to normalize these names with the labels in the frontend before we release it.

Copy link
Collaborator Author

@notoraptor notoraptor Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done ! duration -> elapsed_time and whole_clock_time -> sum_of_trials_time. I note that duration was already used elsewhere in the code (e.g. in module format_terminal). I updated everywhere.

@@ -1,4 +1,4 @@
# pylint:disable=protected-access,too-many-public-methods,too-many-lines
# pylint:disable=protected-access,too-many-public-methods,too-many-lines,too-many-branches
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this exception specific to stats property? If yes it should be added there specifically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok!

# If max_trials is None, 0 or infinite, we cannot compute ETA
eta = None
elif len(completed_trials) > self.max_trials:
# If there are more completed trials than max trials, then ETA should be 0 (?)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# If there are more completed trials than max trials, then ETA should be 0 (?)
# If there are more completed trials than max trials, then ETA should be 0

Yep!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok!

Comment on lines 744 to 745
NUM_COMPLETED = 3
statuses = (["completed"] * NUM_COMPLETED) + (["reserved"] * 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NUM_COMPLETED = 3
statuses = (["completed"] * NUM_COMPLETED) + (["reserved"] * 2)
NUM_COMPLETED = 3
NUM_RESERVED = 2
statuses = (["completed"] * NUM_COMPLETED) + (["reserved"] * NUM_RESERVED)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing for tests above, setting number of reserved trials using a variable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok !

assert stats.duration == datetime.timedelta(seconds=3)
assert stats.whole_clock_time == datetime.timedelta(seconds=3)
assert stats.nb_trials == NUM_COMPLETED + 2
assert stats.trial_status_count == {"completed": NUM_COMPLETED, "reserved": 2}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert stats.trial_status_count == {"completed": NUM_COMPLETED, "reserved": 2}
assert stats.trial_status_count == {"completed": NUM_COMPLETED, "reserved": NUM_RESERVED}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok !

assert stats.duration == stats.finish_time - stats.start_time
assert stats.duration == datetime.timedelta(seconds=3)
assert stats.whole_clock_time == datetime.timedelta(seconds=3)
assert stats.nb_trials == NUM_COMPLETED + 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert stats.nb_trials == NUM_COMPLETED + 2
assert stats.nb_trials == NUM_COMPLETED + NUM_RESERVED

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK !

assert stats.trial_status_count == {"completed": NUM_COMPLETED, "reserved": 2}
# If max trials < completed trials, then ETA is 0, and progress is relative to nb trials
assert stats.max_trials == 2
assert stats.progress == 0.6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should include new, broken and interrupted trials as well. The progress should not take into account the broken trials, because they will not be executed anymore, but the others yes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok !

Fix comment.
Add a web API test to get experiment stats when max_trials is infinite.
Test that trial.execution_interval uses heartbeat if end_time is None.
In test_experiment, use a variable NUM_RESERVED to set number of reserved trials.
Copy link
Member

@bouthilx bouthilx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! :)

@bouthilx bouthilx added the enhancement Improves a feature or non-functional aspects (e.g., optimization, prettify, technical debt) label Dec 19, 2022
@bouthilx bouthilx merged commit f6ba53b into Epistimio:develop Dec 19, 2022
@notoraptor notoraptor deleted the experiment-progress-bar-backend branch January 19, 2023 17:34
@notoraptor notoraptor mentioned this pull request Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improves a feature or non-functional aspects (e.g., optimization, prettify, technical debt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants