-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update ens builder #1434
Update ens builder #1434
Conversation
Codecov Report
@@ Coverage Diff @@
## development #1434 +/- ##
===============================================
- Coverage 84.32% 84.05% -0.28%
===============================================
Files 147 151 +4
Lines 11397 11448 +51
Branches 1986 1988 +2
===============================================
+ Hits 9611 9623 +12
- Misses 1261 1298 +37
- Partials 525 527 +2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more feedback. I'll check the tests later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final part of the review.
logger_port=automl._logger_port, | ||
random_state=DEFAULT_SEED, | ||
) | ||
return manager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, where is this manager used later? I don't see right now where the test for this is located.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No where, I should have some test that use it. Sorry, slipped the todos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking: did you add such a check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do so today, there's not too much functionality to check as most of the heavy lifting is done inside EnsembleBuilder
. The real runs didn't fail so I assume it's okay, meaning I could start the benchmarking and write the tests after, almost nothing has changed. Hopefully no bugs appear in the testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final part of the review.
* Move ensemble_bulder test data to named folder * Update backend to take a temlate to copy from * Update tests to use new cases system * Update tests to be documented and cleaned up * Switch to using cached automl backends * Readd missing file which failed test for `case_3_models` * Seperate out tests that rely on old toy data and those that don't * Setup test framework for ensemble builder on real situations * Formatting * Remove `unit_test` arg * Remove SAVE2DISC * Split builder and manager into seperate files * Tidy up init of EnsembleBuilder * Moved to cached properties * Change List to list * Move to solely using cached properties * Add disk util file with `sizeof` * Update tests to use cached mechanism * Switch `sizeof` for disk consumption * Remove disk consumption * Remove unneeded function * Add type hints and documenation * Simplyify _read_np_fn * Update get_valid_test_preds to use Pathlib * Add intersection to functional * Make functional take *args * Further simplifications * Add a dataclass to represent run information for builder * Rename to Run * Change to Run objects * Formatting * Reduce side effects of `compute_loss_per_model` To make testing easier and changes easier, the targets are now passed to the method. This also reduces it's complexity by removing the checking from the method as we can assume the parameters coming in are correct. * Change Tuple to tuple * Forcibly add data files for tests * Fix: Can now load pickled numpy arrays w/ test * Add test for checking ensemble builder output * Fix bug with using list instead of set * Making deubgging message a little clearer * Fix typing and case name * Rename test file to reflect what it tests * Make pynisher context optional * Fix loaded models test * Updates to Run dataclass * Add method to `Run` to allow recording of last modified * Change Run mtimes to dictionary * Change `compute_loss_per_model` to use new Run dataclass * Factor out run loss into main loop * Simplyify get_nbest and compute_losses * Major rewrite of ensemble builder main loop * Change to simpler hashing * Start value split * Add `value_split` * Reworked Builder * Add some docstring * Formatting * Fix type signature * Fix typing for `loss` * Removed Literal * Mypy fixes for ensemble builder * Mypy fixes * Tests for `Runs` * Move `make_run` to fixtures * Fix run deletion * Test candidates * Made delete it's own function * Further simplifications * Fixup test with simplification * Test: `max_models` for `requires_deletion` * Test: `memory_limit` for `requires_deletion` * Test: Loss of runs * Test: Delete runs * Test: `fit_ensemble` of ensemble builder * Add test for run time parameter * Remove parameter `return_predictions` * Add note about pickled arrays should not be supported * Make cached automl instances copy backend * Add valid static method to run * Remove old test data * Add filter for bad run dirs * Made `main` args optional * Fix check for updated runs * Make `main` raise errors * Fix default value for ensemble builder `main` * Test valid ensemble with real runs * Rename parameter for manager * Add defaults and reorder parameters for EnsembleBuilderManager * Fixup parameters in `fit_and_return_ensemble` * Typing fixes * Make `fit_and_return_ensemble` a staticmethod * Add: `make_ensemble_builder_manager` * Add: Test files for manager * Add atomic rmtree * Add: atomic rmtree now accepts where mv should go * Make builder use atomic rmtree * Fix import bugs, remove valid preds in builder * Remove `np.inf` as valid arg for `read_at_most` * Possible reproducible num_run, no predictions error * Make automl caching robust to `pytest-xdist` * Test fixes * Extend interval for test on run caching * Use pickle for reseting cache * Fix test for caching mechanism to not rely on `stat` * Move run deletion to the end of the builder `main` * Remove `getattr` version of tae.client * Remove `normalize` * Extend not for `Run` * Fix `__init__` of `Run` * Parameter and comment fixes from feedback * Change to `min(...)` instead of `sorted(...)[0]` * Make default time `np.inf` * Add test for safe deletion in builder * Update docstring of `loss` for a run * Remove stray print * Minor feedback fixes * Fix `_metric` to `_metrics` * Fix `make_ensemble_builder` * One more fix for multiple metrics
Cleans up ensemble builder to make it easier to change and test.
TODO