Update ens builder #1434

Merged
merged 119 commits into from
May 13, 2022
Changes from 103 commits
092985d
Move ensemble_builder test data to named folder
eddiebergman Mar 26, 2022
83db9cf
Update backend to take a template to copy from
eddiebergman Mar 26, 2022
dc4585e
Update tests to use new cases system
eddiebergman Mar 26, 2022
f28c3e4
Update tests to be documented and cleaned up
eddiebergman Mar 27, 2022
9613312
Switch to using cached automl backends
eddiebergman Mar 27, 2022
a20150c
Readd missing file which failed test for `case_3_models`
eddiebergman Mar 27, 2022
84d01e7
Separate out tests that rely on old toy data and those that don't
eddiebergman Mar 27, 2022
fcf6ad0
Setup test framework for ensemble builder on real situations
eddiebergman Mar 27, 2022
951bb2e
Formatting
eddiebergman Mar 27, 2022
5abf258
Remove `unit_test` arg
eddiebergman Mar 27, 2022
3e8ed92
Remove SAVE2DISC
eddiebergman Mar 27, 2022
5dd9832
Split builder and manager into separate files
eddiebergman Mar 27, 2022
c0ebad5
Tidy up init of EnsembleBuilder
eddiebergman Mar 27, 2022
07d2c55
Moved to cached properties
eddiebergman Mar 27, 2022
8ac8ffe
Change List to list
eddiebergman Mar 27, 2022
6472714
Move to solely using cached properties
eddiebergman Mar 27, 2022
36d7dd6
Add disk util file with `sizeof`
eddiebergman Mar 27, 2022
5c9842f
Update tests to use cached mechanism
eddiebergman Mar 27, 2022
1de376c
Switch `sizeof` for disk consumption
eddiebergman Mar 27, 2022
23de0fb
Remove disk consumption
eddiebergman Mar 27, 2022
e34100d
Remove unneeded function
eddiebergman Mar 27, 2022
2d90370
Add type hints and documentation
eddiebergman Mar 27, 2022
1fe4c61
Simplify _read_np_fn
eddiebergman Mar 27, 2022
facbd7f
Update get_valid_test_preds to use Pathlib
eddiebergman Mar 27, 2022
9e6169b
Add intersection to functional
eddiebergman Mar 27, 2022
ebb2c78
Make functional take *args
eddiebergman Mar 27, 2022
d0f0980
Further simplifications
eddiebergman Mar 27, 2022
9903b74
Add a dataclass to represent run information for builder
eddiebergman Mar 29, 2022
3ff5873
Rename to Run
eddiebergman Mar 29, 2022
99399b6
Change to Run objects
eddiebergman Mar 29, 2022
c7c77c0
Formatting
eddiebergman Mar 29, 2022
a7dee5e
Reduce side effects of `compute_loss_per_model`
eddiebergman Mar 29, 2022
b6c3e90
Change Tuple to tuple
eddiebergman Mar 30, 2022
45c94e0
Forcibly add data files for tests
eddiebergman Mar 30, 2022
9dee1e8
Fix: Can now load pickled numpy arrays w/ test
eddiebergman Apr 1, 2022
f403c8d
Add test for checking ensemble builder output
eddiebergman Apr 1, 2022
39f7f83
Fix bug with using list instead of set
eddiebergman Apr 1, 2022
b39db86
Making debugging message a little clearer
eddiebergman Apr 1, 2022
40e64b0
Fix typing and case name
eddiebergman Apr 1, 2022
35f92c2
Rename test file to reflect what it tests
eddiebergman Apr 1, 2022
fe24b8c
Make pynisher context optional
eddiebergman Apr 1, 2022
c2b19e1
Fix loaded models test
eddiebergman Apr 1, 2022
c644502
Updates to Run dataclass
eddiebergman Apr 1, 2022
0d459ef
Add method to `Run` to allow recording of last modified
eddiebergman Apr 1, 2022
2923946
Change Run mtimes to dictionary
eddiebergman Apr 1, 2022
2ea36c1
Change `compute_loss_per_model` to use new Run dataclass
eddiebergman Apr 1, 2022
3dacc39
Factor out run loss into main loop
eddiebergman Apr 1, 2022
2b5e479
Simplify get_nbest and compute_losses
eddiebergman Apr 1, 2022
4e1222b
Major rewrite of ensemble builder main loop
eddiebergman Apr 3, 2022
881ecef
Change to simpler hashing
eddiebergman Apr 3, 2022
ee9fdef
Start value split
eddiebergman Apr 3, 2022
ba75c2c
Add `value_split`
eddiebergman Apr 4, 2022
be44195
Reworked Builder
eddiebergman Apr 8, 2022
5b8271c
Add some docstring
eddiebergman Apr 8, 2022
c1496ce
Formatting
eddiebergman Apr 8, 2022
04c8b93
Fix type signature
eddiebergman Apr 8, 2022
dc09d96
Fix typing for `loss`
eddiebergman Apr 8, 2022
7f6b7d9
Removed Literal
eddiebergman Apr 8, 2022
f45e409
Mypy fixes for ensemble builder
eddiebergman Apr 8, 2022
fa6146b
Mypy fixes
eddiebergman Apr 8, 2022
41711c2
Tests for `Runs`
eddiebergman Apr 8, 2022
84b9618
Move `make_run` to fixtures
eddiebergman Apr 8, 2022
4dafa0d
Fix run deletion
eddiebergman Apr 9, 2022
db322f3
Test candidates
eddiebergman Apr 10, 2022
f8b5b35
Made delete its own function
eddiebergman Apr 10, 2022
f126251
Further simplifications
eddiebergman Apr 10, 2022
2e116e0
Fixup test with simplification
eddiebergman Apr 10, 2022
b900fa4
Test: `max_models` for `requires_deletion`
eddiebergman Apr 18, 2022
6f37f39
Test: `memory_limit` for `requires_deletion`
eddiebergman Apr 18, 2022
ec9b946
Test: Loss of runs
eddiebergman Apr 18, 2022
88834e1
Test: Delete runs
eddiebergman Apr 18, 2022
e07adb6
Test: `fit_ensemble` of ensemble builder
eddiebergman Apr 19, 2022
2e0ccc5
Add test for run time parameter
eddiebergman Apr 19, 2022
44fa3e8
Remove parameter `return_predictions`
eddiebergman Apr 19, 2022
3cf4bcc
Add note about pickled arrays should not be supported
eddiebergman Apr 19, 2022
842e393
Make cached automl instances copy backend
eddiebergman Apr 19, 2022
3083628
Add valid static method to run
eddiebergman Apr 19, 2022
195ed70
Remove old test data
eddiebergman Apr 19, 2022
86d298a
Add filter for bad run dirs
eddiebergman Apr 19, 2022
1c1828e
Made `main` args optional
eddiebergman Apr 19, 2022
ddace9d
Fix check for updated runs
eddiebergman Apr 19, 2022
8a393ea
Make `main` raise errors
eddiebergman Apr 19, 2022
c0ed290
Fix default value for ensemble builder `main`
eddiebergman Apr 19, 2022
94f869d
Test valid ensemble with real runs
eddiebergman Apr 19, 2022
4725ba6
Rename parameter for manager
eddiebergman Apr 19, 2022
3af17e1
Add defaults and reorder parameters for EnsembleBuilderManager
eddiebergman Apr 19, 2022
fa55d15
Fixup parameters in `fit_and_return_ensemble`
eddiebergman Apr 19, 2022
2115f0c
Typing fixes
eddiebergman Apr 19, 2022
050d9a4
Make `fit_and_return_ensemble` a staticmethod
eddiebergman Apr 19, 2022
d3da909
Add: `make_ensemble_builder_manager`
eddiebergman Apr 19, 2022
d65f1ce
Add: Test files for manager
eddiebergman Apr 19, 2022
7aced10
Add atomic rmtree
eddiebergman Apr 19, 2022
d938b00
Add: atomic rmtree now accepts where mv should go
eddiebergman Apr 19, 2022
449a67a
Make builder use atomic rmtree
eddiebergman Apr 19, 2022
35db618
Merge branch 'development' into update_ens_builder
eddiebergman Apr 24, 2022
73978a0
Fix import bugs, remove valid preds in builder
eddiebergman May 2, 2022
bf0e2db
Remove `np.inf` as valid arg for `read_at_most`
eddiebergman May 2, 2022
26e9d49
Possible reproducible num_run, no predictions error
eddiebergman May 4, 2022
cb723b9
Make automl caching robust to `pytest-xdist`
eddiebergman May 5, 2022
cb45a35
Test fixes
eddiebergman May 5, 2022
cfd45f6
Extend interval for test on run caching
eddiebergman May 6, 2022
cc45300
Use pickle for resetting cache
eddiebergman May 6, 2022
65ec881
Fix test for caching mechanism to not rely on `stat`
eddiebergman May 6, 2022
3c218e4
Move run deletion to the end of the builder `main`
eddiebergman May 11, 2022
0fc809e
Remove `getattr` version of tae.client
eddiebergman May 11, 2022
b175bb0
Remove `normalize`
eddiebergman May 11, 2022
2bd0c01
Extend note for `Run`
eddiebergman May 11, 2022
25defe8
Fix `__init__` of `Run`
eddiebergman May 11, 2022
82c68f0
Parameter and comment fixes from feedback
eddiebergman May 11, 2022
ef7848f
Change to `min(...)` instead of `sorted(...)[0]`
eddiebergman May 11, 2022
c990e60
Make default time `np.inf`
eddiebergman May 11, 2022
6476856
Add test for safe deletion in builder
eddiebergman May 11, 2022
936fba5
Update docstring of `loss` for a run
eddiebergman May 11, 2022
8695049
Remove stray print
eddiebergman May 11, 2022
c2111c2
Minor feedback fixes
eddiebergman May 11, 2022
a515016
Merge branch 'development' into update_ens_builder
eddiebergman May 13, 2022
b326cc9
Fix `_metric` to `_metrics`
eddiebergman May 13, 2022
4e4ea64
Fix `make_ensemble_builder`
eddiebergman May 13, 2022
92f59c2
One more fix for multiple metrics
eddiebergman May 13, 2022
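
The "atomic rmtree" commits above (7aced10, d938b00, 449a67a) suggest that run directories are moved aside before being deleted, so that a half-deleted directory is never visible under its original name. The following is only a rough sketch of that general pattern, not the PR's implementation: the function name `atomic_rmtree_sketch` and the `staging_dir` parameter are hypothetical, and the rename is only atomic when source and destination are on the same filesystem.

```python
import shutil
import uuid
from pathlib import Path
from typing import Optional


def atomic_rmtree_sketch(path: Path, staging_dir: Optional[Path] = None) -> None:
    """Hypothetical helper: rename `path` out of the way, then delete it.

    Readers scanning the parent directory see either the complete directory
    or no directory, never a partially deleted one.
    """
    path = Path(path)
    # Default to the parent so the rename stays on the same filesystem.
    destination_root = Path(staging_dir) if staging_dir is not None else path.parent
    # A unique, hidden name avoids clashing with another deletion in flight.
    staged = destination_root / f".{path.name}.deleting-{uuid.uuid4().hex}"
    # Single rename step; this is the point where `path` disappears.
    path.rename(staged)
    # The slow recursive delete now happens under the staged name.
    shutil.rmtree(staged)
```

Passing a separate `staging_dir` mirrors the idea in d938b00 of letting the caller choose where the move goes; whether the real function exposes it this way is an assumption.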
31 changes: 21 additions & 10 deletions autosklearn/automl.py
@@ -61,7 +61,7 @@
     convert_if_sparse,
 )
 from autosklearn.data.xy_data_manager import XYDataManager
-from autosklearn.ensemble_builder import EnsembleBuilderManager
+from autosklearn.ensemble_building import EnsembleBuilderManager
 from autosklearn.ensembles.singlebest_ensemble import SingleBest
 from autosklearn.evaluation import ExecuteTaFuncWithQueue, get_cost_of_crash
 from autosklearn.evaluation.abstract_evaluator import _fit_and_suppress_warnings
@@ -297,6 +297,8 @@ def __init__(
         self._label_num = None
         self._parser = None
         self._can_predict = False
+        self._read_at_most = None
+        self._max_ensemble_build_iterations = None
         self.models_: Optional[dict] = None
         self.cv_models_: Optional[dict] = None
         self.ensemble_ = None
@@ -796,9 +798,9 @@ def fit(
                 max_models_on_disc=self._max_models_on_disc,
                 seed=self._seed,
                 precision=self.precision,
-                max_iterations=None,
-                read_at_most=np.inf,
-                ensemble_memory_limit=self._memory_limit,
+                max_iterations=self._max_ensemble_build_iterations,
+                read_at_most=self._read_at_most,
+                memory_limit=self._memory_limit,
                 random_state=self._seed,
                 logger_port=self._logger_port,
                 pynisher_context=self._multiprocessing_context,
@@ -911,7 +913,7 @@ def fit(
                 )
                 result = proc_ensemble.futures.pop().result()
                 if result:
-                    ensemble_history, _, _, _, _ = result
+                    ensemble_history, _ = result
                     self.ensemble_performance_history.extend(ensemble_history)
                 self._logger.info("Ensemble script finished, continue shutdown.")

@@ -1499,8 +1501,8 @@ def fit_ensemble(
             seed=self._seed,
             precision=precision if precision else self.precision,
             max_iterations=1,
-            read_at_most=np.inf,
-            ensemble_memory_limit=self._memory_limit,
+            read_at_most=None,
+            memory_limit=self._memory_limit,
             random_state=self._seed,
             logger_port=self._logger_port,
             pynisher_context=self._multiprocessing_context,
@@ -1513,7 +1515,7 @@ def fit_ensemble(
                 "Error building the ensemble - please check the log file and command "
                 "line output for error messages."
             )
-        self.ensemble_performance_history, _, _, _, _ = result
+        self.ensemble_performance_history, _ = result
         self._ensemble_size = ensemble_size

         self._load_models()
@@ -2048,6 +2050,15 @@ def has_key(rv, key):

         return ensemble_dict

+    def has_ensemble(self) -> bool:
+        """
+        Returns
+        -------
+        bool
+            Whether this AutoML instance has an ensemble
+        """
+        return self.ensemble_ is not None
+
     def _create_search_space(
         self,
         tmp_dir: str,
@@ -2106,7 +2117,7 @@ def fit(
         y: SUPPORTED_TARGET_TYPES | spmatrix,
         X_test: Optional[SUPPORTED_FEAT_TYPES] = None,
         y_test: Optional[SUPPORTED_TARGET_TYPES | spmatrix] = None,
-        feat_type: Optional[list[bool]] = None,
+        feat_type: Optional[list[str]] = None,
         dataset_name: Optional[str] = None,
         only_return_configuration_space: bool = False,
         load_models: bool = True,
@@ -2196,7 +2207,7 @@ def fit(
         y: SUPPORTED_TARGET_TYPES | spmatrix,
         X_test: Optional[SUPPORTED_FEAT_TYPES] = None,
         y_test: Optional[SUPPORTED_TARGET_TYPES | spmatrix] = None,
-        feat_type: Optional[list[bool]] = None,
+        feat_type: Optional[list[str]] = None,
         dataset_name: Optional[str] = None,
         only_return_configuration_space: bool = False,
         load_models: bool = True,
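
Two caller-facing changes in this file are easy to miss in the diff: the ensemble builder result is now unpacked as two items instead of five, and `has_ensemble()` gives a direct check on `ensemble_`. The sketch below illustrates both with a stand-in class. `AutoMLStub`, `record_result`, and the example history entry are made up for illustration and are not part of auto-sklearn; only the `has_ensemble` body and the two-item unpacking are taken from the diff.

```python
from typing import Any, Optional


class AutoMLStub:
    """Stand-in for illustration only; not the real AutoML class."""

    def __init__(self) -> None:
        self.ensemble_: Optional[Any] = None
        self.ensemble_performance_history: list = []

    def has_ensemble(self) -> bool:
        # Same check as the method added in this diff.
        return self.ensemble_ is not None

    def record_result(self, result: Optional[tuple]) -> None:
        # Hypothetical helper mirroring the unpacking in `fit`:
        # the result now has two items and only the history is used.
        if result:
            ensemble_history, _ = result
            self.ensemble_performance_history.extend(ensemble_history)


automl = AutoMLStub()
# Made-up history entry; the real entries have their own schema.
automl.record_result(([{"score": 0.9}], None))

if not automl.has_ensemble():
    print("No ensemble has been built yet")
```

Code that still unpacks five values from the builder result would raise a `ValueError` after this change, so any external caller relying on the old shape would need the same two-item update.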