Multi-objective ensemble API #1485

Merged: 5 commits merged into development on May 30, 2022
Conversation

@mfeurer (Contributor) commented May 24, 2022:

On a high level, this PR adds:

  • an API for passing custom ensemble classes to Auto-sklearn,
  • API updates for using multi-objective ensembles, and
  • the possibility to use a metric that requires access to X (see the sketch below).
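For orientation, here is a minimal sketch of how this estimator-side API could be used. It is an illustration under assumptions, not the verbatim diff: the ensemble_class/ensemble_kwargs argument names and passing a list of metrics for multi-objective optimization are inferred from the description above.

# Hedged sketch of the estimator-side API described above; the argument names
# (ensemble_class, ensemble_kwargs, a sequence of metrics) are assumptions
# based on this PR's description, not the verbatim diff.
import autosklearn.metrics
from autosklearn.classification import AutoSklearnClassifier
from autosklearn.ensembles.ensemble_selection import EnsembleSelection

automl = AutoSklearnClassifier(
    time_left_for_this_task=120,
    # custom ensemble class plus its kwargs, replacing the deprecated
    # ensemble_size argument
    ensemble_class=EnsembleSelection,
    ensemble_kwargs={"ensemble_size": 50},
    # two objectives -> multi-objective ensemble building
    metric=[autosklearn.metrics.precision, autosklearn.metrics.recall],
)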

On a lower level, this PR also adds:

  • Updates to the ensemble-building module:
    • New functionality to retrieve the identifiers and weights of an ensemble (see the sketch after this list).
    • Because of that, I was able to improve the type definitions for the ensemble-building submodule. More concretely, no ensemble files are exempt from type checking any more.
    • Pass a different seed to the ensemble builder every time it is called.
    • Make candidate selection (n_best) aware of multiple objectives.
    • Pass information about all available runs to the ensemble class.
    • Available ensemble classes are now shown in the docs.
    • Improved docstring for ensemble selection.
  • Estimators API:
    • Deprecate the ensemble_size argument.
  • Examples:
    • Add a Pareto-front plot to the multi-objective Auto-sklearn example.
  • Tests:
    • Add a new case to the AutoML tests that checks multi-objective optimization.
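As a usage note for the identifier/weight retrieval mentioned in the first bullet, a sketch continuing the example above. get_models_with_weights() is the existing estimator-level helper; the new ensemble-level accessor is only described in prose here, and its exact method name is not visible in this excerpt.

# Hedged sketch: inspect the composition of the fitted ensemble.
# X_train / y_train are assumed to be a prepared training split.
automl.fit(X_train, y_train)

# Existing estimator helper yielding (weight, fitted pipeline) pairs; the new
# ensemble-level accessor for (identifier, weight) pairs is analogous.
for weight, model in automl.get_models_with_weights():
    print(f"weight={weight:.3f}", model)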

@codecov (bot) commented May 24, 2022:

Codecov Report

Merging #1485 (d8a863c) into development (4b21134) will decrease coverage by 0.41%.
The diff coverage is 75.08%.

❗ Current head d8a863c differs from pull request most recent head 0ba05e9. Consider uploading reports for the commit 0ba05e9 to get more accurate results

@@               Coverage Diff               @@
##           development    #1485      +/-   ##
===============================================
- Coverage        84.22%   83.81%   -0.42%     
===============================================
  Files              151      152       +1     
  Lines            11488    11662     +174     
  Branches          1994     2037      +43     
===============================================
+ Hits              9676     9774      +98     
- Misses            1279     1339      +60     
- Partials           533      549      +16     

Impacted file tree graph

@eddiebergman (Contributor) left a comment:

I assume this is mostly the same as the other code I saw; just a brief question on RandomState.

@mfeurer mfeurer requested a review from eddiebergman May 24, 2022 17:07
Comment on lines 82 to 100
Expects
-------
* Auto-sklearn can predict and has a model
* _load_pareto_front returns one scikit-learn ensemble
"""
# Check that the predict function works
X = np.array([[1.0, 1.0, 1.0, 1.0]])
print(automl.predict(X))
assert automl.predict_proba(X).shape == (1, 3)
assert automl.predict(X).shape == (1,)

pareto_front = automl._load_pareto_front()
assert len(pareto_front) == 1
for ensemble in pareto_front:
    assert isinstance(ensemble, (VotingClassifier, VotingRegressor))
    y_pred = ensemble.predict_proba(X)
    assert y_pred.shape == (1, 3)
    y_pred = ensemble.predict(X)
    assert y_pred in ["setosa", "versicolor", "virginica"]
@eddiebergman (Contributor) commented May 24, 2022:

It's not very clear why only one scikit-learn ensemble should be expected here, but I assume it's because of the default parameter for ensemble selection.

It also seems this test is very specific to this single case (a fitted multi-objective iris classifier).

I had the same problem when considering cases, and my solution was just to have general tests. We can push this through for now, knowing it will break if we add any other cases with the "multiobjective" tag.

For the longer-term solution, I have a few ideas:

  • We just use make_automl and make_dataset and construct the specific automl instance inside this test, so that the specifics being tested are directly evident. This is the same as the old way of doing things and gives up caching, but at least all relevant setup assumptions are stated clearly in the test.
  • We encode these extra specifics somehow:
    • The case just returns extra info:
    def case_classifier_fitted_holdout_multiobjective(...):
        ...
        return (model, extra_info)
    • The extra specifics are saved directly on the model object and accessed there. This does add a lot more introspection capabilities to the model, which may be helpful for future additions.

Happy to hear any other ideas on this, though. I admit the caching solution, as it is, isn't perfect for this reason, but it does allow the tests to be a lot more modular.

@mfeurer (Contributor, Author) replied:

> It's not very clear why only one scikit-learn ensemble should be expected here, but I assume it's because of the default parameter for ensemble selection.

Correct.

> It also seems this test is very specific to this single case (a fitted multi-objective iris classifier).

Correct as well.

> I had the same problem when considering cases, and my solution was just to have general tests. We can push this through for now, knowing it will break if we add any other cases with the "multiobjective" tag.

Very glad you see it this way.

> Happy to hear any other ideas on this, though.

For the 2nd idea, would we check whether the AutoML system was built on iris and then use it? Besides that, could we maybe add a filter on which dataset(s) were used to build the AutoML system?

@eddiebergman (Contributor) replied:

> For the 2nd idea, would we check whether the AutoML system was built on iris and then use it? Besides that, could we maybe add a filter on which dataset(s) were used to build the AutoML system?

Yup, it's definitely possible. The easiest way is to just do it in the test itself, i.e. if extra_info["dataset"] != "iris": pass, but I'm not the biggest fan of that solution.

The overarching problem is that you can't use @parametrize and @tags together, i.e. you can't associate a parameter with a tag.

I guess my preferred solution is to include more general things in the extra_info, or encode them on the model, meaning the tests don't have to do any filtering:

extra_info = {
    "X_shape": X.shape,
    "y_shape": y.shape,
    "labels": ...
}
return (automl, extra_info)

It's not the cleanest, but at least it means this test could theoretically work for any other "multiobjective"-tagged case, as long as it provides the necessary extra_info.
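To make that concrete, a hypothetical rewrite of the reviewed test against such an extra_info dict could look like this; every name below is illustrative rather than taken from the diff:

import numpy as np

def test_pareto_front_predicts(case):
    # `case` would yield (automl, extra_info) from any "multiobjective"-tagged
    # case; nothing below is hard-coded to iris.
    automl, extra_info = case
    n_features = extra_info["X_shape"][1]
    X = np.ones((1, n_features))

    for ensemble in automl._load_pareto_front():
        y_pred = ensemble.predict(X)
        assert y_pred.shape == (1,)
        assert y_pred[0] in extra_info["labels"]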

@mfeurer mfeurer merged commit 25f0be6 into development May 30, 2022
@mfeurer mfeurer deleted the moo_api branch May 30, 2022 13:39
@eddiebergman eddiebergman linked an issue Jun 10, 2022 that may be closed by this pull request
eddiebergman added a commit that referenced this pull request Aug 18, 2022
* Multi-objective ensemble API

Co-authored-by: eddiebergman <[email protected]>

* update for rebase, add loading of X_data in ensemble builder

* Add unit tests

* Fix unittest?, increase coverage (hopefully)

* Rename methods to be Pareto set methods

Co-authored-by: eddiebergman <[email protected]>
Successfully merging this pull request may close these issues.

[Question] Multi-objective auto-sklearn?