Multi-objective ensemble API #1485
Conversation
Codecov Report
@@             Coverage Diff              @@
##           development    #1485      +/-   ##
===============================================
- Coverage        84.22%   83.81%   -0.42%
===============================================
  Files              151      152       +1
  Lines            11488    11662     +174
  Branches          1994     2037      +43
===============================================
+ Hits              9676     9774      +98
- Misses            1279     1339      +60
- Partials           533      549      +16
I assume this is mostly the same as the other code I saw; just a brief question on RandomState.
Co-authored-by: eddiebergman <[email protected]>
test/test_automl/test_post_fit.py
Outdated
Expects
-------
* Auto-sklearn can predict and has a model
* _load_pareto_front returns one scikit-learn ensemble
"""
# Check that the predict function works
X = np.array([[1.0, 1.0, 1.0, 1.0]])
print(automl.predict(X))
assert automl.predict_proba(X).shape == (1, 3)
assert automl.predict(X).shape == (1,)

pareto_front = automl._load_pareto_front()
assert len(pareto_front) == 1
for ensemble in pareto_front:
    assert isinstance(ensemble, (VotingClassifier, VotingRegressor))
    y_pred = ensemble.predict_proba(X)
    assert y_pred.shape == (1, 3)
    y_pred = ensemble.predict(X)
    assert y_pred in ["setosa", "versicolor", "virginica"]
It's not very clear why only one scikit-learn ensemble is expected here, but I assume it's because of the default parameter for ensemble selection.
It also seems this test is very specific to this single case (a fitted multi-objective iris classifier).
I had the same problem when considering cases, and my solution was just to have general tests. We can just push this through for now, knowing it will break if we add any other cases with the "multiobjective" tag.
For a longer-term solution to this, I have a few ideas:
- We just use make_automl and make_dataset and construct the specific automl instance for this test, so that the specifics being tested are directly evident in the test. This is the same as the old way of doing things and leads to no caching, but at least all relevant setup assumptions are stated clearly in the test.
- We encode these extra specifics somehow:
  - The case just returns extra info (see the sketch below):
    def case_classifier_fitted_holdout_multiobjective(...): ... return (model, extra_info)
  - The extra specifics are directly saved and accessed on the model object. This does add a lot more introspection capabilities to the model, which may be helpful for future additions.

Happy to hear any other ideas on this though. I admit the caching solution as is isn't perfect for this reason, but it does allow the tests to be a lot more modular.
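A minimal sketch of that second idea, assuming a pytest-cases style case function; the fixture names and the extra_info keys are hypothetical and only illustrate the shape of the idea:

def case_classifier_fitted_holdout_multiobjective(make_automl, make_dataset):
    # Hypothetical: build and fit the automl instance exactly as the existing
    # case does; the construction details are unchanged and elided here.
    automl = ...
    # Metadata the tests can rely on instead of hard-coding dataset details.
    extra_info = {
        "dataset": "iris",
        "labels": ["setosa", "versicolor", "virginica"],
    }
    return automl, extra_info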
> It's not very clear why only one scikit-learn ensemble is expected here, but I assume it's because of the default parameter for ensemble selection.

Correct.

> It also seems this test is very specific to this single case (a fitted multi-objective iris classifier).

Correct as well.

> I had the same problem when considering cases, and my solution was just to have general tests. We can just push this through for now, knowing it will break if we add any other cases with the "multiobjective" tag.

Very glad you see it this way.

> Happy to hear any other ideas on this though

For the 2nd idea, would we check whether the AutoML was built on iris and then use it? Besides that, could we maybe add a filter on which dataset(s) were used to build the AutoML system?
> For the 2nd idea, would we check whether the AutoML was built on iris and then use it? Besides that, could we maybe add a filter on which dataset(s) were used to build the AutoML system?

Yup, it's definitely possible. The easiest way is to just do so in the test itself, i.e. if extra_info["dataset"] != "iris": pass, but I'm not the biggest fan of that solution (a sketch of this follows below).
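A rough sketch of that filter-in-the-test approach, assuming the case returns (automl, extra_info) as sketched above; the test name and the skip message are hypothetical:

import pytest

def test_pareto_front_is_sklearn_ensemble(automl, extra_info):
    # Hypothetical filter inside the test itself: bail out for any
    # "multiobjective" case that was not built on iris.
    if extra_info["dataset"] != "iris":
        pytest.skip("Assertions below are specific to the iris dataset")
    ...  # iris-specific assertions as in the snippet above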
The overarching problem is that you can't use @parametrize and @tags together, i.e. you can't associate a parameter with a tag.
I guess my preferred solution is to include more general things in the extra_info, or to encode it on the model, meaning the tests don't have to do any filtering:
extra_info = {
    "X_shape": X.shape,
    "y_shape": y.shape,
    "labels": ...,
}
return (automl, extra_info)
It's not the cleanest, but at least it means this test could theoretically work for any other "multiobjective"-tagged case, as long as it provides the necessary extra_info.
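A rough sketch of how the test above could consume such an extra_info dict so that it no longer hard-codes iris specifics; the key names follow the snippet above and are still assumptions:

import numpy as np

def test_pareto_front_ensembles(automl, extra_info):
    # Build a single dummy row matching the training data's feature count.
    n_features = extra_info["X_shape"][1]
    X = np.ones((1, n_features))

    pareto_front = automl._load_pareto_front()
    for ensemble in pareto_front:
        # Probabilities should cover every known label...
        assert ensemble.predict_proba(X).shape == (1, len(extra_info["labels"]))
        # ...and predictions should be one of those labels.
        assert ensemble.predict(X)[0] in extra_info["labels"]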
* Multi-objective ensemble API
  Co-authored-by: eddiebergman <[email protected]>
* update for rebase, add loading of X_data in ensemble builder
* Add unit tests
* Fix unittest?, increase coverage (hopefully)
* Rename methods to be Pareto set methods
  Co-authored-by: eddiebergman <[email protected]>
On a high level, this PR adds X.
On a lower level, this PR also adds:
- making the selection of the n_best models aware of multiple objectives
- the ensemble_size argument
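A short usage sketch of the Pareto-front API exercised in the test above; the function name is hypothetical, the automl argument is assumed to be an already-fitted multi-objective AutoML instance, and the final commit renames these to "Pareto set" methods, so the exact method name may differ in the merged code:

import numpy as np

def inspect_pareto_front(automl, X: np.ndarray) -> None:
    # `automl` is assumed to be a fitted multi-objective AutoML instance,
    # e.g. the iris case discussed in the review thread above.
    pareto_front = automl._load_pareto_front()
    for ensemble in pareto_front:
        # Each entry is a plain scikit-learn ensemble (VotingClassifier /
        # VotingRegressor), so the usual predict API applies.
        print(ensemble.predict(X))
        print(ensemble.predict_proba(X))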