
Development #1192

Merged: 6 commits into master on Jul 28, 2021
Conversation

Contributor

@mfeurer mfeurer commented Jul 27, 2021

No description provided.

mfeurer and others added 5 commits July 27, 2021 14:26
Synchronize dev and master again
* Implemented `def leaderboard`

Still requires testing; only works for classification

* Fixed some bugs

* Updated function with new params

* Cleaned info gathering a little

* Identifies if classifier or regressor models

* Implemented sort_by param

* Added ranking column

* Implemented ensemble_only param for leaderboard

* Implemented param top_k

* flake8'd

* Created fixtures for use with test_leaderboard

* Moved fixtures to conftest, added session scope tmp_dir

For the autoML models to be usable for the entire session without
retraining, they require a session-scoped tmp_dir. I tried to figure out
how to make the tmp_dir more dynamic, but the documentation seems to imply
that the scope is set at *function definition*, not at function call.
This means either calling the _tmp_dir and manually cleaning up, or
duplicating the tmp_dir function under a name indicating session scope.
It's a bit ugly, but I couldn't find an alternative.

* Can't make tmp_dir for session scope fixtures

pytest doesn't populate the request.module object when the fixture is
requested from a session scope. For now, module scope will have to do.

* Reverted back, models trained in test

* Moved `leaderboard` AutoML -> AutoSklearnEstimator

* Added fuzzing test for test_leaderboard

* Added tests for leaderboard, added sort_order

* Removed Type Final to support python 3.7

* Removed old solution to is_classification for leaderboard

* I should really force pre-commit to run before commit (flake8 fixes)

* More occurrences of Literal

* Readded Literal but imported from typing_extensions

* Fixed docstring for sphinx

* Added make command to build html without running examples

* Added doc/examples to gitignore

Generating the sphinx examples causes output to be generated in
doc/examples. Not sure if this should be pushed considering docs/build
is not.

* Added leaderboard to basic examples

Found a bug:

/home/skantify/code/auto-sklearn/examples/20_basic/example_multilabel_classification.py failed to execute correctly: Traceback (most recent call last):
  File "/home/skantify/code/auto-sklearn/examples/20_basic/example_multilabel_classification.py", line 61, in <module>
    print(automl.leaderboard())
  File "/home/skantify/code/auto-sklearn/autosklearn/estimators.py", line 738, in leaderboard
    model_runs[model_id]['ensemble_weight'] = self.automl_.ensemble_.weights_[i]
KeyError: 2
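One way to read the KeyError above: the ensemble's weight vector only covers models the ensemble actually selected, so not every identifier can be looked up in the run table directly. A hedged sketch of a defensive guard, with toy data and illustrative names (not the actual fix in the PR):

```python
# model_runs only knows some ids; the ensemble's identifier list may
# include others, which is what triggers the KeyError in the traceback.
model_runs = {5: {"cost": 0.1}, 7: {"cost": 0.3}}
ensemble_ids = [2, 5, 7]    # example identifiers reported by the ensemble
weights = [0.2, 0.5, 0.3]   # parallel list of ensemble weights

for i, model_id in enumerate(ensemble_ids):
    if model_id not in model_runs:
        continue  # skip ids the run table does not track, avoiding the KeyError
    model_runs[model_id]["ensemble_weight"] = weights[i]
```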

* Cleaned up `__str__` of EnsembleSelection

* Fixed discrepancy between config_id and model_id

There is a discrepancy between the identifiers used by SMAC and the identifiers used by an Ensemble class.
SMAC uses `config_id`, which is available for every run of SMAC, while Ensemble uses `model_id == num_run`, which is only available in runinfo.additional_info.
However, this is not always included in additional_info, nor is additional_info guaranteed to exist.
Therefore, the only guaranteed unique identifiers for models are `config_id`s, which can confuse users if they wish to interact with the ensembler.

* Readded desired code for design choice on model indexing

There are two indexes that can be used: SMAC uses `config_id` and asklearn
uses `num_run`. These are not guaranteed to be equal, and `num_run` is not
always present.

As the user should not have to care that there are possibly two indexes for
models, I made the choice to show `config_id`, as this allows displaying
info on failed runs.

An alternative, showing asklearn's `num_run` index, is simply to exclude
any failed runs from the leaderboard.

* Removed Literal again as typing_extensions is external module

* Switched to model_id as primary id

Any runs which do not provide a model_id == num_run are essentially
discarded. This should change in the future, but the fix is outside the
scope of this PR.
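The indexing decision above can be sketched as a small filter over run-info dictionaries, assuming they are shaped roughly like SMAC's (the dictionary keys and function name here are illustrative):

```python
def runs_by_model_id(run_infos):
    """Index runs by num_run, dropping runs that do not report one.

    Matches the behaviour described above: runs without a
    model_id == num_run in additional_info are discarded.
    """
    table = {}
    for run in run_infos:
        additional = run.get("additional_info") or {}
        model_id = additional.get("num_run")
        if model_id is None:
            continue  # no num_run reported: discard this run
        table[model_id] = run
    return table
```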

* pre-commit flake8 fix

* Logger gives warning if sort_by is not in columns asked for

* Moved column types to static method

* Fixed rank to be based on cost

* Fixed so model_id can be requested, even though it always exists

* Fixed so rank can be calculated even if cost not requested

* Readded Literal and included the typing_extensions dependency

Once Python 3.7 is dropped, we can drop typing_extensions

* Changed default sort_order to 'auto'

* Changed leaderboard columns to be static attributes

* Update budget doc

Co-authored-by: Matthias Feurer <[email protected]>

* flake8'd

Co-authored-by: Matthias Feurer <[email protected]>
* Fixes for valid parameters not being tested

* flake8'd
* Changes required to test if will work with smac@development

* Changes required to test if will work with smac@development

* Fixed failing tests with new scipy 1.7 on sparse data

* flake8'd

* Use SMAC from pypi again

* undo changes

Co-authored-by: Matthias Feurer <[email protected]>
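Putting the commit messages together, the feature is a leaderboard with `sort_by`, `sort_order`, `top_k`, and `ensemble_only` parameters and a cost-based rank column. A rough, illustrative sketch of that behaviour in pandas; this is not the actual implementation (which lives in autosklearn/estimators.py), and the column names are assumptions:

```python
import pandas as pd


def leaderboard(runs, sort_by="cost", sort_order="auto", top_k=None,
                ensemble_only=True):
    df = pd.DataFrame(runs)
    if ensemble_only:
        # Keep only models the ensemble actually uses
        df = df[df["ensemble_weight"] > 0].copy()
    # Rank is based on cost, per the commits, regardless of sort_by
    df["rank"] = df["cost"].rank(method="min").astype(int)
    # 'auto' is treated as ascending here (lower cost is better)
    ascending = sort_order != "descending"
    df = df.sort_values(sort_by, ascending=ascending)
    if top_k is not None:
        df = df.head(top_k)
    return df.set_index("model_id")
```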
@codecov
Copy link

codecov bot commented Jul 27, 2021

Codecov Report

Merging #1192 (96b9ad0) into master (904a692) will increase coverage by 2.24%.
The diff coverage is 93.67%.


@@            Coverage Diff             @@
##           master    #1192      +/-   ##
==========================================
+ Coverage   85.91%   88.15%   +2.24%     
==========================================
  Files         138      138              
  Lines       10790    10866      +76     
==========================================
+ Hits         9270     9579     +309     
+ Misses       1520     1287     -233     
Impacted Files Coverage Δ
autosklearn/automl.py 85.00% <ø> (ø)
autosklearn/estimators.py 93.36% <93.33%> (-0.07%) ⬇️
autosklearn/__version__.py 100.00% <100.00%> (ø)
autosklearn/ensembles/ensemble_selection.py 69.17% <100.00%> (+1.81%) ⬆️
...ipeline/components/regression/gradient_boosting.py 93.26% <0.00%> (+0.96%) ⬆️
...osklearn/pipeline/components/classification/sgd.py 96.87% <0.00%> (+1.04%) ⬆️
autosklearn/pipeline/components/regression/sgd.py 96.84% <0.00%> (+1.05%) ⬆️
...earn/pipeline/components/regression/extra_trees.py 93.75% <0.00%> (+1.25%) ⬆️
...rn/pipeline/components/regression/random_forest.py 94.44% <0.00%> (+1.38%) ⬆️
... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@mfeurer mfeurer merged commit 3d53cd9 into master Jul 28, 2021