
Leaderboard #1185

Merged
merged 42 commits into automl:development from leaderboard_fresh on Jul 27, 2021
Conversation

@eddiebergman eddiebergman (Contributor) commented Jul 27, 2021

Leaderboard functionality

  • Shows each model's performance on the training set, as optimized by SMAC

Main Changes

  • Adds a method called leaderboard to the AutoSklearnEstimator class with the signature:
```python
def leaderboard(
    self,
    detailed: bool = False,
    ensemble_only: bool = True,
    top_k: Union[int, Literal['all']] = 'all',
    sort_by: str = 'cost',
    sort_order: Literal['auto', 'ascending', 'descending'] = 'auto',
    include: Optional[Union[str, Iterable[str]]] = None
) -> pd.DataFrame:
```
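A minimal usage sketch (not from this PR; the dataset and time budget are illustrative):

```python
import sklearn.datasets

from autosklearn.classification import AutoSklearnClassifier

# Illustrative settings; any fitted estimator exposes the new method.
X, y = sklearn.datasets.load_digits(return_X_y=True)
automl = AutoSklearnClassifier(time_left_for_this_task=60)
automl.fit(X, y)

# Ensemble members only, sorted by cost (the defaults).
print(automl.leaderboard())

# All evaluated models, with every available column.
print(automl.leaderboard(detailed=True, ensemble_only=False))
```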

Note

This is a clean branch based off the PR at #1177, due to git diffing issues once it was merged with an updated development branch.

Still requires testing; currently it only works for classification.

For the AutoML models to be usable for the entire test session without retraining, they require a session-scoped tmp_dir. I tried to figure out how to make the tmp_dir fixture more dynamic, but the documentation seems to imply that fixture scope is set at *function definition*, not on function call. This means either calling the _tmp_dir fixture and cleaning up manually, or duplicating the tmp_dir fixture under a name that makes its session scope clear. It's a bit ugly, but I couldn't find an alternative. pytest also doesn't populate the request.module object when requesting from session scope, so for now module scope will have to do.
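A minimal sketch of the duplicated, session-scoped fixture described above, assuming a pytest conftest.py; the fixture name is illustrative rather than the PR's:

```python
# conftest.py -- illustrative sketch only.
import shutil
import tempfile

import pytest


@pytest.fixture(scope="session")
def session_tmp_dir():
    # Fixture scope is fixed where the fixture is defined, so a
    # session-scoped variant has to be a separate fixture rather than
    # a re-scoped call of the function-scoped tmp_dir.
    path = tempfile.mkdtemp(prefix="autosklearn_session_")
    yield path
    shutil.rmtree(path, ignore_errors=True)
```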
Generating the Sphinx examples causes output to be generated in doc/examples. I'm not sure whether this should be pushed, considering that doc/build is not.
Found a bug:

```
/home/skantify/code/auto-sklearn/examples/20_basic/example_multilabel_classification.py failed to execute correctly: Traceback (most recent call last):
  File "/home/skantify/code/auto-sklearn/examples/20_basic/example_multilabel_classification.py", line 61, in <module>
    print(automl.leaderboard())
  File "/home/skantify/code/auto-sklearn/autosklearn/estimators.py", line 738, in leaderboard
    model_runs[model_id]['ensemble_weight'] = self.automl_.ensemble_.weights_[i]
KeyError: 2
```
There is a discrepancy between the identifiers used by SMAC and the identifiers used by the Ensemble class. SMAC uses `config_id`, which is available for every run of SMAC, while the Ensemble uses `model_id == num_run`, which is only available in `runinfo.additional_info`. However, `num_run` is not always included in `additional_info`, nor is `additional_info` guaranteed to exist. The only guaranteed unique identifier for a model is therefore its `config_id`, which can confuse users who wish to interact with the ensembler.

Since the user should not have to care that there are potentially two indexes for models, I made the choice to show `config_id`, as this also allows displaying info on failed runs. An alternative that would allow showing auto-sklearn's `num_run` index is to simply exclude failed runs from the leaderboard.
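A sketch of the kind of defensive lookup this implies, assuming runs keyed by a single model identifier; the function and variable names here are illustrative, not the PR's code:

```python
from typing import Dict


def attach_ensemble_weights(
    model_runs: Dict[int, dict],
    ensemble_weights: Dict[int, float],
) -> None:
    """Annotate each run with its ensemble weight, if it has one."""
    for model_id, run in model_runs.items():
        # A model known to SMAC may not be an ensemble member, so
        # default to 0.0 instead of raising KeyError as in the bug above.
        run['ensemble_weight'] = ensemble_weights.get(model_id, 0.0)
```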
@eddiebergman eddiebergman marked this pull request as ready for review July 27, 2021 14:27
@codecov codecov bot commented Jul 27, 2021

Codecov Report

Merging #1185 (f36eec4) into development (611cf5c) will decrease coverage by 0.24%.
The diff coverage is 36.23%.

Impacted file tree graph

```diff
@@               Coverage Diff               @@
##           development    #1185      +/-   ##
===============================================
- Coverage        85.86%   85.62%   -0.25%
===============================================
  Files              138      138
  Lines            10790    10857      +67
===============================================
+ Hits              9265     9296      +31
- Misses            1525     1561      +36
```
| Impacted Files | Coverage Δ |
|---|---|
| autosklearn/automl.py | 85.00% <ø> (ø) |
| autosklearn/estimators.py | 73.76% <33.33%> (-19.67%) ⬇️ |
| autosklearn/ensembles/ensemble_selection.py | 67.80% <100.00%> (+0.44%) ⬆️ |
| ...ine/components/classification/gradient_boosting.py | 91.30% <0.00%> (-0.87%) ⬇️ |
| autosklearn/ensemble_builder.py | 77.17% <0.00%> (+0.40%) ⬆️ |
| ..._preprocessing/select_percentile_classification.py | 89.65% <0.00%> (+1.72%) ⬆️ |
| ...ature_preprocessing/select_rates_classification.py | 87.32% <0.00%> (+4.22%) ⬆️ |
| ...eline/components/feature_preprocessing/fast_ica.py | 97.82% <0.00%> (+6.52%) ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 611cf5c...f36eec4. Read the comment docs.

@mfeurer mfeurer merged commit 6231b1c into automl:development Jul 27, 2021
github-actions bot pushed a commit that referenced this pull request Jul 27, 2021
@eddiebergman eddiebergman deleted the leaderboard_fresh branch July 28, 2021 08:34