Document list of meta-learning datasets #502

Closed
adithyabsk opened this issue Jun 26, 2018 · 9 comments
Labels
documentation Something to be documented

Comments

adithyabsk commented Jun 26, 2018

Hello!

I was looking through the documentation and could not find the list of datasets used to train auto-sklearn's meta-learning feature. The paper's supplement lists a set of datasets, but I was wondering whether that list has been updated. (http://ml.informatik.uni-freiburg.de/papers/15-NIPS-auto-sklearn-supplementary.pdf)

mfeurer added the documentation label Jun 27, 2018
mfeurer (Contributor) commented Jun 27, 2018

They have indeed been updated. You can find the list of 133 OpenML task IDs in this file: https://github.com/automl/auto-sklearn/blob/master/autosklearn/metalearning/files/accuracy_binary.classification_dense/algorithm_runs.arff
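
To pull the task IDs out of such a file, a minimal stdlib-only sketch like the one below could work. It assumes (not confirmed in this thread) that the first attribute of each data row in `algorithm_runs.arff` holds the OpenML task ID, so check the file's `@ATTRIBUTE` header before relying on it. The function name `task_ids_from_arff` and the sample rows are hypothetical, and the parser is deliberately naive (it does not handle quoted commas).

```python
def task_ids_from_arff(text: str) -> list[str]:
    """Return the distinct first-column values of an ARFF @DATA section.

    Assumption: the first attribute holds the OpenML task ID.
    Naive parser: skips comments and header lines, no quoted-comma support.
    """
    seen: dict[str, None] = {}  # a dict keeps first-seen order and dedupes
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if line.upper().startswith("@DATA"):
            in_data = True  # every following line is a data row
            continue
        if in_data:
            first = line.split(",", 1)[0].strip().strip("'\"")
            seen.setdefault(first, None)
    return list(seen)


# Hypothetical excerpt: attribute names and values are made up for illustration.
sample = """@RELATION hypothetical_runs
@ATTRIBUTE instance_id STRING
@ATTRIBUTE repetition NUMERIC
@ATTRIBUTE algorithm STRING
@ATTRIBUTE score NUMERIC
@DATA
2120,1,1,0.91
2120,1,2,0.88
75,1,1,0.73
"""
print(task_ids_from_arff(sample))  # ['2120', '75']
```

With a local copy of the linked file, `task_ids_from_arff(open(path).read())` would return its IDs, provided the column assumption holds.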

adithyabsk (Author) commented

Do you have a different set of datasets for meta-learning on regression problems?

adithyabsk (Author) commented

Also, in the folder it seems that there is a different set of tasks for each metric. Was the meta-learner trained on all of these task sets?

mfeurer (Contributor) commented Jun 27, 2018

> Do you have a different set of datasets for meta-learning on regression problems?

There is currently no meta-data for regression.

> Also, in the folder it seems that there is a different set of tasks for each metric. Was the meta-learner trained on all of these task sets?

No, we trained Auto-sklearn with balanced accuracy on each dataset separately. Then, for each combination of metric, target problem (binary, multiclass) and data structure (dense, sparse), we looked for the legal configurations and, for each dataset, chose the one that performed best on the metric of interest.
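
The per-combination selection step described above can be sketched roughly as follows. This is not auto-sklearn's internal code: the run layout (a list of dicts with `dataset`, `config` and per-metric `scores`), the `is_legal` predicate and the "higher is better" convention are all hypothetical assumptions for illustration.

```python
from typing import Any, Callable

# Hypothetical layout: one evaluated configuration on one dataset,
# with a score for every metric (higher is better).
Run = dict[str, Any]


def select_best_configs(
    runs: list[Run],
    metric: str,
    is_legal: Callable[[str], bool],
) -> dict[int, str]:
    """For one (metric, target problem, data structure) combination,
    return the best legal configuration per dataset under `metric`."""
    best: dict[int, tuple[float, str]] = {}
    for run in runs:
        if not is_legal(run["config"]):
            continue  # skip configurations invalid for this combination
        score = run["scores"][metric]
        current = best.get(run["dataset"])
        if current is None or score > current[0]:
            best[run["dataset"]] = (score, run["config"])
    return {ds: cfg for ds, (_, cfg) in best.items()}


# Hypothetical runs: same datasets for every metric, different winners.
runs = [
    {"dataset": 1, "config": "A", "scores": {"balanced_accuracy": 0.80, "f1_weighted": 0.70}},
    {"dataset": 1, "config": "B", "scores": {"balanced_accuracy": 0.75, "f1_weighted": 0.78}},
    {"dataset": 2, "config": "C", "scores": {"balanced_accuracy": 0.60, "f1_weighted": 0.65}},
    {"dataset": 2, "config": "D", "scores": {"balanced_accuracy": 0.90, "f1_weighted": 0.90}},
]
dense_legal = {"A", "B", "C"}  # pretend "D" is not valid for dense data
print(select_best_configs(runs, "f1_weighted", lambda c: c in dense_legal))
# {1: 'B', 2: 'C'}
```

Running it once per combination yields one table per metric over the same datasets, which matches the point made further down that the tasks stay the same while the chosen configurations differ.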

adithyabsk (Author) commented Jun 27, 2018

> Then, for each combination of metric, target problem (binary, multiclass) and data structure (dense, sparse), we looked for the legal configurations and, for each dataset, chose the one that performed best on the metric of interest.

I'm still unclear. For example, if I tell auto-sklearn to use 'f1_weighted' and I have a multiclass problem, will it set the hyperparameters to those of the closest dataset in the file below?

https://github.com/automl/auto-sklearn/blob/master/autosklearn/metalearning/files/f1_weighted_multiclass.classification_dense/algorithm_runs.arff

Assuming that is how this works, I am also confused about why the tasks appear to be the same for f1_weighted binary and f1_weighted multiclass: the tasks seem to point to multiclass problems, yet they also appear in the binary file. For example, task 2120 points to dataset 182, which is a multiclass dataset, yet this task is in both ARFF files.

https://github.com/automl/auto-sklearn/blob/master/autosklearn/metalearning/files/f1_weighted_multiclass.classification_dense/algorithm_runs.arff
https://github.com/automl/auto-sklearn/blob/master/autosklearn/metalearning/files/f1_weighted_binary.classification_dense/algorithm_runs.arff

kaiweiang commented

> There is currently no meta-data for regression.

@mfeurer Sorry, do you mean there is currently no meta-learning for regression?

mfeurer (Contributor) commented Jul 2, 2018

> For example, if I tell auto-sklearn to use 'f1_weighted' and I have a multiclass problem, will it set the hyperparameters to those of the closest dataset in the file below?

Yes (assuming that your data is dense).
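
Conceptually, "closest" means comparing meta-features of the new dataset with those of the stored datasets and reusing the best configuration of the nearest ones. The sketch below assumes the meta-features are already computed and normalized and uses L1 distance; the distance measure, the number of neighbours `k`, and the name `nearest_configs` are illustrative assumptions, not details confirmed in this thread.

```python
def nearest_configs(
    new_meta: list[float],
    stored_meta: dict[int, list[float]],
    best_config: dict[int, str],
    k: int = 1,
) -> list[str]:
    """Return the best configurations of the k stored datasets whose
    meta-features are closest to new_meta (assumed: L1 distance)."""

    def l1(a: list[float], b: list[float]) -> float:
        if len(a) != len(b):
            raise ValueError("meta-feature vectors must have equal length")
        return sum(abs(x - y) for x, y in zip(a, b))

    # Rank stored datasets from nearest to farthest.
    ranked = sorted(stored_meta, key=lambda ds: l1(new_meta, stored_meta[ds]))
    return [best_config[ds] for ds in ranked[:k]]


# Hypothetical normalized meta-features and per-dataset winners
# (e.g. the output of a per-metric selection step).
stored = {1: [0.1, 0.5], 2: [0.9, 0.2], 3: [0.4, 0.4]}
winners = {1: "B", 2: "C", 3: "E"}
print(nearest_configs([0.35, 0.45], stored, winners, k=2))  # ['E', 'B']
```

Because `best_config` would come from the table for the chosen metric and dataset type, the same neighbours can yield different configurations for `f1_weighted` than for another metric.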

> Assuming that is how this works, I am also confused about why the tasks appear to be the same for f1_weighted binary and f1_weighted multiclass: the tasks seem to point to multiclass problems, yet they also appear in the binary file.

The tasks are the same for each metric; what differs are the configurations. Configurations are selected for each combination of target metric and dataset type, and only configurations that are valid for a given task are chosen.

adithyabsk (Author) commented

This makes a lot of sense and would be a great addition to the documentation. Thank you!

mfeurer (Contributor) commented Mar 26, 2021

This question will be documented in the upcoming FAQ (#1109).

mfeurer closed this as completed Mar 26, 2021