Document list of datasets meta learning datasets #502

adithyabsk · 2018-06-26T19:58:56Z

Hello!

I was looking through the documentation and could not find the list of datasets that were used to train the meta-learning feature of auto-sklearn. The paper supplement (http://ml.informatik.uni-freiburg.de/papers/15-NIPS-auto-sklearn-supplementary.pdf) lists a set of datasets, but I was wondering whether those have been updated.
@mfeurer

mfeurer · 2018-06-27T08:42:59Z

They indeed have been updated, please find the list of (133) OpenML task IDs in this file: https://github.com/automl/auto-sklearn/blob/master/autosklearn/metalearning/files/accuracy_binary.classification_dense/algorithm_runs.arff
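For anyone who wants to pull the task IDs out of that file programmatically, here is a minimal sketch using only the standard library. It assumes the task ID sits in a column named `instance_id`, that the IDs are integers, and that no value contains a quoted comma; none of this is confirmed in this thread, so check it against the actual file. The column names and rows in `SAMPLE` are invented for illustration (only task 2120 is mentioned in this issue), and `task_ids_from_arff` is a hypothetical helper, not part of auto-sklearn.

```python
def task_ids_from_arff(text, id_attribute="instance_id"):
    """Return the sorted unique values of one attribute from ARFF text.

    Naive parser: assumes comma-separated data rows without quoted commas.
    """
    attributes = []
    ids = set()
    column = None
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                # "@ATTRIBUTE name type": the name is the second token
                attributes.append(line.split()[1])
            elif lower.startswith("@data"):
                in_data = True
                column = attributes.index(id_attribute)
            continue
        ids.add(line.split(",")[column].strip())
    # Assumption: IDs are integers, so sort them numerically
    return sorted(ids, key=int)


# Invented sample in the assumed layout (not copied from the real file)
SAMPLE = """\
@RELATION ALGORITHM_RUNS
@ATTRIBUTE instance_id STRING
@ATTRIBUTE repetition NUMERIC
@ATTRIBUTE algorithm STRING
@ATTRIBUTE accuracy NUMERIC
@DATA
2120,1.0,1,0.12
2120,1.0,2,0.15
9999,1.0,1,0.05
"""

print(task_ids_from_arff(SAMPLE))
```

To use it on the real file, read the downloaded `algorithm_runs.arff` into a string and pass it in, adjusting `id_attribute` if the column is named differently.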

adithyabsk · 2018-06-27T13:13:02Z

Do you have a different set of datasets for the metalearning for regression problems?

adithyabsk · 2018-06-27T13:16:29Z

Also, the folder seems to contain a different set of tasks for each metric. Was the metalearner trained on all of these sets of tasks?

mfeurer · 2018-06-27T16:43:21Z

> Do you have a different set of datasets for the metalearning for regression problems?

There is currently no meta-data for regression.

> Also, the folder seems to contain a different set of tasks for each metric. Was the metalearner trained on all of these sets of tasks?

No. We ran Auto-sklearn with balanced accuracy on each dataset separately. Then, for each combination of metric, target problem (binary, multiclass), and data structure (dense, sparse), we took the legal configurations and chose, for each dataset, the one that performed best on the metric of interest.
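The selection step described above can be sketched roughly as follows. This is an illustration of the idea only, not the code used to build the meta-data: `best_config_per_dataset`, the tuple layout of `RUNS`, the `LEGAL` set, and all scores are hypothetical, and it assumes higher scores are better.

```python
def best_config_per_dataset(runs, metric, is_legal):
    """Pick, for each dataset, the legal configuration with the best score.

    runs: iterable of (dataset, config_id, scores), where scores maps a
    metric name to a value (assumed: higher is better).
    is_legal: callable deciding whether a configuration is valid for the
    current target problem and data structure.
    """
    best = {}
    for dataset, config, scores in runs:
        if not is_legal(config):
            continue  # drop configurations that are not valid here
        score = scores[metric]
        if dataset not in best or score > best[dataset][1]:
            best[dataset] = (config, score)
    return {dataset: config for dataset, (config, _) in best.items()}


# Invented evaluation results for two datasets and three configurations
RUNS = [
    ("A", "cfg1", {"f1_weighted": 0.80, "accuracy": 0.90}),
    ("A", "cfg2", {"f1_weighted": 0.85, "accuracy": 0.88}),
    ("A", "cfg3", {"f1_weighted": 0.99, "accuracy": 0.99}),
    ("B", "cfg1", {"f1_weighted": 0.70, "accuracy": 0.75}),
]
LEGAL = {"cfg1", "cfg2"}  # pretend cfg3 is not legal for this problem type

print(best_config_per_dataset(RUNS, "f1_weighted", LEGAL.__contains__))
print(best_config_per_dataset(RUNS, "accuracy", LEGAL.__contains__))
```

The same pool of runs yields a different winner per dataset depending on the metric, which is why each metric directory can list the same tasks but different configurations.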

adithyabsk · 2018-06-27T17:21:47Z

> Then, for each combination of metric, target problem (binary, multiclass) and data structure (dense, sparse) we looked for the legal configurations and chose the one for each dataset which performed best given the metric of interest.

I'm still unclear. For example, if I tell auto-sklearn to use 'f1_weighted' on a multiclass problem, will it set the hyperparameters to the ones from the closest dataset in the file below?

https://github.com/automl/auto-sklearn/blob/master/autosklearn/metalearning/files/f1_weighted_multiclass.classification_dense/algorithm_runs.arff

Assuming that is how it works, I am also confused about why the tasks seem to be the same for f1_weighted binary and f1_weighted multiclass: the tasks appear to point to multiclass problems, yet they are listed under binary as well. For example, task 2120, which points to dataset 182, is a multiclass dataset, yet this task appears in both ARFF files.

https://github.com/automl/auto-sklearn/blob/master/autosklearn/metalearning/files/f1_weighted_multiclass.classification_dense/algorithm_runs.arff
https://github.com/automl/auto-sklearn/blob/master/autosklearn/metalearning/files/f1_weighted_binary.classification_dense/algorithm_runs.arff

kaiweiang · 2018-06-28T08:25:52Z

> There is currently no meta-data for regression.

@mfeurer Sorry, do you mean there is no metalearning for regression currently?

mfeurer · 2018-07-02T08:54:19Z

> For example, if I tell auto-sklearn to use 'f1_weighted' on a multiclass problem, will it set the hyperparameters to the ones from the closest dataset in the file below?

Yes (assuming that your data is dense).
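As a rough illustration of which meta-data directory applies, the directory names linked in this thread appear to follow the pattern `<metric>_<task>_<structure>`. The helper below is hypothetical and only rebuilds that pattern; the `sparse` suffix is inferred from the dense/sparse distinction mentioned above and is not taken from a linked path.

```python
def metadata_directory(metric, task, is_sparse):
    """Build a meta-data directory name in the pattern seen in this thread.

    metric: e.g. "f1_weighted" or "accuracy"
    task: e.g. "binary.classification" or "multiclass.classification"
    is_sparse: whether the input data is sparse ("sparse" suffix is assumed)
    """
    structure = "sparse" if is_sparse else "dense"
    return f"{metric}_{task}_{structure}"


print(metadata_directory("f1_weighted", "multiclass.classification", False))
```

For a dense multiclass problem scored with 'f1_weighted', this gives the name of the directory linked earlier in the thread.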

> Assuming that is how it works, I am also confused about why the tasks seem to be the same for f1_weighted binary and f1_weighted multiclass: the tasks appear to point to multiclass problems, yet they are listed under binary as well.

The tasks are the same for each metric. What differs is the configurations: they are selected for each combination of target metric and dataset type, and only configurations valid for a given task are chosen.

adithyabsk · 2018-07-04T14:26:41Z

This makes a lot of sense and would be a great addition to the documentation. Thank you!

mfeurer · 2021-03-26T17:18:59Z

This question will be documented in the upcoming FAQ (#1109).

mfeurer added the documentation label Jun 27, 2018

kaiweiang mentioned this issue Jul 4, 2018 in "Regarding the cv_results_" #503 (closed)

mfeurer closed this as completed Mar 26, 2021
