memory_limit interferes with "resampling_strategy=GroupKFold" #1137

Closed
emanuele opened this issue Apr 24, 2021 · 1 comment · Fixed by #1341

@emanuele

Dear Developers,

First of all, thank you for your work and the really interesting autosklearn package.

In `AutoSklearnRegressor` (and possibly `AutoSklearnClassifier` too), when `memory_limit` is low enough to force autosklearn to decimate (subsample) the training set, a resampling strategy like `GroupKFold` fails because the `groups` argument, a vector with one group index per training example, is not decimated accordingly. In essence, the following line fails:

```python
if np.shape(self.resampling_strategy_args['groups'])[0] != y.shape[0]:
```

because `y.shape[0]` refers to the decimated training set, while `np.shape(self.resampling_strategy_args['groups'])[0]` refers to the original (non-decimated) training set.

As a consequence, for large training sets this problem essentially always occurs, preventing the use of group-based resampling strategies.
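For reference, a minimal sketch of the failing setup (the array sizes, `memory_limit`, and `time_left_for_this_task` values here are illustrative, not taken from the original report):

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from autosklearn.regression import AutoSklearnRegressor

rng = np.random.RandomState(0)
X = rng.rand(200_000, 20)                       # large enough to trigger subsampling
y = rng.rand(200_000)
groups = rng.randint(0, 10, size=y.shape[0])    # one group index per sample

automl = AutoSklearnRegressor(
    time_left_for_this_task=120,
    memory_limit=1024,                          # low limit -> training set gets subsampled
    resampling_strategy=GroupKFold,
    resampling_strategy_args={"groups": groups},
)
# Fails: y was subsampled to fit the memory limit, but `groups` still has
# the original length, so the shape check quoted above raises.
automl.fit(X, y)
```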

@eddiebergman (Contributor)

Hi @emanuele,

While we haven't fixed this issue directly, you can now disable dataset compression or take fine-grained control over how it is done. This was implemented in #1341 and documented more thoroughly in #1386. It is in the development branch and will be included in our next release. I will close this for now, as keeping track of indices across the multitude of possible resampling implementations would be quite a difficult task and is out of scope for what we can manage.
