memory_limit interferes with "resampling_strategy=GroupKFold" #1137

Closed
emanuele opened this issue Apr 24, 2021 · 1 comment · Fixed by #1341

@emanuele

Dear Developers,

First of all, thank you for your work and the really interesting autosklearn package.

In `AutoSklearnRegressor` (and possibly `AutoSklearnClassifier` too), when `memory_limit` is low enough to force autosklearn to decimate (subsample) the training set, a resampling strategy like `GroupKFold` fails because the `groups` argument, a vector with one group index per training example, is not decimated accordingly. In essence, the following line fails:

```python
if np.shape(self.resampling_strategy_args['groups'])[0] != y.shape[0]:
```

because `y.shape[0]` refers to the decimated training set, while `np.shape(self.resampling_strategy_args['groups'])[0]` refers to the original (non-decimated) training set.

As a consequence, for large training sets this problem essentially always occurs, preventing the use of group-based resampling strategies.
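For reference, a minimal sketch of the failing setup (the array sizes, `memory_limit`, and `time_left_for_this_task` values here are illustrative, not taken from the original report):

```python
import numpy as np
from sklearn.model_selection import GroupKFold
from autosklearn.regression import AutoSklearnRegressor

rng = np.random.RandomState(0)
X = rng.rand(200_000, 20)                       # large enough to trigger subsampling
y = rng.rand(200_000)
groups = rng.randint(0, 10, size=y.shape[0])    # one group index per sample

automl = AutoSklearnRegressor(
    time_left_for_this_task=120,
    memory_limit=1024,                          # low limit -> training set gets subsampled
    resampling_strategy=GroupKFold,
    resampling_strategy_args={"groups": groups},
)
# Fails: y was subsampled to fit the memory limit, but `groups` still has
# the original length, so the shape check quoted above raises.
automl.fit(X, y)
```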

@eddiebergman (Contributor)

Hi @emanuele,

While we haven't fixed this issue directly, you can now disable dataset compression or take fine-grained control over how it is done. This was implemented in #1341 and documented more thoroughly in #1386. It is in the development branch and will be included in our next release. I will close this for now, as keeping track of indices across the multitude of possible resampling implementations would be quite a difficult task and is out of scope for what we can manage.
