Export asklearn2 predictions to improve start up time #1364

eddiebergman · 2022-01-10T10:54:07Z

We currently fit and predict upon loading autosklearn.experimental.askl2 for the first time. In environments with a non-persistent filesystem (where autosklearn is installed into a new filesystem each time), this can add a considerable delay, as experienced in #1362.

It seems more appropriate to export the predictions along with the library to save this time.
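A minimal sketch of the "export with the library" idea, assuming a precomputed artifact (predictions or a fitted selector) is pickled into a file installed alongside the package. `load_or_train`, `artifact_path` and `train_fn` are hypothetical names, not auto-sklearn code; the point is only that the fast path skips training entirely, which is also why it relies on the artifact being built before every release.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def load_or_train(artifact_path: Path, train_fn: Callable[[], Any]) -> Any:
    """Load a precomputed artifact if it was shipped, otherwise compute it.

    Hypothetical helper: `artifact_path` would point at a file installed with
    the package, and `train_fn` stands in for the existing training code.
    """
    if artifact_path.exists():
        # Fast path: the artifact was built ahead of time (e.g. in CI) and
        # shipped with the package, so nothing is trained here.
        with artifact_path.open("rb") as fh:
            return pickle.load(fh)
    # Slow path: no artifact is available, so pay the full training cost now.
    return train_fn()
```

Inside a package, `artifact_path` would typically be resolved relative to the module, for example `Path(__file__).parent / "selector.pkl"` (a hypothetical filename).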


aseemk98 · 2022-05-06T17:38:20Z

Hi @eddiebergman. Would love to take this issue up as a first-timer. Is this still open?

eddiebergman · 2022-05-07T06:29:05Z

Hi @aseemk98,

So we discussed this and we're not sure we have a good solution in mind, but if you have ideas, we would be happy to discuss them! Essentially, a model is trained for each metric upon import of autosklearn.experimental.askl2. There are two routes I can see from here.

1. Export the model with the code. This has a few downsides: firstly, the model may be large (this needs to be tested), and secondly, from a maintenance perspective, it means changing our CI system to make sure these models are built before we push anything to PyPI.
2. Train them, if needed, on a call to either fit or __init__. At the moment we seem to do this for every valid metric, regardless of whether it's used or not. The overhead would still exist with this solution, but only the overhead that's actually needed, and only when the classifier is used. I would vote for fit personally.
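To illustrate what "only the overhead that's actually needed" could look like, here is a sketch contrasting eager training of every metric with on-demand training memoised per metric via `functools.lru_cache`. The metric names, `_fit_selector`, `get_selector` and the call counter are hypothetical placeholders, not auto-sklearn code.

```python
import functools
from typing import Dict, List

# Placeholder names for the four metrics mentioned in this thread.
METRICS = ("balanced_accuracy", "roc_auc", "log_loss", "accuracy")

# Records every training run so the two strategies can be compared.
TRAIN_CALLS: List[str] = []


def _fit_selector(metric: str) -> Dict[str, str]:
    """Hypothetical stand-in for the expensive per-metric training step."""
    TRAIN_CALLS.append(metric)
    return {"metric": metric}


def train_all_eagerly() -> Dict[str, Dict[str, str]]:
    """Behaviour as described: one selector per metric, used or not."""
    return {metric: _fit_selector(metric) for metric in METRICS}


@functools.lru_cache(maxsize=None)
def get_selector(metric: str) -> Dict[str, str]:
    """On demand: train only the requested metric, at most once per process."""
    if metric not in METRICS:
        raise ValueError(f"unsupported metric: {metric!r}")
    return _fit_selector(metric)
```

With the eager version every metric pays the training cost; with `get_selector`, repeated requests for the same metric reuse the cached result and unused metrics are never trained.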

@mfeurer can you have a read of this on Monday and add any comments, I've forgotten the discussion we had about this.

Best,
Eddie

eddiebergman · 2022-05-10T14:20:00Z

Hey @aseemk98,

If you're still up for it, we had a discussion and we think training the selector in __init__ makes the most sense. What do you think? If you're busy, though, that's okay; I'm just documenting this here for future purposes :)

Best,
Eddie

aseemk98 · 2022-05-10T15:59:48Z

Hi @eddiebergman ,

I understand why the first method mentioned is not feasible, but I don't quite understand the __init__ approach you want me to take.

P.S. Really sorry, this is my first crack at an open-source repo.

eddiebergman · 2022-05-11T11:03:49Z

Hi @aseemk98,

No problem, we're delighted you would like to contribute :) So that's my bad, I did not give enough context for the __init__ method. We would like the selector to be trained when the user creates the AutoSklearn2Classifier instance. Most of the code for the selector training should be wrapped in a function and then called from inside the __init__ method.

We can also improve upon the current selector training, which takes ~60s. It trains a selector for 4 different metrics, yet once the AutoSklearn2Classifier is created, we know what the metric is, so we should only need to train the selector for that one.

These two things together mean that in practice, we can improve the situation from ~60s at import to ~15s at __init__. There might be some slight issues that come up but I think they would be best tackled once encountered.

If you haven't already, check out our Contribution Guide on how to get started. Please feel free to ask any questions when you have them :)

aseemk98 · 2022-05-11T18:01:32Z

Hi @eddiebergman ,

If I understand correctly, the selectors are being trained on the various metrics in the following loop.
Based on what you've described, we intend to do the same when __init__ is called. Please correct me if I'm wrong.

eddiebergman · 2022-05-11T18:24:18Z

You got it :)

aseemk98 · 2022-05-11T18:27:40Z

Hi @eddiebergman ,
I think I understood the issue better after checking out #1362. I encapsulated the selector training code in a function and called it inside __init__. This dropped the import time from ~30s to 1.05e-05s for me.
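For anyone reproducing import-time numbers like these, one option is CPython's built-in `-X importtime` flag run in a fresh interpreter; a repeated `import` within the same process only hits the `sys.modules` cache and reads as nearly free. The helper names below are hypothetical; for this issue the module of interest would be `autosklearn.experimental.askl2`.

```python
import subprocess
import sys


def measure_import(module: str) -> str:
    """Run `python -X importtime -c "import <module>"` in a fresh interpreter.

    Returns the raw per-module report, which CPython writes to stderr.
    """
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stderr


def total_import_us(report: str, module: str) -> int:
    """Return the cumulative import time of `module` in microseconds."""
    prefix = "import time:"
    for line in report.splitlines():
        if not line.startswith(prefix):
            continue
        # Rows look like "import time:  <self> | <cumulative> | <package>",
        # with nested imports indented in the package column.
        parts = [p.strip() for p in line[len(prefix):].split("|")]
        if len(parts) == 3 and parts[2] == module:
            return int(parts[1])
    raise ValueError(f"{module!r} not found in importtime report")
```

Comparing `total_import_us(measure_import(name), name)` before and after a change gives a like-for-like measurement of the import cost alone.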

eddiebergman · 2022-05-12T15:55:56Z

That seems promising! As soon as you're happy with an initial version, feel free to open a Pull Request (PR) with your changes against the development branch, and we can review it and give feedback plus any further direction. It's okay even if it's not finished; it just lets you get direct feedback sooner!

aseemk98 · 2022-05-12T19:44:13Z

You can find the PR here

eddiebergman · 2022-05-13T16:25:56Z

FYI, you'll find feedback on your PR and we can continue from there :)

eddiebergman added the maintenance (Internal maintenance) and Good first issue labels on Jan 10, 2022

eddiebergman mentioned this issue on Jan 12, 2022 in Slow import of AutoSklearn2Classifier #1362 (Closed)

eddiebergman linked a pull request on May 13, 2022 that will close this issue: Encapsulated the selector training within a function and called it inside _init_ #1473 (Merged)

eddiebergman closed this as completed Jun 2, 2022
