Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export asklearn2 predictions to improve start up time #1364

Closed
eddiebergman opened this issue Jan 10, 2022 · 11 comments · Fixed by #1473
Closed

Export asklearn2 predictions to improve start up time #1364

eddiebergman opened this issue Jan 10, 2022 · 11 comments · Fixed by #1473
Labels

Comments

@eddiebergman
Copy link
Contributor

eddiebergman commented Jan 10, 2022

We currently fit and predict upon loading autosklearn.experimental.askl2 for the first time. In environments with a non-persistent filesystem (autosklearn is installed into a new filesystem each time), this can add quite a bit of time delay as experienced in #1362

It seems more applicable to export the predictions with the library to save on this time.

@aseemk98
Copy link
Contributor

aseemk98 commented May 6, 2022

Hi @eddiebergman. Would love to take this issue up as a first-timer. Is this still open?

@eddiebergman
Copy link
Contributor Author

Hi @aseemk98,

So we discussed this and we're not sure we have a good solution in mind but if you have ideas, we would be happy to discuss them! Essentially there is some model trained for each metric upon import of autosklearn.experimental.askl2. There's two routes I can see to go from here.

  1. Export the model with the code. This has a few downsides, firstly the size of the model may be large (needs to be tested) and secondly, from a maintenance perspective, this means changing our CI system to make sure these models are built before we push anything to PyPi.
  2. We train them if needed on a call to either fit or __init__. It seems we do this for each of the valid metrics at the moment, regardless if it's used or not. The overhead would still exist with this solution but only the overhead that's actually needed and only upon using the classifier. I would vote for fit personally.

@mfeurer can you have a read of this on Monday and add any comments, I've forgotten the discussion we had about this.

Best,
Eddie

@eddiebergman
Copy link
Contributor Author

Hey @aseemk98,

If you're still up for it, we had a discussion and we think doing so in __init__ makes the most sense, what do you think? If you're busy though it's okay, just documenting this here for future purposes :)

Best,
Eddie

@aseemk98
Copy link
Contributor

Hi @eddiebergman ,

I understand why the first mentioned method is not feasible but I don't exactly understand the init method that you want me to take.

P.s really sorry, this is my first crack at an open sourced repo

@eddiebergman
Copy link
Contributor Author

Hi @aseemk98,

No problem, we're delighted you would like to contribute :) So that's my bad, I did not give enough context for the __init__ method. We would like the selector to be trained when the user creates the AutoSklearn2Classifier instance. Most of the code for the selector training should be wrapped in a function and then called from inside the __init__ method.

We can also improve upon the current training of the selector which takes ~60s. This is training a selector for 4 different metrics yet once the AutoSklearn2Classifier is created, we can know what the metric is, hence we should only need to train the selector for that.

These two things together mean that in practice, we can improve the situation from ~60s at import to ~15s at __init__. There might be some slight issues that come up but I think they would be best tackled once encountered.

If you're not aware, I would check out our Contribution Guide on how to start. Please feel free to ask any questions when you have them :)

@aseemk98
Copy link
Contributor

Hi @eddiebergman ,

If I understand correctly, the selectors are being trained on the various metric in the following loop
According to what you've described, we intend to do the same when __init__ is called. Please correct me if I'm wrong.

@eddiebergman
Copy link
Contributor Author

You got it :)

@aseemk98
Copy link
Contributor

aseemk98 commented May 11, 2022

Hi @eddiebergman ,
I think I understood the issue better after checking out #1362. I encapsulated the selector training code in a function and called it inside __init__ . This dropped the import time from ~30s to 1.05e-05s for me.

@eddiebergman
Copy link
Contributor Author

That seems promising, as soon as you're happy with an initial PR, feel free to make a Pull Request (PR) with your changes to the development branch and we can review it and give feedback + any further direction. Even if it's not finished it's okay, just lets you get direct feedback quicker!

@aseemk98
Copy link
Contributor

You can find the PR here

@eddiebergman
Copy link
Contributor Author

As an fyi, you'll find feedback on your PR and we can continue from there :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants