Allow for freeze-thaw of configurations where possible #143
Comments
Yes, this would be a useful addition and, in principle, this could even work for all benchmarks, including the tabular ones. Here are a few things we should keep in mind when designing the API: This will make the benchmarks stateful and more complex, and it requires additional memory, since the benchmarks need to store which configs/seeds have been trained on which budget. There are several options for this, e.g. via a file, a database, or in memory, each with pros and cons. Additionally, this will interfere with optimizers evaluating configurations in parallel and raises further questions, e.g. whether and how to share the state among evaluations.
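To make the state-storage trade-off concrete, here is a minimal sketch of a file-backed registry that records the highest budget each (configuration, seed) pair has been trained on, so that it could in principle be shared between parallel workers. None of these names (e.g. BudgetRegistry) exist in HPOBench; this is only an illustration.

```python
import hashlib
import json
from pathlib import Path


class BudgetRegistry:
    """Hypothetical file-backed record of the highest budget seen per (config, seed)."""

    def __init__(self, path):
        self.path = Path(path)

    def _key(self, configuration: dict, seed: int) -> str:
        # Hash the configuration so it can serve as a JSON key.
        blob = json.dumps(configuration, sort_keys=True) + f"|seed={seed}"
        return hashlib.md5(blob.encode()).hexdigest()

    def _load(self) -> dict:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def get_budget(self, configuration: dict, seed: int) -> int:
        # Returns 0 if this (configuration, seed) pair has never been trained.
        return self._load().get(self._key(configuration, seed), 0)

    def update_budget(self, configuration: dict, seed: int, budget: int) -> None:
        state = self._load()
        key = self._key(configuration, seed)
        state[key] = max(state.get(key, 0), budget)
        # NOTE: sharing this file between parallel workers would additionally
        # require file locking or a proper database.
        self.path.write_text(json.dumps(state))
```

An in-memory variant would simply keep the dictionary on the benchmark object, which is simpler but cannot be shared across processes.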
Hey guys, this sounds like a very useful addition. I know this might be a special scenario of freeze-thaw, but there are some scenarios in which "continue training" could simply mean loading some weights, e.g. the weights of a neural network. This particular use case/scenario is potentially easy to tackle and might be a good starting point. We could solve this case by implementing some hooks that are called before or after the objective function; these functions could be used to load or save the model weights to a defined directory. We could also create a second "baseclass" that makes clear that the benchmark does not support freeze-thaw. However, @KEggensperger and I have already discussed that there are some cases for which that easy solution does not work. What do you think?
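A minimal sketch of the hook idea, assuming hypothetical load_checkpoint/save_checkpoint hook names and a checkpoint directory passed in at construction time (this is not existing HPOBench API, just one possible shape of such a second base class):

```python
from abc import ABC, abstractmethod
from pathlib import Path


class CheckpointableBenchmark(ABC):
    """Hypothetical base class for benchmarks that can continue training from weights."""

    def __init__(self, checkpoint_dir: str):
        self.checkpoint_dir = Path(checkpoint_dir)
        self.checkpoint_dir.mkdir(parents=True, exist_ok=True)

    @abstractmethod
    def load_checkpoint(self, configuration: dict, fidelity: dict):
        """Hook called before training: return stored weights/trees if they exist, else None."""

    @abstractmethod
    def save_checkpoint(self, model, configuration: dict, fidelity: dict) -> None:
        """Hook called after training: persist the model so training can continue later."""

    @abstractmethod
    def _train(self, model, configuration: dict, fidelity: dict, **kwargs) -> dict:
        """Benchmark-specific training that may start from a partially trained model."""

    def objective_function(self, configuration: dict, fidelity: dict, **kwargs) -> dict:
        model = self.load_checkpoint(configuration, fidelity)
        result = self._train(model, configuration, fidelity, **kwargs)
        self.save_checkpoint(result["model"], configuration, fidelity)
        return result
```

Benchmarks that do not support freeze-thaw would simply not derive from this class, which would make the distinction explicit to users.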
Thank you for your valuable input and for raising important points. Given you both have thought about this much longer, I would definitely like to have a more detailed conversation once I have a basic design or a prototype ready.
For the first iteration, I was thinking of going with some in-memory data structure of the unique configurations evaluated and their corresponding latest fidelity. One immediate issue would be how to handle duplicate queries. Also, we could include a
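As a rough illustration of that bookkeeping and of the duplicate-query question (the names and the string return values below are hypothetical, not HPOBench API):

```python
# Hypothetical in-memory bookkeeping: (config hash, seed) -> latest fidelity evaluated.
evaluated_fidelities: dict = {}


def resolve_query(key, requested_fidelity: int) -> str:
    """Decide how to handle an objective_function call for a given configuration key."""
    latest = evaluated_fidelities.get(key)
    if latest is None:
        return "train_from_scratch"
    if requested_fidelity > latest:
        return "continue_training"  # the freeze-thaw case
    # Duplicate or lower-fidelity query: re-train from scratch, return a cached
    # result, or raise an error? This policy still needs to be decided.
    return "duplicate_query_policy_needed"
```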
But we won't need this feature for tabular benchmarks, right?
Definitely a concern. However, just like we say that we don't support freeze-thaw now, can we later claim that we support freeze-thaw only for the 1-worker setup?
Indeed, and I agree. For the first iteration, I want to take the
Not too familiar with their implementation, but having used them, this sounds apt.
Again, sounds perfect as a starting point, just like the way the data directories for the tabular benchmarks are managed. In the long run, I wonder if it might be useful to allow a more flexible option for the user to set this directory, since I am not sure how the memory starts bloating over long runs of an instantiation of HPOBench. Secondly, given that we don't allow checkpointing and resuming of HPOBench runs, we might want to define some operations in
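A sketch of what such a cleanup operation could look like, with a hypothetical helper that is not existing HPOBench functionality:

```python
import shutil
from pathlib import Path


def clean_checkpoints(checkpoint_dir: str, keep_latest: int = 0) -> None:
    """Hypothetical teardown helper: delete stored checkpoints at the end of a run.

    With keep_latest=0 everything is removed; otherwise only the most recently
    modified entries are kept (e.g. to allow resuming later).
    """
    root = Path(checkpoint_dir)
    if not root.exists():
        return
    entries = sorted(root.glob("*"), key=lambda p: p.stat().st_mtime, reverse=True)
    for path in entries[keep_latest:]:
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
```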
This is something I have heard a few times now, and I probably need to work with containers more to fully understand why. Can a process that runs in a container not access files saved locally outside the container? Isn't that kind of what the TabularBenchmark does?
I totally agree. All the above points I mentioned are w.r.t. the first iteration of Freeze-Thaw for the MLP space, which we could try first. Would be great to hear your thoughts on the same!
To allow for a wider multi-fidelity scope, it would be nice to allow configurations to be optionally restarted from a model checkpoint. This would be applicable for the tree-based search spaces (RandomForest, XGB) and the neural network spaces. That is, if a call to the objective_function is made for a configuration that has already been evaluated at a lower fidelity, but now at a higher, unseen fidelity, the function should load the model for that configuration at the lower fidelity and continue training up to the higher fidelity (adding more trees or training for more epochs). In such a case, the costs returned would indicate the cost involved in continuing the training.

Implementing this in HPOBench would require considerations around an API design that doesn't break the existing APIs, while also checking how model loading and saving can be managed with the Docker interface. It also needs to be decided whether the function evaluation costs should account for the model I/O to disk, since ignoring it might distort the true cost of querying the benchmark.
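To illustrate the "add more trees" case for the tree-based spaces, here is a small sketch using scikit-learn's warm_start; the cost bookkeeping is only illustrative and does not reflect how HPOBench currently implements its benchmarks:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

# First objective_function-style call: train at the lower fidelity (50 trees).
model = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
start = time.time()
model.fit(X, y)
cost_low_fidelity = time.time() - start

# Later call at a higher, unseen fidelity (200 trees): instead of retraining from
# scratch, grow only the additional trees and report only the continuation cost.
model.n_estimators = 200
start = time.time()
model.fit(X, y)
cost_continuation = time.time() - start  # cost of adding the remaining 150 trees
```

For the neural network spaces, the analogue would be loading saved weights and training for the remaining epochs; the open question from above is whether the time spent saving and loading the model should be added to these costs.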