Sacred Workflows #663
*edited (see after the "-")

About OO approach

About Jupyter

Is it really necessary to have a config file in an interactive environment?

- @Qwlouse Is the expected workflow to use Jupyter as an exploratory environment and afterwards transfer the work into a script environment? |
For me, it would be important that everything is usable without Sacred, for various reasons (as already mentioned in #610 (comment)). So, captured functions should be callable without the need for a Sacred experiment, configuration objects should be resolvable without an experiment or arg parser, and an object-oriented experiment should be usable without the Sacred environment by just passing the config values as arguments to `__init__`.

About OO approach

I have some questions about how you imagine the object-oriented experiments:
To make it clearer:

```python
class MyExperiment(Experiment, Trainer):  # Is this possible without strange MRO things happening?
    def __init__(self, learn_rate=0.001, ...):  # Pass "normal" args so that the impact of sacred is minimal
        super().__init__()  # Where does the experiment get its name from?
        self.learn_rate = learn_rate  # This is closer to the "sacred-free" use case
        ...

    @main
    def main_function(self, some_additional_arg):  # This should support additional args
        ...  # do stuff
        return final_loss


ex = MyExperiment(**get_commandline_updates())  # This now becomes difficult. Is it even possible?

# What about config updates from outside? This makes the use of `__init__` for config creation even
# more difficult
ex.config.add_config(learn_rate=0.1)

# Wouldn't this usually get the command line updates?
ex.run()
```

And it should be possible to do

```python
ex = MyExperiment(learn_rate=1234)
print(ex.main_function(some_additional_arg=2345))
```

About Jupyter

Additionally, it should be no problem to import an experiment (class) or a config object from an existing script (i.e., it should not be required to pass any additional |
@thequilo It is unclear to me what you mean by everything being usable without Sacred (after reading your linked comment). I think we all agree that one should be able to use, e.g., gin-config + sacred for logging all information about the experiment. But it seems you mean something more(?) Do you specifically mean without defining a Sacred Experiment (while still using Sacred otherwise, e.g. just for configuration resolution) instead of without sacred? |
This is not such a big problem anymore now that the config and experiment are decoupled from each other. It is actually quite difficult to describe and depends on what exactly you do. In the past I repeatedly ran into the problem that I had to construct a dummy experiment to be able to use some part of the experiment in a notebook or in tests. For captured functions, I mean it should be possible to call them without constructing an Experiment. This simplifies testing and "quick tests" in a notebook a lot (this is currently possible only if no captured functions are used from within other captured functions). The same goes for config objects: it should be possible to construct them without an experiment and command-line interface, and also based on other frameworks (gin-config). This is currently not possible but will be possible after the rework. For the experiment class, it should be possible to instantiate it without constructing a config (like: pass some arguments to its init, and this does not evaluate config scopes because they are decoupled) and without the command line. If it is still too confusing what I mean, we can just ignore it. I feel like the rework discussion is going in the right direction with this, and I'll comment if something is going to become "too involved" or unusable "without sacred". |
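To illustrate the wish with a minimal sketch (this shows the desired behaviour, not Sacred's current API; `resolve_config` and the wrapping calls in the comments are made-up names):

```python
# A plain Python function: usable in tests and notebooks with no Experiment involved.
def train(learn_rate=0.001, batch_size=16):
    return {"learn_rate": learn_rate, "batch_size": batch_size}


# 1) Direct call, no Sacred at all:
result = train(learn_rate=0.1)

# 2) Hypothetical standalone config resolution (no Experiment, no command line):
#    config = resolve_config(defaults=dict(learn_rate=0.001), updates=dict(learn_rate=0.01))
#    result = train(**config)

# 3) Registering the function on an experiment should not change how it can be called:
#    ex = Experiment("demo")
#    train = ex.capture(train)
#    train(learn_rate=0.1)   # still works exactly as before, no run required
```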
Superexperiments and Stages

Since @Qwlouse said that this could be the place for discussions about Superexperiments and Stages, here is a small proposal of what could be possible with the reworked config. This does not require anything new (in addition to the things that already emerged from the discussion), but it would be convenient to have a way of registering a run/experiment as a sub-run (or sub-experiment) of another one. This could even be nested. So, as a very rough idea (ignore any naming): a "Stage" is an experiment that is used by a "Superexperiment". A run object can be registered to another run object to be a sub-run of that run; this works nested as well.

```python
data_stage = Experiment('data_preparation')
train_stage = Experiment('train')
eval_stage = Experiment('eval')

# Now define your experiments with captured functions, config, possibly in different files...

@data_stage.main
def data_main():
    if check_if_already_completed():
        # Not sure how to do this yet, maybe check for specific files or the state of some
        # experiment ID in a db
        return
    else:
        # Run data preparation
        ...

@eval_stage.main
def eval_main():
    if check_if_already_completed():
        # Could return the result of the completed run to make restarting of failed or
        # incomplete parameter sweeps possible
        return load_result_of_completed_eval_run()
    else:
        # Run evaluation
        ...
        return result

ex = Experiment('superexperiment')

@ex.config
def config(cfg):
    stft_size = 512  # An example config value that gets shared among all stages

    # Add the config of the stages like for ingredients.
    # This makes sense in this case, but in some cases we might want to share the whole
    # config among stages. See below
    cfg.data = data_stage.config(stft_size=stft_size)
    cfg.train = train_stage.config(stft_size=stft_size)
    cfg.eval = eval_stage.config(stft_size=stft_size)

# To share the whole config among all stages. Then, it does not make sense to add the config
# as for ingredients as above
data_stage.config.add_config(ex.config)
train_stage.config.add_config(ex.config)
eval_stage.config.add_config(ex.config)

@ex.automain
def main(_run, _config):
    # We could add a "super_run" arg to the run method so that the resulting run object
    # (if we keep something like that) is registered as a sub-run or sub-experiment
    data_stage.run(super_run=_run)

    # In some cases, it might make sense to allow passing the config and an additional
    # ID to run (e.g., parameter sweeps)
    results = []
    for learning_rate in (0.1, 0.01, 0.001):
        train_run = train_stage.create_run(
            super_run=_run,
            config_updates=dict(learning_rate=learning_rate),
            stage_id=f'run_learning_rate={learning_rate}')
        train_run.run()

        # Pass the constructed config of the train run to the eval run so that eval knows where
        # the model files are stored
        eval_run = eval_stage.create_run(
            super_run=_run,
            config_updates=train_run.config,
            stage_id=f'eval_learning_rate={learning_rate}')
        result = eval_run.run()
        results.append((result, eval_run))

    # And now we can find the optimal configuration
    best = min(results, key=lambda x: x[0])
    return best
```

What do you think? Does this make any sense? |
Let me clarify here that the main reason that Sacred Experiments are not compatible with ray.tune's class-based API (for example) is not exactly the lack of an OO API, but something a bit deeper. Currently Sacred can be used with the functional API of ray.tune as follows (a bit simpler than @rueberger's #193 (comment)):

```python
def train_example(config, reporter):
    from my_script import ex
    ex.observers.append(MongoObserver.create(...))
    result = ex.run(config_updates=config)
    reporter(...)
```

The (class-based) Trainable API of ray.tune is more powerful because it is designed with the understanding that experiment runs can be structured as more than a single function call. I haven't tested this yet (Update: tested, works), but I realized that it may actually be possible to hack a Trainable class using Sacred:

```python
import ray
from ray import tune
from ray.tune import Trainable
class Example(Trainable):
    def _setup(self, config):
        from my_script import ex
        self.ex = ex
        self.current_config = config

    def _train(self):
        run = self.ex.run('step', config_updates=self.current_config)
        return dict(value=run.result)

    def _save(self, tmp_checkpoint_dir):
        config_updates = {'tmp_checkpoint_dir': tmp_checkpoint_dir}
        run = self.ex.run('save', config_updates=config_updates)
        return run.result

    def _restore(self, checkpoint_path):
        config_updates = {'checkpoint_path': checkpoint_path}
        run = self.ex.run('restore', config_updates=config_updates)

    def reset_config(self, new_config):
        self.current_config = new_config
```

Here `step`, `save`, and `restore` are commands defined in `my_script`. In summary, I think an OO design of Experiment would need to:
|
For "reinterpret the concept of calling a command": It would be useful for any kind of parallel experiment, e.g., MPI |
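For illustration, a rough sketch of the kind of pattern such parallel experiments need (reusing the `my_script`/`ex` assumption from the snippets above, with multiprocessing instead of MPI); each worker issues its own "command call" with different config updates:

```python
from multiprocessing import Pool


def run_one(learn_rate):
    # Import inside the worker so the Experiment object never has to be pickled.
    from my_script import ex
    run = ex.run(config_updates={"learn_rate": learn_rate})
    return run.result


if __name__ == "__main__":
    with Pool(processes=3) as pool:
        results = pool.map(run_one, [0.1, 0.01, 0.001])
    print(results)
```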
@davebulaval I do not understand what you mean by "force of Sacred" or by the expected level of the end user (what is lambda in this context?). But it is a good point that we should write a goal / vision statement for sacred. I'll draft something and post it for discussion.
The main function is only the default command, and you are free to define others.
Yes. An important reason to use sacred in an interactive environment would be to draft an experiment that can later be converted into a script. Also I might still want to use observers that log the configuration etc. @thequilo
I understand and completely agree: This is a very important point. Ideally each component of sacred (the configuration, the command/capture functions, observers, the commandline interface, ...) should be usable in isolation. That way you can use and test them individually, nothing gets "locked-in" to an experiment, and it provides a lot of flexibility for customization. All of these points are very important in my opinion, and we could do a much better job of enabling them.
These are very good questions, by which I mean: I do not have good answers :-)
RayTune

@flukeskywalker Wouldn't it be enough then to do something like this?

```python
from ray.tune import Trainable

from my_script import ex, step, save, restore

class Example(Trainable):
    def _setup(self, config):
        from my_script import ex
        self.ex = ex
        self.run = ex._create_run(config_updates=config)

    def _train(self):
        return step()

    def _save(self):
        save()

    def _restore(self):
        restore()

    def reset_config(self, new_config):
        self.run = ex._create_run(config_updates=new_config)
```

This creates a new run upon initialization and every time `reset_config` is called. |
I think one problem with raytune is that for some distributed configurations it can run one step on one machine, copy the saved state to another machine, and resume there for the next step (I hope I understood the docs correctly). Then, your approach above does not work and it would be necessary to implement some sort of resume mechanism for the observers (and this kind of brings back the discussion about the Observers/Retriever #483, because resuming is essentially the same as loading data from a storage for analysis). |
Note: I've corrected my code snippet and tested that it actually works. @Qwlouse Unfortunately that doesn't work. The global imports make ray attempt to serialize those objects, which ends in disaster. This is one of the reasons that it took a while for people to trust that ray & Sacred can work together (CC @engintoklu). Making each ray actor do the imports locally is the only way I've figured out to avoid this issue. In general (and as @thequilo points out above), medium to large experiments have a general requirement of checkpointing/restoring to handle failed experiments on clusters etc. So the 4 requirements above seem unavoidable. I am not sure whether it is just the observers that need to be modified or the Experiment/Run behavior as well. Would something like |
@thequilo and @flukeskywalker: Regarding Stages and RayTune:
I would add to that:
These are important issues, and I emphatically share the wish to support these. However, I am afraid, that tackling them all at once is too large a project. It will require re-engineering several core parts of Sacred apart from the configuration system. I'm happy to discuss them, and gladly welcome any input on these issues. But sadly, I doubt that we have the capacity to tackle them at the moment. |
@Qwlouse Agreed. We should for now focus on the config process. A discussion about data persistence and observers would by itself become as big as the discussion about the config that we already have. But it might be good to have some place where general ideas can be collected (Another issue) for future work. While reworking the config we should make sure that point 4 is fulfilled (full pickleability of the config object(s)), and maybe also point 3 (thread safety) for potential captured functions and the config creation process. |
@Qwlouse Agreed, but judging from the subject and the opening comment I thought this discussion was not only about the config :) Config is certainly the priority right now -- while keeping these issues in mind so config doesn't need to be reworked yet again. @thequilo I think Klaus meant full pickle-ability of the Experiment itself, not just the config, which seems complicated. Nevertheless, I do think it would be nice to take some steps in these directions to support powerful workflows. For example, if observer states can be saved/restored, and asked not to assume a new run for every command, Sacred users could already benefit from the rapidly maturing Ray Tune. So it is good to know that you are open to these changes! |
@flukeskywalker I know, but it's even harder to pickle an experiment if its config is not even pickleable. |
@flukeskywalker Sorry, it wasn't my intention to shut this discussion down. Especially since I was the one who invited the discussion. I take my concern back. Let's keep this discussion alive and work on a plan to properly integrate stages and superexperiments. Execution of the plan might have to wait, but that shouldn't keep us from thinking about it, and keeping it in mind while reworking the config. |
Ok, then I'll drop some thoughts about the points mentioned above:

2. There is no concept of workflow lineage or dependency in sacred

There are many possibilities to introduce dependencies in an experiment/between experiments, and there are different types of dependencies:

Then, there is the question where to define those dependencies. This could be done on an experiment level.

3. Thread safety to allow parallel execution of multiple runs from the same program

This one is difficult with the captured functions. Currently, the config for the captured functions is stored in the function object itself (globally), so it is shared between all threads. But is multithreading really what we want? For many use-cases, we probably want multiprocessing.

4. Full pickleability to support serialization and distributing across machines

What needs to be pickled? Or, at what stage of an experiment will the experiment get pickled?

I guess all of them should be supported except for 1(?).

5. An overhaul of the observer data format to account for stages, resuming, and changing parameters during a run

One idea is to make it even more general, or introduce the concept of a `Hook` class:

```python
class Hook:
    def queued(run, time, ...): ...
    def pre_config(ex): ...  # Called before the config is constructed (no idea for what this could be useful, but there certainly is something)
    def post_config(ex, config): ...  # After constructing the config, prior to constructing the run. Could modify the config like current config hooks
    def pre_run(run): ...  # Called before the run starts. Can be used to log a started event
    def post_run(run, time, ...): ...  # In here, run has a status and this can be written to a db (like the observers' completed, failed, maybe suspended, ...)
    def restored(run): ...  # Could be a way to handle restoring
    def heartbeat(run): ...
    def add_artifact(run, artifact): ...
    def config_change(run, config): ...
    def log_metrics(run, metrics): ...
```

This could also be a good starting point for third-party libraries to extend the functionality of sacred. An `Observer` could then look like this:

```python
class Observer(Hook):
    @classmethod
    def load_from_id(id): ...
    def get_metrics(): ...
    def get_artifacts(): ...
    def get_...  # And so on. This would allow using the Observer as a Retriever and for restoring
```

What is definitely required is some sort of identifier for an experiment or an experiment record for those observers that allow restoring data. For the |
Apologies for my absence, all. I don't have time right now to read and respond in earnest to all of the discussion that has taken place so far, but I would just like to add that I did very briefly take a serious attempt at rearchitecting the sacred backend towards an end goal of supporting population based training (PBT), although I didn't get much farther than outlining the architecture. It's clear that some really major changes will be necessary to the sacred backend to support PBT. For instance, there can no longer be a well-defined notion of experiment ids due to branching experiment lineages. Rather, we must adopt something like a hash-based, merkle-tree-style identification scheme. It also requires that observers and retrievers become two sides of the same coin: PBT must be able to load and mutate past experiment steps. I had a plan for this, but of course, my life caught up with me shortly afterwards... Have we considered seeking funding? Seems like most of our problems just come down to no one having time to maintain Sacred. Surely there's got to be some money out there for developing this kind of ML tooling. |
@rueberger why do you think these changes are necessary to support PBT-style config mutations? Wouldn't it be fine if Sacred understands that an experiment can consist of running a function multiple times (or multiple functions in series), or continuing a checkpointed experiment, and that the config can potentially change across these runs? We should consider these big changes, but my question is why they are absolutely necessary. Technically, PBT (and perhaps any hyperparameter optimization procedure) should be viewed and recorded as "1 PBT experiment" (just like an experiment with any other genetic algorithm) instead of several experiments interacting with each other. The internal hyperparameters that PBT optimizes would then simply be metrics. Maybe this is a way to better resolve the conflict. |
Perhaps not necessary, but ideal. That is the proper abstraction: PBT is branching. And although we could surely come up with many ways to hack it, it would go a long way towards maintainability and overall usability for sacred's backend to fully support the abstraction. For instance, think of external tools like omniboard. Omniboard and its predecessor let you look at a single model 'lineage' at a time. If you confuse the notion of an experiment by introducing the concept of some hyperopt meta-experiment, omniboard is badly broken. In my own hyperopt tooling, I configure things so that each new hyperopt trial is its own experiment. I wanted to be able to do the same thing with PBT, and that drove me to this design. In the merkle-tree experiment abstraction, experiment 'lineages' are identified by their endpoints. In this way, it would be easy to provide the existing interface and continue to use omniboard and the like. |
We may have to agree to disagree that this is the proper abstraction :) though it may certainly have its merits! As hyperparameter optimization gets more sophisticated, it seems more reasonable to me to consider each hyperparameter optimization run an experiment itself that should be fully reproducible. From that perspective, each trial need not be a separate experiment with its own independent observers. Instead, observations should ideally be made at the level of the hyperparameter optimizer. The current ray tune design may be a bad fit for this though -- I'm not sure. |
I would be interested to see some example code for that. I don't know much about this. |
Meaning we should force the use of an observer in every experiment? I can never fully settle my own internal debate about whether the config/hyperparams are a property of an experiment, or an experiment is a property of the config/hyperparams. From a reproducibility point of view, an experiment should own its config, not the other way around. But at the same time, a set of configs (and maybe also a code snapshot) defines the result of an experiment. Also, as an id, a hashable name can be interesting because it is reversible, but two similar runs will overwrite each other (without a timestamp). For the past week, I've been thinking about using a delta approach in my experiments: I have an initial delta (kind of my baseline) and I create delta modifications to it in an attempt to improve it, and those deltas are part of my experiment. I don't know yet if this would be a good approach, but maybe we could discuss whether it is interesting. |
I think (and this can be totally wrong) this should be some kind of mixed nested experiment. In my opinion, the outer hyperparameter optimization thing should be an experiment that runs smaller trial experiments. And there are things that should be observed locally (e.g., the config or loss curve of a specific trial) and things that should be observed globally on the level of the hyperparameter optimization (the current state, config of the best trial, ...). Seeing it this way requires supporting nested experiments. But as I said, this point of view can be totally wrong because I have never used a hyperparameter optimizer before.
I think this shouldn't be forced but should be made possible so that the users themselves can decide what exactly they want to observe.
What's the benefit over using a simple experiment name?
I like the idea of the delta approach (storing a reference to the previous experiment and the delta information, either on a run or an experiment level) for experiments that depend on each other. This could mean a restarted run with changed config (e.g., larger number of epochs) or multiple trials of a hyperparameter optimizer with changed configuration or even manual tuning. There was some issue or some suggestions about this before but I don't remember where to find it. |
On the topic of "meta experiments", it would be useful to have a utility function that can convert sacred config options into bash arguments. I've cobbled together something similar for doing simple hyperparameter search on HPC, where I use a script that generates a bunch of job scripts. It would be cool if sacred v2 had something like:

```python
cmd = sacred.to_bash_command(
    'my/train.py', 'a_command',
    config={'run_id': ids[i], 'lr': lr[i]},
    named_config={'blah', 'blorp'})
# then use cmd to write "run_{ids[i]}.sbatch", for example
```

This simplifies the logic in each script because they only need to worry about a single run, and allows each run to be run in parallel without any extra fuss. |
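A minimal sketch of how such a helper could be implemented, assuming Sacred's existing CLI syntax (`python script.py command with named_config key=value ...`); `to_bash_command` itself and its signature are hypothetical:

```python
import shlex


def to_bash_command(script, command, config=None, named_configs=()):
    """Hypothetical helper: build a shell command line for one Sacred run."""
    parts = ["python", script, command]
    if config or named_configs:
        parts.append("with")
        parts.extend(named_configs)  # e.g. 'blah', 'blorp'
        parts.extend(f"{key}={value!r}" for key, value in (config or {}).items())
    return " ".join(shlex.quote(part) for part in parts)


# to_bash_command('my/train.py', 'a_command',
#                 config={'run_id': 3, 'lr': 0.01},
#                 named_configs=['blah', 'blorp'])
# -> "python my/train.py a_command with blah blorp run_id=3 lr=0.01"
```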
@davebulaval example for you.
I don't know what you mean by reversible, as hashes are typically not reversible. As for hash collisions, they are certainly possible but not a practical concern. With sha256 and a modest address space the odds of collision are astronomically small. @flukeskywalker I think it's important to carefully delineate design considerations from identifying the proper abstractions. I empathize with your desires; it also seems reasonable to me for a hyperparam run to itself be a reproducible experiment. But I see this as a much larger design question: should sacred try to become more of a black box, or perhaps have some sort of hierarchical experiment structure? All great questions we should continue discussing. I wasn't really trying to address any of these larger design considerations in my attempt to renovate the backend, simply to resolve the problem I have that sacred's existing backend is incapable of supporting PBT. No big redesigns, just a pragmatic process of "OK, how might we conceivably support the branching lineage needed for PBT in sacred?". I would be super hesitant to get into a big redesign, purely out of the concern that we'd be biting off more than we can chew. There's just not that much executive capacity in an unfunded open source organization... and the kind of redesigns we're talking about would be an enormous undertaking. @thequilo regarding pickling, you may want to take a look at the example I link above, which is essentially just a hack to avoid passing around unserializable bits. |
So maybe I missed something, but what about the sketched OOP workflow in the original post doesn't work? |
In an attempt to structure our discussion, I suggest using this issue to collect a wishlist of how we would like to use Sacred from a bird's-eye perspective.
I suggest that we edit this issue to reflect the evolving consensus that (hopefully) emerges from the discussion below.
To get things started, I can think of three basic workflows that I would love for sacred to support.
Maybe this is also a good place to think about how to integrate stages and superexperiments.
Interactive (Jupyter Notebook)
Manually control the stages of the experiment / run in an interactive environment. Most suitable for exploration and low complexity experiments. Something like:
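A sketch of how this might roughly look with today's API (observer directory and config values are arbitrary); the wishlist item is to go beyond this and control the individual stages of the run (config resolution, starting, logging, completing) by hand instead of only through a single `ex.run()` call:

```python
from sacred import Experiment
from sacred.observers import FileStorageObserver

ex = Experiment("notebook_demo", interactive=True)  # interactive=True is needed in Jupyter
ex.observers.append(FileStorageObserver("runs"))
ex.add_config(learn_rate=0.01, epochs=3)

@ex.main
def train(learn_rate, epochs, _run):
    for epoch in range(epochs):
        _run.log_scalar("loss", 1.0 / (learn_rate * (epoch + 1)), epoch)
    return "done"

run = ex.run(config_updates={"learn_rate": 0.1})  # started and inspected from the notebook
print(run.config, run.result)
```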
Scripting
Using a main script that contains most of the experiment and is run from the commandline.
This is the current main workflow, most suitable for low to medium complexity experiments.
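For reference, a minimal example of this workflow with the current API (file name, observer directory, and config values are arbitrary):

```python
# my_experiment.py -- run e.g. as: python my_experiment.py with learn_rate=0.1
from sacred import Experiment
from sacred.observers import FileStorageObserver

ex = Experiment("my_experiment")
ex.observers.append(FileStorageObserver("runs"))

@ex.config
def config():
    learn_rate = 0.01
    epochs = 10

@ex.capture
def make_optimizer(learn_rate):
    return {"lr": learn_rate}

@ex.automain
def main(epochs, _run):
    optimizer = make_optimizer()  # learn_rate is filled in from the config
    for epoch in range(epochs):
        _run.log_scalar("loss", 1.0 / (epoch + 1), epoch)
    return optimizer["lr"]
```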
Object Oriented
This is a long-standing feature request #193. Define an experiment as a class to improve modularity (and support frameworks like ray.tune). Should cater to medium to high complexity experiments.
Very tentative API sketch:
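A rough placeholder in the spirit of the class-based example discussed above; the `Experiment` base-class behaviour, the `@main` method decorator, and `get_commandline_updates` are all hypothetical:

```python
class MyExperiment(Experiment):
    def __init__(self, learn_rate=0.001, epochs=10):
        super().__init__(name="my_experiment")
        self.learn_rate = learn_rate
        self.epochs = epochs

    @main  # hypothetical method-level decorator marking the default command
    def train(self, some_additional_arg=0):
        return self.learn_rate * self.epochs + some_additional_arg


# Plain OO use, no CLI or observers involved:
ex = MyExperiment(learn_rate=0.1)
print(ex.train(some_additional_arg=1))

# Or as a command-line experiment, with updates applied to __init__:
# ex = MyExperiment(**get_commandline_updates())
# ex.run()
```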