
Sacred Workflows #663

Closed
Qwlouse opened this issue Sep 20, 2019 · 27 comments

Comments

@Qwlouse
Collaborator

Qwlouse commented Sep 20, 2019

In an attempt to structure our discussion, I suggest using this issue to collect a wishlist of how we would like to use Sacred from a bird's-eye perspective.
I suggest that we edit this issue to reflect the evolving consensus that (hopefully) emerges from the discussion below.
To get things started, I can think of 3 basic workflows that I would love for Sacred to support.
Maybe this is also a good place to think about how to integrate stages and superexperiments.

Interactive (Jupyter Notebook)

Manually control the stages of the experiment / run in an interactive environment. Most suitable for exploration and low-complexity experiments. Something like:

# -----------------------------------------------------------
# initialization
ex = Experiment('my_jupyter_experiment')
ex.add_observer(FileStorageObserver('tmp'))
# -----------------------------------------------------------
# Config and Functions
cfg = Configuration()
cfg.learn_rate = 0.01
cfg.hidden_sizes = [100, 100]
cfg.batch_size = 32

@ex.capture
def get_dataset(batch_size):
    ....
# -----------------------------------------------------------
# run experiment
ex.start()   # finalizes config, starts observers
data = get_dataset()  # call functions 
for i in range(1000):
    # do something 
    ex.log_metric('loss', loss)  # log metrics, artifacts, etc.

ex.stop(result=final_loss)
# -----------------------------------------------------------

Scripting

Using a main script that contains most of the experiment and is run from the commandline.
This is the current main workflow, most suitable for low to medium complexity experiments.

ex = Experiment('my_experiment_script')

@ex.config
def config(cfg):
    cfg.learn_rate = 0.01
    ...

@ex.capture
def get_dataset(batch_size):
    ....

@ex.automain  # define a main function which automatically starts and stops the experiment
def main():
    ....   # do stuff, log metrics, etc.
    return final_loss

Object Oriented

This is a long-standing feature request #193. Define an experiment as a class to improve modularity (and support frameworks like ray.tune). Should cater to medium to high complexity experiments.
Very tentative API sketch:

class MyExperiment(Experiment):
    def __init__(self, config=None):   # context-config to deal with updates and nesting
         super().__init__(config)
         self.learn_rate = 0.001   # using self to store config improves IDE support
         ...

    def get_dataset(self):  # no capturing because self gives access to config anyways
        return ...

    @main   # mark main function / commands 
    def main_function(self):
         ...   # do stuff
         return final_loss

ex = MyExperiment(config=get_commandline_updates())
ex.run()
@davebulaval
Contributor

davebulaval commented Sep 21, 2019

*edited (clarifications added after the dashes)
First, what is the main objective of Sacred? (and which one do we want to keep) - see the related issue about that.
Second, what is the "force" of Sacred? (and which strengths do we want to keep or add?) - bad translation on my part; I mean the strong points, the must-use features (I think it's the config workflow and the automatic logging, but maybe there are other points I'm missing).
Third, what is the intended level of the end user ("lambda" or better)? - By a lambda user I mean someone who doesn't know machine learning (or the other uses of Sacred) well, i.e. a fairly inexperienced programmer; "better" means a more advanced programmer who knows how to open up the code and make it work their way. This would scope the objective and the level to reach for the framework (I think).

About OO approach

About Jupyter

Is it really necessary to have a config file in an interactive environment? - @Qwlouse Is the expected workflow to use Jupyter as an exploratory environment and afterwards transfer the work to a script?

@thequilo
Collaborator

thequilo commented Sep 21, 2019

For me, it would be important that everything is usable without sacred, for various reasons (as already mentioned in #610 (comment)). So, captured functions should be callable without the need for a Sacred experiment, configuration objects should be resolvable without an experiment or arg parser, and an object-oriented experiment should be usable outside the Sacred environment by just passing the config values as arguments to __init__.
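To make "usable without sacred" concrete, here is a minimal sketch of the kind of usage this would enable (illustrative only, not a description of the current or planned API; resolve() below is a made-up name):

from sacred import Experiment

ex = Experiment('my_experiment')

@ex.config
def config():
    batch_size = 32

@ex.capture
def get_dataset(batch_size):
    return list(range(batch_size))

# Wish 1: a captured function behaves like a plain function when all arguments
# are supplied explicitly, so tests and notebooks need no Experiment run at all.
data = get_dataset(batch_size=8)

# Wish 2: the config scope is resolvable into a plain dict without an
# Experiment or command-line interface:
# plain_cfg = config.resolve()   # -> {'batch_size': 32}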

About OO approach

I have some questions about how you imagine the object-oriented experiments:

  • Is the experiment class that is subclassed here the same as the sacred experiment that takes a name (and currently ingredients and others) as parameters for __init__?
  • Is the __init__ supposed to work in the same way as a config scope? If so, this requires some black magic. To make it intuitive, it could be better to let __init__ have "normal" arguments.
  • Does this support multiple inheritance? E.g., inherit from sacred.Experiment and some custom, say, Trainer?
  • In my opinion, the overhead/impact of adding sacred to an already existing script should be as minimal as possible

To make it clearer:

class MyExperiment(Experiment, Trainer):   # Is this possible without strange MRO things happening?
    def __init__(self, learn_rate=0.001, ...):   # Pass "normal" args so that the impact of sacred is minimal
         super().__init__()    # Where does the experiment get its name from?
         self.learn_rate = learn_rate   # This is closer to the "sacred-free" use case
         ...

    @main
    def main_function(self, some_additional_arg):  # This should support additional args
         ...   # do stuff
         return final_loss

ex = MyExperiment(**get_commandline_updates())  # This now becomes difficult. Is it even possible?

# What about config updates from outside? This makes the use of `__init__` for config creation even
# more difficult
ex.config.add_config(learn_rate=0.1)

# Wouldn't this usually get the command line updates?
ex.run()

And it should be possible to do

ex = MyExperiment(learn_rate=1234)
print(ex.main_function(some_additional_arg=2345))

About Jupyter

Additionally, it should be no problem to import an experiment (class) or a config object from an existing script (i.e., it should not be required to pass any additional interactive=True flag to anything)

@flukeskywalker

@thequilo It is unclear to me what you mean by everything being usable without Sacred (after reading your linked comment). I think we all agree that one should be able to use, e.g., gin-config + sacred for logging all information about the experiment. But it seems you mean something more(?)

Do you specifically mean without defining a Sacred Experiment (while still using Sacred otherwise, e.g. just for configuration resolution) instead of without sacred?

@thequilo
Collaborator

This is not as big a problem anymore now that the config and experiment are decoupled from each other. It is actually quite difficult to describe and depends on what exactly you do. In the past I repeatedly ran into the problem that I had to construct a dummy experiment to be able to use some part of the experiment in a notebook or in tests.

For captured functions, I mean it should be possible to call them without constructing an Experiment. This simplifies testing and "quick tests" in a notebook a lot (it is currently possible if no captured functions are called from within other captured functions). The same goes for the config object: it should be possible to construct it without an experiment or command-line interface, and also based on other frameworks (gin-config). This is currently not possible, but will be after the rework. For the experiment class, it should be possible to instantiate it without constructing a config (i.e., pass some arguments to its init without evaluating config scopes, because they are decoupled) and without the command line.

If what I mean is still too confusing, we can just ignore it. I feel like the rework discussion is going in the right direction here, and I'll comment if something starts to become "too involved" or unusable "without sacred".

@thequilo
Collaborator

Superexperiments and Stages

Since @Qwlouse said that this could be the place for discussions about Superexperiments and Stages, here is a small proposal of what could be possible with the reworked config. This does not require anything new (beyond the things that already emerged from the discussion), but it would be convenient to have a way of registering a run/experiment as a sub-run (or sub-experiment) of another one. This could even be nested.

So, as a very rough idea (ignore any naming): A "Stage" is an experiment that is used by a "Superexperiment". A run object can be registered with another run object to be a sub-run of that run; this works nested as well.

data_stage = Experiment('data_preparation')
train_stage = Experiment('train')
eval_stage = Experiment('eval')

# Now define your experiments with captured functions, config, possibly in different files...
@data_stage.main
def data_main():
    if check_if_already_completed():
        # Not sure how to do this yet, maybe check for specific files or the state of some 
        # experiment ID in a db
        return
    else:
        # Run data preparation
        ...

@eval_stage.main
def eval_main():
    if check_if_already_completed():
        # Could return the result of the completed run to make restarting of failed or 
        # incomplete parameter sweeps possible
        return load_result_of_completed_eval_run()
    else:
        # Run evaluation
        return result

ex = Experiment('superexperiment')

@ex.config
def config(cfg):
    stft_size = 512   # An example config value that gets shared among all stages

    # Add the config of the stages like for ingredients
    # This makes sense in this case, but in some cases we might want to share the whole
    # config among stages. See below
    cfg.data = data_stage.config(stft_size=stft_size)
    cfg.train = train_stage.config(stft_size=stft_size)
    cfg.eval = eval_stage.config(stft_size=stft_size)

# To share the whole config among all stages. Then, it does not make sense to add the config
# as for ingredients as above
data_stage.config.add_config(ex.config)
train_stage.config.add_config(ex.config)
eval_stage.config.add_config(ex.config)

@ex.automain
def main(_run, _config):
    # We could add a "super_run" arg to the run method so that the resulting run object 
    # (if we keep something like that) is registered as a sub-run or sub-experiment
    data_stage.run(super_run=_run)

    # In some cases, it might make sense to allow passing the config and an additional 
    # ID to run (e.g., parameter sweeps)
    results = []
    for learning_rate in (0.1, 0.01, 0.001):
        train_run = train_stage.create_run(
                                super_run=_run, 
                                config_updates=dict(learning_rate=learning_rate),
                                stage_id=f'run_learning_rate={learning_rate}')
        train_run.run()
        
        # Pass the constructed config of the train run to the eval run so that eval knows where
        # the model files are stored
        eval_run = eval_stage.create_run(
                                super_run=_run, 
                                config_updates=train_run.config, 
                                stage_id=f'eval_learning_rate={learning_rate}')
        result = eval_run.run()
        results.append((result, eval_run))

    # And now we can find the optimal configuration
    best = min(results, key=lambda x: x[0])
    return best

What do you think? Does this make any sense?

@flukeskywalker

flukeskywalker commented Sep 22, 2019

Let me clarify here that the main reason that Sacred Experiments are not compatible with ray.tune's class-based API (for example) is not exactly the lack of an OO API, but something a bit deeper.

Currently Sacred can be used with the functional API of ray.tune as follows (a bit simpler than @rueberger's #193 (comment)):

def train_example(config, reporter):
    from my_script import ex
    ex.observers.append(MongoObserver.create(...))
    result = ex.run(config_updates=config)
    reporter(...)

The (class-based) Trainable API of ray.tune is more powerful because it is designed with the understanding that experiment runs can be structured like calling a function step() repeatedly (potentially with different config updates), which provides a good point to save or restore an experiment if needed, and return control to the calling scope.

I haven't tested this yet (Update: tested, works), but I realized that it may actually be possible to hack a Trainable class using Sacred:

import ray
from ray import tune
from ray.tune import Trainable

class Example(Trainable):
    def _setup(self, config):
        from my_script import ex
        self.ex = ex
        self.current_config = config

    def _train(self):
        run = self.ex.run('step', config_updates=self.current_config)
        return dict(value=run.result)

    def _save(self, tmp_checkpoint_dir):
        config_updates = {'tmp_checkpoint_dir': tmp_checkpoint_dir}
        run = self.ex.run('save', config_updates=config_updates)
        return run.result

    def _restore(self, checkpoint_path):
        config_updates = {'checkpoint_path': checkpoint_path}
        run = self.ex.run('restore', config_updates=config_updates)

    def reset_config(self, new_config):
        self.current_config = new_config

Here step(), save(), and restore() are the Commands defined for the Experiment. The main "hack" here is hidden in my_script: the user would need to attach any persistent objects (such as Network and Optimizer objects) to the Experiment object itself. The big loss is that this can't solve the problem of observers; they will create a new run for each call of each command :(
One could use tune's own system for tracking of metrics as a stop-gap.

In summary, I think an OO design of Experiment would need to:

  1. formalize the hack above (some objects will need to be persistent across commands)
  2. reinterpret the concept of calling a command (not always a new run)
  3. add support for saving/restoring Experiments (including observers)
  4. record events such as changes in config, changes in the host & environment etc.

@thequilo
Collaborator

For "reinterpret the concept of calling a command": It would be useful for any kind of parallel experiment, e.g., MPI

@Qwlouse
Collaborator Author

Qwlouse commented Sep 22, 2019

@davebulaval

I do not understand what you mean by "force of Sacred" or by the expected level of the end user (what is "lambda" in this context?). But it is a good point that we should write a goal / vision statement for Sacred. I'll draft something and post it for discussion.

One drawback I see with the currently proposed API class is that code won't be as reusable with the main_function method.

The main function is only the default command, and you are free to define others.

Is it really necessary to have a config file in an interactive environment?

Yes. An important reason to use Sacred in an interactive environment would be to draft an experiment that can later be converted into a script. Also, I might still want to use observers that log the configuration etc.

@thequilo

For me, it would be important that everything is usable without sacred

I understand and completely agree: this is a very important point. Ideally, each component of Sacred (the configuration, the command/capture functions, observers, the command-line interface, ...) should be usable in isolation. That way you can use and test them individually, nothing gets "locked in" to an experiment, and it provides a lot of flexibility for customization. All of these points are very important in my opinion, and we could do a much better job of enabling them.

I have some questions about how you imagine the object-oriented experiments [...]

These are very good questions, by which I mean: I do not have good answers :-)

  • If possible I would like to keep the three usecases (interactive, script, OO) as similar and interchangeable as possible. But I am not sure how feasible that will be.
  • There should be some way of getting the same behavior as @ex.config in the OO case, but it doesn't have to be the __init__ method. Reasons for using __init__ include better IDE support and the intuitive simplicity of just defining the configuration as member fields of the experiment.
  • Multiple-Inheritance: That is a good point we should keep in mind. No idea yet as to the consequences.

RayTune

@flukeskywalker Wouldn't it be enough then to do something like this?:

from my_script import ex, step, save, restore

class Example(Trainable):
    def _setup(self, config):
        from my_script import ex
        self.ex = ex
        self.run = ex._create_run(config_updates=config)

    def _train(self):
        return step()

    def _save(self):
        save()

    def _restore(self):
        restore()

    def reset_config(self, new_config):
        self.run = ex._create_run(config_updates=new_config)

This creates a new run upon initialization and every time reset_config is called. The captured functions step, save, and restore use the configuration from the current run. This is obviously not thread-safe (a general problem with Sacred), but it would keep the observers alive.

@thequilo
Collaborator

I think one problem with Ray Tune is that for some distributed configurations it can run one step on one machine, copy the saved state to another machine, and resume there for the next step (I hope I understood the docs correctly). Then your approach above does not work, and it would be necessary to implement some sort of resume mechanism for the observers (and this kind of brings back the discussion about the Observers/Retriever #483, because resuming is essentially the same as loading data from storage for analysis).

@flukeskywalker

Note: I've corrected my code snippet and tested that it actually works.

@Qwlouse Unfortunately that doesn't work. The global imports make ray attempt to serialize those objects, which ends in disaster. This is one of the reasons that it took a while for people to trust that ray & Sacred can work together (CC @engintoklu). Making each ray actor do the imports locally is the only way I've figured out to avoid this issue.
Even if it worked, it is debatable whether a change to the config should reset the observers instead of just continuing.

In general (and as @thequilo points out above), medium to large experiments have a general requirement of checkpointing/restoring to handle failed experiments on clusters etc. So the 4 requirements above seem unavoidable.

I am not sure whether it is just the observers that need to be modified or the Experiment/Run behavior as well. Would something like ex.run(..., reset_observers=False) together with ex.save(...) and ex.restore(...) be sufficient?
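A purely hypothetical sketch of how that could look from the Trainable side (reset_observers, ex.save, and ex.restore are the proposed names from the question above, not existing Sacred API):

class Example(Trainable):
    def _setup(self, config):
        from my_script import ex
        self.ex = ex
        self.current_config = config

    def _train(self):
        # proposed: reuse the same observers/run instead of opening a new run per command
        run = self.ex.run('step', config_updates=self.current_config,
                          reset_observers=False)
        return dict(value=run.result)

    def _save(self, tmp_checkpoint_dir):
        # proposed: persist experiment state *including* observer state
        return self.ex.save(tmp_checkpoint_dir)

    def _restore(self, checkpoint_path):
        self.ex.restore(checkpoint_path)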

@Qwlouse
Collaborator Author

Qwlouse commented Sep 23, 2019

@thequilo and @flukeskywalker: Regarding Stages and RayTune:
This problem ties into many other things that should be improved about Sacred. It ties in with an earlier discussion about continuing an experiment #411 and about supporting population based training (see, e.g., this comment by @rueberger).
I think @csadorf summarized it nicely by pointing out that the main problems are:

  1. There is no concept of data persistence in sacred (see also the retrievers issue: Observers counterpart (retrievers?) #483)
  2. There is no concept of workflow lineage or dependency in sacred.

I would add to that:

  3. Thread safety to allow parallel execution of multiple runs from the same program
  4. Full pickleability to support serialization and distributing across machines
  5. An overhaul of the observer data format to account for stages, resuming, and changing parameters during a run.

These are important issues, and I emphatically share the wish to support them. However, I am afraid that tackling them all at once is too large a project. It would require re-engineering several core parts of Sacred beyond the configuration system. I'm happy to discuss them and gladly welcome any input on these issues. But sadly, I doubt that we have the capacity to tackle them at the moment.

@thequilo
Collaborator

@Qwlouse Agreed. We should for now focus on the config process. A discussion about data persistence and observers would by itself become as big as the discussion about the config that we already have. But it might be good to have some place where general ideas can be collected (Another issue) for future work.

While reworking the config we should make sure that point 4 is fulfilled (full pickleability of the config object(s)), and maybe also point 3 (thread safety) for potential captured functions and the config creation process.

@flukeskywalker

@Qwlouse Agreed, but from the subject and the opening comment I thought this discussion was not only about the config :) Config is certainly the priority right now -- while keeping these issues in mind so that the config doesn't need to be reworked yet again.

@thequilo I think Klaus meant full pickle-ability of the Experiment itself, not just the config, which seems complicated.

Nevertheless, I do think it would be nice to take some steps in these directions to support powerful workflows. For example, if observer states can be saved/restored, and asked not to assume a new run for every command, Sacred users could already benefit from the rapidly maturing Ray Tune. So it is good to know that you are open to these changes!

@thequilo
Collaborator

@flukeskywalker I know, but it's even harder to pickle an experiment if its config is not even pickleable.

@Qwlouse
Collaborator Author

Qwlouse commented Sep 25, 2019

@flukeskywalker Sorry, it wasn't my intention to shut this discussion down. Especially since I was the one who invited the discussion. I take my concern back. Let's keep this discussion alive and work on a plan to properly integrate stages and superexperiments. Execution of the plan might have to wait, but that shouldn't keep us from thinking about it, and keeping it in mind while reworking the config.

@thequilo
Collaborator

Ok, then I'll drop some thoughts about the points mentioned above:

2. There is no concept of workflow lineage or dependency in sacred.

There are many possibilities to introduce dependencies in an experiment/between experiments, and there are different types of dependencies:

  1. Hierarchical dependencies: There is a super-experiment and some sub-experiments, e.g., a parameter sweep. In this case, the sub-experiments do not depend on each other and can be executed in parallel or in any order.
  2. "Order" dependencies (I don't have a better name for it): One experiment depends on the result of another experiment. These could be nested within a super-experiment, but don't have to be (really? this could make things a lot more difficult). This should define something like a partially ordered set, since some experiments (or stages) could have the same "parent" but can be executed in parallel...

Then there is the question of where to define those dependencies. This could be done at the experiment level (ex = Experiment(sub_experiments=(sub_ex1, sub_ex2)), sub_ex1.depends_on(sub_ex2)) or at the run level (ex.run(parent=parent_ex)), and there are probably a ton of other possibilities with different advantages and disadvantages. Questions here are: Do we want static dependencies that can be resolved before the experiment is run (which would simplify logging and determining in which state to resume a partially failed experiment), or dynamic dependencies that are determined while the experiment is run (which is required if sub-experiments depend on the configuration of the superexperiment, as in a parameter sweep)? How are dependencies represented and logged?
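A rough, purely hypothetical sketch contrasting the two options (none of these arguments exist in Sacred today):

# Static: dependencies declared up front, resolvable before anything runs
train_stage = Experiment('train')
eval_stage = Experiment('eval', depends_on=[train_stage])
ex = Experiment('superexperiment', sub_experiments=[train_stage, eval_stage])

# Dynamic: dependencies only known at run time, e.g. inside a parameter sweep
@ex.automain
def main(_run):
    for lr in (0.1, 0.01):
        train_run = train_stage.run(parent=_run,
                                    config_updates={'learning_rate': lr})
        eval_stage.run(parent=_run, depends_on=train_run)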

3. Thread safety to allow parallel execution of multiple runs from the same program

This one is difficult with the captured functions. Currently, the config for the captured functions is stored in the function object itself (globally), so it is shared between all threads. But is multithreading really what we want? For many use-cases, we probably want multiprocessing.

4. Full pickleability to support serialization and distributing across machines

What needs to be pickled? Or, at what stage of an experiment will the experiment get pickled?

  1. Pickle command line options -> construct experiment from pickled command line options (trivial)
  2. Construct the experiment -> pickle -> unpickle and run
  3. Construct a Run object -> pickle -> unpickle and run
  4. Construct a Run object -> run partially -> pickle -> unpickle and continue

I guess all of them should be supported except for 1(?).
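For illustration, case 3 could look roughly like this (hypothetical sketch; create_run and a callable, fully pickleable run object are assumptions, not current API):

import pickle

run = ex.create_run(config_updates={'learn_rate': 0.01})
payload = pickle.dumps(run)   # requires the run (config, observers, ...) to be pickleable

# ... shipped to a worker machine ...
restored_run = pickle.loads(payload)
restored_run()                # execute the run there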

5. An overhaul of the observer data format to account for stages, resuming, and changing parameters during a run.

One idea is to make it even more general and introduce the concept of a Hook that can not only observe but also modify things in different places (e.g., the config before a run is constructed, like the current config hooks), with an interface similar to the following (just some ideas of what it could contain):

class Hook:
    def queued(run, time...):
    def pre_config(ex):    # Called before the config is constructed (no idea for what this could be useful, but there certainly is something)
    def post_config(ex, config):    # After constructing the config, prior to constructing the run. Could modify the config like current config hooks
    def pre_run(run):    # Called before the run starts. Can be used to log started event
    def post_run(run, time...):    # In here, run has a status and this can be written to a db (like the observers' completed, failed, maybe suspended, ...)
    def restored(run):    # Could be a way to handle restoring
    def heartbeat(run):
    def add_artifact(run, artifact):
    def config_change(run, config):
    def log_metrics(run, metrics):

This could also be a good starting point for third party libraries to extend the functionality of sacred.

An Observer could inherit from Hook (although it would inherit some unused functions, which is not the nicest thing to do. If this is not wanted, a more complex class hierarchy could be constructed) and additionally define methods to get data from a storage, which would unify the concept of an observer and a retriever into one object. It might then make sense to distinguish between Observers that can restore data and those that can't (like a notificator #667?).

class Observer(Hook):
    @classmethod
    def load_from_id(id):
    def get_metrics():
    def get_artifacts():
    def get_...   # And so on. This would allow to use the Observer as a Retriever and for restoring

What is definitely required is some sort of identifier for an experiment or an experiment record for those observers that allow restoring data. For the FileStorageObserver this would be a path to the directory where the records get written; for the MongoObserver this is the database URL, name, and ID (I'm not sure exactly, I've never used MongoDB, but you get the idea). This could then be referred to for resuming/loading and should be abstracted by some sort of ExperimentID class.
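A minimal sketch of such an ExperimentID abstraction (the class name, fields, and format are made up):

from dataclasses import dataclass

@dataclass(frozen=True)
class ExperimentID:
    backend: str     # e.g. 'file_storage' or 'mongo'
    locator: str     # base directory for the FileStorageObserver, database URL/name for Mongo
    record_id: str   # run directory name or database document id

    def __str__(self):
        return f'{self.backend}:{self.locator}:{self.record_id}'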

@rueberger
Contributor

Apologies for my absence, all. I don't have time right now to read and respond in earnest to all of the discussion that has taken place so far, but I would just like to add that I did briefly take a serious stab at rearchitecting the Sacred backend with the end goal of supporting population based training (PBT), although I didn't get much farther than outlining the architecture.

It's clear that some really major changes will be necessary to the sacred backend to support PBT.

For instance, there can no longer be a well-defined notion of experiment ids due to branching experiment lineages. Rather, we must adopt git-style merkle-tree ids.

It also requires that observers and retrievers become two sides of the same coin; PBT must be able to load and mutate past experiment steps.

My plan for observers was to slowly roll out support by reimplementing the existing observer machinery with it and eventually phasing out the current backend entirely.

But of course, my life caught up with me shortly afterwards...

Have we considered seeking funding? Seems like most of our problems just come down to no one having time to maintain sacred. I would really hate to see this incredibly useful project languish.

Surely there's got to be some money out there for developing this kind of ML tooling.

@flukeskywalker

For instance, there can no longer be a well-defined notion of experiment ids due to branching experiment lineages. Rather, we must adopt git-style merkle-tree ids.

It also requires that observers and retrievers become two sides of the same coin; PBT must be able to load and mutate past experiment steps.

@rueberger why do you think these changes are necessary to support PBT-style config mutations? Wouldn't it be fine if Sacred understood that an experiment can consist of running a function multiple times (or multiple functions in series), or of continuing a checkpointed experiment, and that the config can potentially change across these runs? We should consider these big changes, but my question is why they are absolutely necessary.

Technically, PBT (and perhaps any hyperparameter optimization procedure) should be viewed and recorded as "1 PBT experiment" (just like an experiment with any other genetic algorithm) instead of several experiments interacting with each other. The internal hyperparameters that PBT optimizes would then simply be metrics. Maybe this is a way to better resolve the conflict.

@rueberger
Contributor

Perhaps not necessary, but ideal. That is the proper abstraction; PBT is branching. And although we could surely come up with many ways to hack it, it would go a long way towards maintainability and overall usability for Sacred's backend to fully support the abstraction.

For instance, think of external tools like omniboard. Omniboard and its predecessor let you look at a single model 'lineage' at a time. If you confuse the notion of an experiment by introducing the concept of some hyperopt meta-experiment, omniboard is badly broken.

In my own hyperopt tooling, I configure things so that each new hyperopt trial is its own experiment. I wanted to be able to do the same thing with PBT, and that drove me to this design.

In the merkle-tree experiment abstraction, experiment 'lineages' are identified by their endpoints. In this way, it would be easy to provide the existing interface and continue to use omniboard and the like.
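As a rough illustration of that idea (not an actual design), a git-style lineage id could be a hash over the parent id and the config delta applied at that step, so a lineage is fully identified by the id of its endpoint:

import hashlib
import json

def step_id(parent_id, config_delta):
    payload = json.dumps({'parent': parent_id, 'delta': config_delta}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

root = step_id(None, {'learn_rate': 0.01})
branch_a = step_id(root, {'learn_rate': 0.003})   # a PBT mutation branching off the root
branch_b = step_id(root, {'learn_rate': 0.03})    # a second branch from the same parent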

@flukeskywalker

We may have to agree to disagree that this is the proper abstraction :) though it may certainly have its merits!

As hyperparameter optimization gets more sophisticated, it seems more reasonable to me to consider each hyperparameter optimization run an experiment itself that should be fully reproducible. From that perspective, each trial need not be a separate experiment with its own independent observers. Instead, observations should ideally be made at the level of the hyperparameter optimizer. The current ray tune design may be a bad fit for this though -- I'm not sure.

@davebulaval
Contributor

In my own hyperopt tooling, I configure things so that each new hyperopt trial is its own experiment. I wanted to be able to do the same thing with PBT, and that drove me to this design.

I would be interested to see some example code for that. I don't know much about this.

@davebulaval
Contributor

As hyperparameter optimization gets more sophisticated, it seems more reasonable to me to consider each hyperparameter optimization run an experiment itself that should be fully reproducible. From that perspective, each trial need not be a separate experiment with its own independent observers. Instead, observations should ideally be made at the level of the hyperparameter optimizer. The current ray tune design may be a bad fit for this though -- I'm not sure.

Meaning we should force the use of an observer in every experiment?

I can never fully settle my own internal debate about whether the config/hyperparameters are a property of an experiment, or an experiment is a property of the config/hyperparameters. From a reproducibility point of view, an experiment should own its config, not the other way around. But at the same time, a set of config values (and maybe also a code snapshot) defines the result of an experiment.

Also, as an ID, a hashable name can be interesting because it is reversible, but two similar runs would overwrite one another (without a timestamp).

For the past week, I've been thinking about using a delta approach in my experiments. Meaning I have an initial delta (kind of my baseline) and I create delta modifications to it in an attempt to improve it, and those deltas are part of my experiment. I don't know yet if this would be a good approach, but maybe we could discuss whether it is an interesting one.

@thequilo
Collaborator

thequilo commented Oct 1, 2019

As hyperparameter optimization gets more sophisticated, it seems more reasonable to me to consider each hyperparameter optimization run an experiment itself that should be fully reproducible. From that perspective, each trial need not be a separate experiment with its own independent observers. Instead, observations should ideally be made at the level of the hyperparameter optimizer. The current ray tune design may be a bad fit for this though -- I'm not sure.

I think (and this may be totally wrong) this should be some kind of mixed nested experiment. In my opinion, the outer hyperparameter optimization thing should be an experiment that runs smaller trial experiments. And there are things that should be observed locally (e.g., the config or loss curve of a specific trial) and things that should be observed globally at the level of the hyperparameter optimization (the current state, config of the best trial, ...). Seeing it this way requires support for nested experiments. But as I said, this point of view may be totally wrong because I have never used a hyperparameter optimizer before.

Meaning we should force the use of an observer in every experiment?

I think this shouldn't be forced but should be made possible so that the users themselves can decide what exactly they want to observe.

Also, as an ID, a hashable name can be interesting because it is reversible, but two similar runs would overwrite one another (without a timestamp).

What's the benefit over using a simple experiment name?

For the past week, I've been thinking about using a delta approach in my experiments.

I like the idea of the delta approach (storing a reference to the previous experiment and the delta information, either on a run or an experiment level) for experiments that depend on each other. This could mean a restarted run with changed config (e.g., larger number of epochs) or multiple trials of a hyperparameter optimizer with changed configuration or even manual tuning. There was some issue or some suggestions about this before but I don't remember where to find it.

@beasteers

beasteers commented Oct 2, 2019

On the topic of "meta experiments", it would be useful to have a utility function that can convert Sacred config options into bash arguments. I've cobbled together something similar for doing simple hyperparameter search on HPC, where I use a script that generates a bunch of job scripts.

It would be cool if sacred v2 had something like:

cmd = sacred.to_bash_command(
    'my/train.py', 'a_command',
    config={'run_id': ids[i], 'lr': lr[i]},
    named_config=['blah', 'blorp'])
# then use cmd to write "run_{ids[i]}.sbatch", for example

This simplifies the logic in each job script, because each one only needs to worry about a single run, and allows the runs to execute in parallel without any extra fuss.
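Such a helper could probably be a thin wrapper around Sacred's existing command-line syntax (python script.py <command> with key=value <named_config>). A hedged sketch, with to_bash_command being a hypothetical name as above:

import shlex

def to_bash_command(script, command, config=None, named_config=()):
    # Assemble: python <script> <command> with k1=v1 k2=v2 <named_config ...>
    parts = ['python', script, command]
    updates = [f'{k}={v}' for k, v in (config or {}).items()] + list(named_config)
    if updates:
        parts.append('with')
        parts += updates
    return ' '.join(shlex.quote(p) for p in parts)

cmd = to_bash_command('my/train.py', 'a_command',
                      config={'run_id': 3, 'lr': 0.001},
                      named_config=['blah', 'blorp'])
# -> python my/train.py a_command with run_id=3 lr=0.001 blah blorp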

@rueberger
Contributor

@davebulaval example for you.

Also as an id, an hashable name can be interesting because is reversable, but two similar run will overtake one other (without a time stamp).

I don't know what you mean by reversible, as hashes are typically not reversible.

As far as hash collisions go, they are certainly possible but not a practical concern. With sha256 and a modest address space, the odds of a collision are astronomically small.

@flukeskywalker I think it's important to carefully delineate design considerations from identifying the proper abstractions. I empathize with your desires; it also seems reasonable to me for a hyperparameter run to itself be a reproducible experiment. But I see this as a much larger design question: should Sacred try to become more of a black box, or perhaps have some sort of hierarchical experiment structure? All great questions we should continue discussing.

I wasn't really trying to address any of these larger design considerations in my attempt to renovate the backend, simply to resolve the problem I have that Sacred's existing backend is incapable of supporting PBT. No big redesigns, just a pragmatic process of 'OK, how might we conceivably support the branching lineage needed for PBT in Sacred?'.

I would be super hesitant to get into a big redesign, purely out of concern that we'd be biting off more than we can chew. There's just not that much executive capacity in an unfunded open-source organization, and the kind of redesigns we're talking about would be an enormous undertaking.

@thequilo regarding pickling, you may want to take a look at the example I link above, which is essentially just a hack to avoid passing around unserializable bits.

@stale

stale bot commented Jan 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 7, 2020
@stale stale bot closed this as completed Jan 14, 2020
@balloch

balloch commented May 30, 2020

So maybe I missed something, but what about the sketched OOP workflow in the original post doesn't work?
