
best practices for structuring nested experiment runs #416

Closed
kirk86 opened this issue Feb 13, 2019 · 11 comments

@kirk86

kirk86 commented Feb 13, 2019

Hi folks,
I was wondering if there's a way to structure experiments for each individual choice of dataset and algorithm.

For instance, you could have something in your code base like this:

for dataset in datasets:
    for algorithm in algorithms:
        run_and_record_experiment(dataset, algorithm)

Eventually, since these runs are based on individual combinations of dataset and algorithm, you would want to have an experiment for each of them.

How would you go about doing that?

One very bad way I came up with is the following:

experiments = []
for dataset in datasets:
    for algorithm in algorithms:
        experiments.append(Experiment(f"{dataset}_{algorithm}"))

for experiment in experiments:
    @experiment.main
    def main():
        ...  # main code calling the models and running training

Please let me know what's a better way of doing that. Thank you!

@JarnoRFB
Collaborator

Well, I think your basic idea is right. However, given that you want to run the same experiment just with different configurations (model, data), you can reuse the same experiment object and just update the config. It would look something like this:

from sacred import Experiment

ex = Experiment("generic_experiment")

@ex.main
def run(dataset, model):
    ...

# datasets and models are your own lists of dataset / model identifiers
for dataset in datasets:
    for model in models:
        ex.run(config_updates={"dataset": dataset, "model": model},
               options={"--name": f"{dataset}_{model}"})

For a more complete example you might look at Klaus' code https://github.com/Qwlouse/Binding/blob/master/run_evaluation.py

@kirk86
Author

kirk86 commented Feb 14, 2019

@JarnoRFB thanks for the reply. In the end that's what I ended up doing, but it was a bit more involved since I was using different files, src.py and main.py. src.py contains the Ingredient and main.py contains the actual Experiment with the ingredients. One issue that I faced was that I was loading the config from yaml files inside main.py using ex.add_config, but then in main.py I couldn't get those configs to be injected at run time, even though I had the ingredient.capture decorator on some of the methods. This bit threw me off. Maybe I was doing it wrong?

The other thing that I would like to ask is how anyone would go about saving validation folds in each experiment. Should that be in different columns using _run.info? Or should each of them be a separate experiment? But then it would unnecessarily populate many entries in the db. Is there a proper way to populate those folds, each as a separate entry (a.k.a. row) but under its own experiment?

@JarnoRFB
Collaborator

In the end that's what I ended up doing, but it was a bit more involved since I was using different files, src.py and main.py. src.py contains the Ingredient and main.py contains the actual Experiment with the ingredients. One issue that I faced was that I was loading the config from yaml files inside main.py using ex.add_config, but then in main.py I couldn't get those configs to be injected at run time, even though I had the ingredient.capture decorator on some of the methods. This bit threw me off. Maybe I was doing it wrong?

Sorry, but I cannot quite follow. A minimal code example would greatly help here.

The other thing that I would like to ask is how anyone would go about saving validation folds in each experiment. Should that be in different columns using _run.info? Or should each of them be a separate experiment? But then it would unnecessarily populate many entries in the db. Is there a proper way to populate those folds, each as a separate entry (a.k.a. row) but under its own experiment?

What exactly do you want to save from the validation fold? If it is just a metric, e.g. accuracy, why not save it as a metric of the run? You can call

_run.log_scalar("validation_fold_acc", acc)

for each validation fold.
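For illustration, a minimal self-contained sketch of that pattern, assuming a made-up experiment name, a fixed 5-fold split, and a stub evaluate_fold helper standing in for real training/evaluation:

from sacred import Experiment

ex = Experiment("cv_logging_example")  # hypothetical name, just for illustration

def evaluate_fold(fold_idx):
    # stand-in for real training/evaluation on one fold
    return 0.9

@ex.main
def main(_run):
    for fold_idx in range(5):  # e.g. 5-fold cross-validation
        acc = evaluate_fold(fold_idx)
        # each call appends one entry to the "validation_fold_acc" metric of this run
        _run.log_scalar("validation_fold_acc", acc, step=fold_idx)

if __name__ == "__main__":
    ex.run()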

It would also be nice if you could ask such general questions under the python-sacred tag on Stack Overflow, so the answers remain more visible to the general public.

@kirk86
Author

kirk86 commented Feb 15, 2019

Sorry, but I cannot quite follow. A minimal code example would greatly help here.

I apologize for the confusion. Let me provide an MWE, as you requested, to make things clear.
src.py

import sacred

ingred = sacred.Ingredient('default-params')
ingred.add_config('some/yaml/file')    # <--- added config
ingred.add_config('another/yaml/file')  # <--- added config

class MyModel(object):
    @ingred.capture
    def __init__(self):
        pass  # do some stuff...

main.py

import sacred

from src import MyModel, ingred

ex = sacred.Experiment('test-exper', ingredients=[ingred])

@ex.main   # <--- this also works as a capture decorator
def main(param1, param2):  # <--- if I pass my params they are not recognized, only if I access them through _run
    MyModel()

What exactly do you want to save from the validation fold? If it is just a metric, e.g. accuracy, why not save it as a metric of the run?

Yup, that's doable, but when you examine it through Omniboard I think it shows the validation loss of the last training epoch and not the best validation value.

In other words, since _run.log_scalar("validation_fold_acc", acc) always attaches a step counter, I am not sure whether the values are appended for each step or whether they are overwritten.

It would also be nice if you could ask such general questions under the python-sacred tag on Stack Overflow, so the answers remain more visible to the general public.

Thanks for the pointer, I wasn't aware of it. From now on I'll post related stuff there.

@JarnoRFB
Collaborator

Yup, that's doable, but when you examine it through Omniboard I think it shows the validation loss of the last training epoch and not the best validation value.
In other words, since _run.log_scalar("validation_fold_acc", acc) always attaches a step counter, I am not sure whether the values are appended for each step or whether they are overwritten.

I believe that if you do not set the step explicitly, it will append to the metrics array. If you want to see the current best validation metric, you could set it as the result. While the experiment is running, use

_run.result = best_validation_acc

and set the final result by returning the value from the main function. See also https://sacred.readthedocs.io/en/latest/collected_information.html#live-information. This way it is displayed in the result column in Omniboard.
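For illustration, a minimal sketch combining both, assuming a made-up experiment name, a fixed epoch count, and a stub validate function standing in for a real validation pass:

from sacred import Experiment
import random

ex = Experiment("result_example")  # hypothetical name, just for illustration

def validate(epoch):
    # stand-in for a real validation pass
    return random.random()

@ex.main
def main(_run):
    best_validation_acc = 0.0
    for epoch in range(10):  # placeholder training loop
        acc = validate(epoch)
        if acc > best_validation_acc:
            best_validation_acc = acc
            _run.result = best_validation_acc  # live result while the run is active
    return best_validation_acc  # final result, shown in the result column

if __name__ == "__main__":
    ex.run()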

On the ingredient issue I unfortunately cannot comment without looking a bit deeper into it. I have not really used ingredients myself. But do I understand correctly that you want to access parameters from the ingredient config in the experiment's main function?

@kirk86
Author

kirk86 commented Feb 15, 2019

On the ingredient issue I unfortunately cannot comment without looking a bit deeper into it. I have not really used ingredients myself. But do I understand correctly that you want to access parameters from the ingredient config in the experiment's main function?

Exactly, without having to use ex.add_config again or go through _run, just by accessing the params in the captured method def main(params). It seems that the params are not injected when using ingredients? Although I might be wrong!

@Qwlouse
Collaborator

Qwlouse commented Feb 18, 2019

Hi @kirk86,

ingredients create their own namespace in the configuration, as if the values were part of a dictionary with the name of the ingredient. If you slightly modify your example to use a Python-compatible name for the ingredient, you can access it from there:

src.py

import sacred

ingred = sacred.Ingredient('default_params')
ingred.add_config('some/yaml/file')    # <--- added config
ingred.add_config('another/yaml/file')  # <--- added config

class MyModel(object):
    @ingred.capture
    def __init__(self):
        pass  # do some stuff...

main.py

import sacred

from src import MyModel, ingred

ex = sacred.Experiment('test_exper', ingredients=[ingred])

@ex.main   # <--- this also works as a capture decorator
def main(default_params):
    param1 = default_params['param1']
    param2 = default_params['param2']
    MyModel()
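As a side note (a hedged sketch, not verified against this exact setup): since the values live under the ingredient's name in the config, overriding them from the outside should also go through that namespace, for example:

# Sketch: overriding the namespaced ingredient values when launching a run.
# Assumes the ex and default_params ingredient defined above; the nested dict
# mirrors how the values appear in the final config.
ex.run(config_updates={"default_params": {"param1": 1, "param2": 2}})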

@kirk86
Author

kirk86 commented Feb 18, 2019

@Qwlouse
thanks a lot. That's great!
I'll close the issue for now to keep things clean.

@pedropalb

pedropalb commented Oct 5, 2021

Well, I think your basic idea is right. However, given that you want to run the same experiment just with different configurations (model, data), you can reuse the same experiment object and just update the config. It would look something like this:

from sacred import Experiment

ex = Experiment("generic_experiment")

@ex.main
def run(dataset, model):
    ...

# datasets and models are your own lists of dataset / model identifiers
for dataset in datasets:
    for model in models:
        ex.run(config_updates={"dataset": dataset, "model": model},
               options={"--name": f"{dataset}_{model}"})

For a more complete example you might look at Klaus' code https://github.com/Qwlouse/Binding/blob/master/run_evaluation.py

@JarnoRFB, I came up with a similar solution, but I'm having problems passing the dataset as an argument in config_updates. Since I'm using the MongoObserver, the whole dataset is being saved to MongoDB.

Is there any way to pass data to the ex.run() method without touching the config? As a workaround, I thought of using a global variable to hold the dataset reference, but I wonder if there is a more elegant solution. Maybe a way to, at least, tell the MongoObserver to ignore some config entries.

@JarnoRFB
Collaborator

JarnoRFB commented Oct 5, 2021

@pedropalb Sorry, I am not quite sure what you mean. I guess in the example I meant dataset to represent a reference to the dataset, e.g. a string identifying the dataset or a path to the data. Otherwise all datasets would need to be loaded into memory upfront. As you pointed out, putting an instantiated dataset in the config is not great, but I think this would be the case irrespective of the observer used.
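For illustration, one possible pattern along those lines (not an official sacred feature; load_dataset and _DATASET_CACHE are made-up names for this sketch): keep only the dataset path in the config and cache the loaded, preprocessed data at module level, so repeated runs with the same path do not reload it and the observer only ever sees the path.

from sacred import Experiment

ex = Experiment("cached_dataset_example")  # hypothetical name

_DATASET_CACHE = {}  # module-level cache; only the path ends up in the stored config

def load_dataset(path):
    # stand-in for real loading + preprocessing
    return {"path": path}

@ex.main
def run(dataset_path, model):
    if dataset_path not in _DATASET_CACHE:
        _DATASET_CACHE[dataset_path] = load_dataset(dataset_path)
    dataset = _DATASET_CACHE[dataset_path]
    ...  # train `model` on `dataset`

for dataset_path in ["data/a", "data/b"]:  # hypothetical paths
    ex.run(config_updates={"dataset_path": dataset_path, "model": "cnn"})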

@pedropalb

pedropalb commented Oct 5, 2021

@JarnoRFB I see! I misunderstood your dataset variable.

I've been using the dataset path as a config entry. But now I need to run multiple times with the same dataset. Passing the dataset path is not an option anymore, since I would have to load it from the data path and preprocess it in every single call to ex.run.

I need a way to pass the same loaded and preprocessed dataset to multiple ex.run calls. The workaround I mentioned previously is to have a global variable hold this preloaded dataset:

from sacred import Experiment

ex = Experiment("generic_experiment")

dataset = None  # module-level holder for the preloaded dataset

@ex.command
def train(model):
    global dataset
    ...  # train `model` on the preloaded `dataset`

@ex.command
def run(dataset_paths, models):
    global dataset

    for dataset_path in dataset_paths:
        dataset = load_dataset(dataset_path)  # my own loading/preprocessing function

        for model in models:
            ex.run('train', config_updates={"model": model},
                   options={'--name': f"{dataset_path}_{model}"})

But I'm wondering if there is a better and more elegant way to do it.

Thanks!
