Mix-matching configs to create experiment configuration #406

achigeor · 2020-02-10T15:29:11Z

Thanks for this awesome framework and work!

I'd like to explain my use case and get your input, if that's possible, since after reading the docs, tutorials, and issues it's still a bit unclear to me.

Currently we use YAML configs to set up our PyTorch DL experiments. Each experiment has a config file where everything is defined, with the goal of writing as little code as possible. For example, one can define arbitrary torchvision transformations to be composed, e.g.:

transforms:
  - Normalize:
      mean: [M1, M2, M3]
      std: [S1, S2, S3]
  ...

or datasets to use, etc.

Currently these configs contain a lot of repeated information, which makes Hydra the perfect tool to use.
What I am aiming for is an architecture like the following:

├── hydra
│   ├── datasets     
│   │   ├── dataset1.yaml
│   │   ├── dataset2.yaml
│   ├── models
│   │   ├── model1.yaml
│   │   ├── model2.yaml
│   ├── transforms
│   │   ├── basic.yaml
│   │   ├── color.yaml
├── experiment1.yaml
├── experiment2.yaml

And then inside the yaml experiments, something like:

datasets:  # ability to provide more than one
    - dataset1
    - dataset2
model: model1
# ability to somehow override any params you want from model1.yaml
transforms: 
    - basic
    - # extend with more transforms defined here

Finally, the end user would just run python main.py --config experiment1.yaml.

Is this something that could in principle be achieved with Hydra now, or in the future?
Or am I approaching this the wrong way?

Would love to hear your thoughts on this.

Thanks!

EDIT: Maybe related to #171?


omry · 2020-02-10T15:58:00Z

Hi @achigeor!
Thanks for the kind words.
What you are trying to do is not directly possible with Hydra right now; you are hitting a couple of limitations:

Composition is always on the root node, meaning the datasets would overwrite one another if you follow the standard convention of rooting them in the same dataset node. (Config groups are also mutually exclusive, so only one element from the dataset config group can normally be composed in.)

dataset:
  name: dataset1
  path: /foo/dataset1

There is a feature request to address this, at least in part, by adding the ability to specify the node path for merging a config group. See #235.
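
To make the overwrite described above concrete, here is a minimal Python sketch of a recursive merge on plain dictionaries. It is an illustration only, not Hydra's actual implementation; the `merge` helper and the `/foo/dataset2` path are hypothetical names made up for the example. Because both configs are rooted at the same `dataset` key, the second one replaces the values of the first.

```python
def merge(base, override):
    """Recursively merge two dicts; values from `override` win on conflict."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: merge them key by key.
            result[key] = merge(result[key], value)
        else:
            # Scalars and lists are replaced, not combined.
            result[key] = value
    return result


# Two configs that follow the convention of rooting everything under `dataset`.
dataset1 = {"dataset": {"name": "dataset1", "path": "/foo/dataset1"}}
dataset2 = {"dataset": {"name": "dataset2", "path": "/foo/dataset2"}}  # hypothetical

merged = merge(dataset1, dataset2)
print(merged)  # only dataset2's values remain under `dataset`
```

Rooting each dataset under its own key (for example `datasets.dataset1` and `datasets.dataset2`) would avoid the collision in this sketch, which is roughly what specifying a node path for the merge would make possible.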

However, on the surface, the need to specify more than one dataset hints at a possible modeling problem:
Are you actually going to use more than one dataset at the same time? If not, you should probably use multirun instead.

As you have pointed out with the edit, more powerful composition will be enabled when (if) #171 is implemented.

Generally, lists (like the one in your transforms example) are tricky to compose. Currently, Hydra will replace a list rather than merge it.
What may work is something like:

transformers:
  scale:
    ...
  translate:
    ...

transform:
  - ${transformers.scale}
  - ${transformers.translate}
  ...

This way the list stays minimal and easy to define while the repeated information is stored only once.
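
As a rough illustration of how such references expand, here is a small standard-library Python sketch. Hydra's underlying config library resolves `${...}` references itself, so the `resolve` and `lookup` helpers below are hypothetical stand-ins that only imitate the idea, and the parameter values are placeholders.

```python
import re

# Matches a value that is exactly one reference, e.g. "${transformers.scale}".
_REF = re.compile(r"^\$\{([^}]+)\}$")


def lookup(root, dotted_path):
    """Follow a dotted path such as 'transformers.scale' from the root dict."""
    node = root
    for key in dotted_path.split("."):
        node = node[key]
    return node


def resolve(node, root):
    """Return a copy of `node` with every whole-value reference replaced."""
    if isinstance(node, dict):
        return {k: resolve(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, root) for v in node]
    if isinstance(node, str):
        match = _REF.match(node)
        if match:
            return resolve(lookup(root, match.group(1)), root)
    return node


config = {
    "transformers": {
        "scale": {"factor": 2.0},          # placeholder parameters
        "translate": {"dx": 10, "dy": 0},  # placeholder parameters
    },
    # The list itself only names which definitions to use, and in what order.
    "transform": ["${transformers.scale}", "${transformers.translate}"],
}

resolved = resolve(config, config)
print(resolved["transform"])
```

Each definition lives in one place under `transformers`, and an experiment only has to supply the short `transform` list.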
I hope this all makes sense.

Feel free to join the chat, it feels like this is better done interactively.

achigeor · 2020-02-10T16:28:38Z

Thanks for the quick reply @omry!

I will go through your answer in more detail and move this to the chat as you suggested.

A quick comment regarding your point on datasets: indeed, we are using more than one dataset at the same time, leveraging PyTorch's ConcatDataset functionality.

omry · 2020-02-10T16:30:37Z

Thanks for clarifying about the datasets.
This probably suggests that you want to have your datasets defined in different nodes.
Maybe a pattern similar to what I suggested for transformers would work there too.

omry · 2020-02-11T07:26:27Z

Happy to discuss further in the chat. At least for now, it looks like this is already covered by the two other enhancement requests I mentioned, so I am closing this.

omry closed this as completed Feb 11, 2020
