Mix-matching configs to create experiment configuration #406

Closed
achigeor opened this issue Feb 10, 2020 · 4 comments

@achigeor

achigeor commented Feb 10, 2020

Hi @omry,

Thanks for this awesome framework and work!

I'd like to explain my use case and get your input if that's possible, since after reading the docs, tutorials and issues, it's still a bit unclear to me.

Currently we use YAML configs to set up our PyTorch DL experiments. Each experiment has a config file, where everything is defined with the goal of writing as little code as possible. For example, one can define arbitrary torchvision transformations to be composed, e.g.:

  transforms:
    - Normalize:
        mean: [M1, M2, M3]
        std: [S1, S2, S3]
    ...

or datasets to use, etc.
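
For readers unfamiliar with this style of config, here is a minimal sketch of how such a `transforms` list might be turned into a pipeline. It is not the poster's actual code: `Normalize` and `Scale` are stand-in classes (so the sketch runs without torchvision), and `REGISTRY` and `build_pipeline` are hypothetical names introduced only for illustration.

```python
# Stand-ins for torchvision transforms so the sketch runs without torchvision.
class Scale:
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, pixel):
        return [v * self.factor for v in pixel]


class Normalize:
    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def __call__(self, pixel):
        return [(v - m) / s for v, m, s in zip(pixel, self.mean, self.std)]


# Hypothetical registry mapping config names to transform classes.
REGISTRY = {"Scale": Scale, "Normalize": Normalize}


def build_pipeline(entries):
    """Turn a list of single-key {name: kwargs} mappings into one callable."""
    steps = []
    for entry in entries:
        (name, kwargs), = entry.items()  # each list item holds exactly one transform
        steps.append(REGISTRY[name](**(kwargs or {})))

    def pipeline(x):
        for step in steps:
            x = step(x)
        return x

    return pipeline


# The same shape a YAML loader would produce for the `transforms` list.
transforms_cfg = [
    {"Scale": {"factor": 0.5}},
    {"Normalize": {"mean": [0.5, 0.5, 0.5], "std": [0.25, 0.25, 0.25]}},
]
pipeline = build_pipeline(transforms_cfg)
result = pipeline([1.0, 2.0, 3.0])
print(result)
```

With real torchvision classes in the registry, the list of steps could presumably be handed to `torchvision.transforms.Compose` instead of the hand-written `pipeline` closure.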

Currently these configs contain a lot of repeated information, which makes Hydra the perfect tool to use.
What I am aiming for is an architecture like the following:

├── hydra
│   ├── datasets     
│   │   ├── dataset1.yaml
│   │   ├── dataset2.yaml
│   ├── models
│   │   ├── model1.yaml
│   │   ├── model2.yaml
│   ├── transforms
│   │   ├── basic.yaml
│   │   ├── color.yaml
├── experiment1.yaml
├── experiment2.yaml

And then inside the yaml experiments, something like:

datasets:  # ability to provide more than one
    - dataset1
    - dataset2
model: model1
# ability to somehow override any params you want from model1.yaml
transforms: 
    - basic
    - # extend with more transforms defined here

Finally, the end user would just run python main.py --config experiment1.yaml.

Is this something that could in principle be achieved with Hydra now, or in the future?
Or am I approaching this the wrong way?

Would love to hear your thoughts on this.

Thanks!

EDIT: Maybe related to #171?

@omry
Collaborator

omry commented Feb 10, 2020

Hi @achigeor!
Thanks for the kind words.
What you are trying to do is not directly possible with Hydra right now; you are hitting a couple of limitations:

  1. Composition is always on the root node, meaning datasets would overwrite one another if you follow the standard convention of rooting them in the same dataset node. (Config groups are also mutually exclusive, so only one element from the dataset config group can normally be composed in.)
dataset:
  name: dataset1
  path: /foo/dataset1

There is a feature request to address this, at least in part, by adding the ability to specify the node path for merging a config group. See #235.

However, on the surface, the need to specify more than one dataset hints at a possible modeling problem:
Are you actually going to use more than one dataset at the same time? If not, you should probably use multirun instead.

As you have pointed out with the edit, more powerful composition will be enabled when (if) #171 is implemented.

Generally, lists (like in your transformers example) are tricky to compose. Currently, Hydra will replace a list rather than merge it.
What may work is something like:

transformers:
  scale:
    ...
  translate:
    ...

transform:
  - ${transformers.scale}
  - ${transformers.translate}
  ...

This way the list stays minimal and easy to define while the repeated information is stored only once.
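
To make the pattern above concrete, here is a rough stdlib-only sketch of what resolving such `${...}` references amounts to. In Hydra the interpolation is handled by the config library itself; the `resolve` and `lookup` helpers below are a hand-rolled illustration, not its implementation, and the parameter values (`factor`, `dx`, `dy`) are made up.

```python
import re

# Hypothetical config mirroring the pattern above; parameter values are invented.
config = {
    "transformers": {
        "scale": {"factor": 2.0},
        "translate": {"dx": 1, "dy": -1},
    },
    "transform": ["${transformers.scale}", "${transformers.translate}"],
}

# Matches a string that consists solely of one ${dotted.path} reference.
_REF = re.compile(r"^\$\{([\w.]+)\}$")


def lookup(root, dotted_path):
    """Follow a dotted path such as 'transformers.scale' from the root."""
    node = root
    for key in dotted_path.split("."):
        node = node[key]
    return node


def resolve(node, root):
    """Recursively replace whole-string ${a.b} references with the node they name."""
    if isinstance(node, dict):
        return {k: resolve(v, root) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, root) for v in node]
    if isinstance(node, str):
        match = _REF.match(node)
        if match:
            return resolve(lookup(root, match.group(1)), root)
    return node


resolved = resolve(config, config)
print(resolved["transform"])
```

The definitions live once under `transformers`, and each experiment only has to list which ones it wants, in order.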
I hope this all makes sense.

Feel free to join the chat, it feels like this is better done interactively.

@achigeor
Author

Thanks for the quick reply @omry!

I will go through your answer in more detail and move this to the chat as you suggested.

A quick comment regarding your point on datasets: indeed, we are using more than one dataset at the same time, leveraging the ConcatDataset functionality of PyTorch.

@omry
Collaborator

omry commented Feb 10, 2020

Thanks for clarifying about the datasets.
This probably suggests that you want to have your dataset defined in different nodes.
Maybe a pattern similar to what I suggested for transformers would work there too.
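
As a rough illustration of that suggestion, the sketch below keeps each dataset under its own node and lets the experiment pick several by name. Everything here is an assumption made for illustration: the `use` key, the `load_samples` helper, and the `/foo/dataset2` path are hypothetical, and `itertools.chain` merely stands in for PyTorch's `ConcatDataset`.

```python
from itertools import chain

# Each dataset lives in its own node, so composing one does not overwrite another.
config = {
    "datasets": {
        "dataset1": {"name": "dataset1", "path": "/foo/dataset1"},
        "dataset2": {"name": "dataset2", "path": "/foo/dataset2"},  # hypothetical path
    },
    # Hypothetical key: the datasets this experiment actually uses, in order.
    "use": ["dataset1", "dataset2"],
}


def load_samples(entry):
    """Hypothetical loader: returns fake sample ids instead of reading entry['path']."""
    return [f"{entry['name']}/sample{i}" for i in range(2)]


# Pick the selected dataset nodes and concatenate their samples.
selected = [config["datasets"][name] for name in config["use"]]
combined = list(chain.from_iterable(load_samples(entry) for entry in selected))
print(combined)
```

In real training code, the per-dataset objects would be built from each node and passed to `torch.utils.data.ConcatDataset` rather than chained as plain lists.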

@omry
Collaborator

omry commented Feb 11, 2020

Happy to discuss further in the chat. At least for now, it looks like this is already covered by the two other enhancement requests I mentioned, so I am closing this.

omry closed this as completed Feb 11, 2020