Allow configuring the search path via the config file #274

Closed
omry opened this issue Nov 6, 2019 · 30 comments · Fixed by #1450
Labels
enhancement Enhanvement request
Milestone

Comments

@omry
Collaborator

omry commented Nov 6, 2019

Design doc outlining consideration and possible implementations:

https://docs.google.com/document/d/1Dx77SThg1ugnGHvZ8Dq1FQusXDvGn0tyyWBeJ3nXvro/edit?usp=sharing

@omry omry added the enhancement Enhanvement request label Nov 6, 2019
@omry omry modified the milestone: 0.11.0 Nov 6, 2019
@asherp

asherp commented Dec 20, 2019

Not sure if this is the feature I'm waiting for, but I want my users to install my app and easily point to a new config directory that overrides my defaults. My app is based on the hydra_app_example. Please advise.

@asherp

asherp commented Dec 20, 2019

Ok, I think I figured it out. First add a default search path to config.yaml:

search_path: config.yaml

Then merge it using OmegaConf

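import hydra
from omegaconf import OmegaConf

@hydra.main(config_path='conf/config.yaml', strict=False)
def main(cfg):
    # search_path comes from conf/config.yaml or from a command-line override
    if cfg.search_path is not None:
        # resolve relative to the directory the app was launched from
        override_path = hydra.utils.to_absolute_path(cfg.search_path)
        override_conf = OmegaConf.load(override_path)
        # values in the user's file take precedence over the packaged defaults
        cfg = OmegaConf.merge(cfg, override_conf)

if __name__ == "__main__":
    main()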

Now the user can run your app from any directory containing a config.yaml, or specify their own:

myapp search_path=path/to/user/config.yaml
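
To make the merge step concrete, here is a plain-Python sketch of the override semantics this workaround relies on: values from the user's file win, and nested sections are merged key by key. It only models the idea with dictionaries. The `deep_merge` helper and the sample keys are hypothetical and are not part of Hydra or OmegaConf.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values in `override` take precedence over `base`.

    Nested dicts are merged recursively; any other value is replaced wholesale.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical packaged defaults and a user-supplied override file.
defaults = {"plot": {"width": 800, "height": 600}, "units": "km"}
user_config = {"plot": {"width": 1024}}

merged = deep_merge(defaults, user_config)
```

Only `plot.width` changes here; keys the user did not mention keep their packaged values.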

@omry
Collaborator Author

omry commented Dec 20, 2019

That's a pretty good workaround. I would call this something like config_override and not search_path, as search path is an internal concept in Hydra that will eventually be exposed.

Asher, can you share a bit about your project? I am curious about how people are using Hydra and for what.

@asherp

asherp commented Dec 22, 2019

@omry It's for a project called Kamodo, which is a functional, publication-focused API for space weather models and data (though it can be used for much more!). Kamodo handles function composition, unit conversion, and automatic plot generation via function inspection, which should minimize the effort needed to create analysis pipelines and perform data-model comparisons. Kamodo is built with sympy and plotly.

Most of our science users are comfortable with the command line but are inexperienced with modern programming languages. The CLI for Kamodo will allow them to configure their data sources and any equations they want to use for post-processing and quickly generate plots. Our users will often test different models or data sources for science relevance and will want to easily reproduce those results, so Hydra's ability to compose several overlapping configurations and have timestamped output directories is perfect for our use case.

Github - https://github.com/nasa/Kamodo
Official site - https://ccmc.gsfc.nasa.gov/Kamodo/

I'm also considering Hydra for my other project Hourly, which uses git "clock in" and "clock out" messages to track hours worked, generate time sheets, and produce work charts. Since I started freelancing, I've logged at least 1000 sessions across 5 different projects using Hourly.

Currently, Hourly's cli is based on click, but I need to provide default functionality specific to a particular repo (to ignore errant clock-in/outs, etc). I'd also like to generate time sheets from related projects, so config composition would come in handy. Finally, I'm usually the only one who uses hourly, but eventually I'll need to iterate over all contributors to a project (assuming I get more users!).

Project page - https://asherp.github.io/hourly/index.html
Github - https://github.com/asherp/hourly

@omry
Collaborator Author

omry commented Dec 22, 2019

Kamodo looks awesome, and I am really happy to start seeing serious projects that are making use of Hydra.

A few notes:

  • Can you add Hydra as a dependency of Kamodo (in setup.py/cfg or requirements.txt)? This will allow GitHub to detect it as a project using Hydra.
  • Did you see the new compose API? It lets you compose configs with Hydra anywhere, not just in the main. This is specifically useful for Jupyter notebooks. Here is an example (you can launch the binder to play with it online).
  • Take a look at PyTorch, it might be very useful for Kamodo as an alternative to NumPy. It allows highly optimized vector/matrix operations for both CPU and GPU, as well as automatic differentiation.
  • Hourly looks like a nice and useful project for freelancers, it would be great if you use Hydra there as well, and I would love to know about your experience porting from Click to Hydra. Click is super popular but I think Hydra is much more powerful and it can probably be used as a replacement for Click for most use cases.
  • I bet NASA got some serious compute clusters. While there are no official public plugins for it yet, we are using Hydra at FAIR to launch to our internal cluster. Hydra has a plugin mechanism that can support any cluster (including private clusters with non public APIs).
    I would be happy to guide you in writing a plugin that will allow you to use Hydra to launch to your internal cluster from the command line.

I am considering creating a "Who is using Hydra" section in the docs, would you agree to write a short testimonial saying why you find it useful?

@omry
Collaborator Author

omry commented Dec 22, 2019

One more thing:
As a framework, Kamodo might want to provide some configuration to its users.
For example, some default configurations to instantiate commonly used models etc.
The best way to do it with Hydra is to add those configs to the search path.

You can do it programmatically and automatically via the search plugin API.
I don't yet have a super clean example for this in hydra/plugins/examples, but the colorlog plugin comes very close.
Essentially, this is all it's doing. Check it out here.

@omry omry added this to the 1.1.0 milestone Jan 31, 2020
@Stonesjtu
Contributor

Stonesjtu commented Jun 9, 2020

I have a similar use case to @asherp.

  • The major interface of my program is in binary form; most users don't write any Python code.
  • The default values are defined in structured configs rather than in a default config file.
  • Users write their own config file in their work dir and run it through python -m xxx.main

@omry
I think the current version 1.0rc supports specifying config_path, but the path is resolved relative to the main.py file, so I have to specify an absolute path to make it work.

Can we resolve the config_path relative to $pwd, which makes more sense in CLI cases?
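
The difference between the two behaviors can be sketched with pure path arithmetic. This is only an illustration of the request, not Hydra's resolution logic. The `resolve_config_path` helper and both directories are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_config_path(config_path: str, anchor_dir: PurePosixPath) -> PurePosixPath:
    """Resolve `config_path` against `anchor_dir` unless it is already absolute."""
    path = PurePosixPath(config_path)
    return path if path.is_absolute() else anchor_dir / path


# Hypothetical locations: where the app is installed vs. where the user runs it.
script_dir = PurePosixPath("/opt/xxx")
working_dir = PurePosixPath("/home/user/experiment")

# Current behavior described above: relative paths are anchored at main.py.
relative_to_script = resolve_config_path("conf", script_dir)

# Requested behavior: relative paths are anchored at the working directory.
relative_to_cwd = resolve_config_path("conf", working_dir)

# The workaround mentioned above: an absolute path ignores the anchor entirely.
absolute = resolve_config_path("/home/user/experiment/conf", script_dir)
```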

@omry
Collaborator Author

omry commented Jun 9, 2020

You can probably achieve that now via a config searchpath plugin.
example.

@omry
Collaborator Author

omry commented Jun 9, 2020

By the way, take a look at the Application Packaging page.

  1. You CAN use config files with your application even if it's installed.
  2. You can get a native command to execute it instead of the clunky python -m module.

@odelalleau
Collaborator

I have a related use case I'm not sure can be supported easily right now:

I would like to have an application coming with its own config file (packaged with the app, not to be edited), that would also let the user override some parameters through their own config file (stored in the directory where they use the app).

I haven't looked closely yet at the config search path plugin, but reading the doc it sounds like it's going to stop at the first matching file (instead of merging configurations from multiple files).
It looks like I could use the compose API but that would require some extra manual work.

Just curious if there's an option I'm not seeing, or something I misunderstood?

@jieru-hu
Contributor

jieru-hu commented Nov 3, 2020

I have a related use case I'm not sure can be supported easily right now:
I would like to have an application coming with its own config file (packaged with the app, not to be edited), that would also let the user override some parameters through their own config file (stored in the directory where they use the app).
I haven't looked closely yet at the config search path plugin, but reading the doc it sounds like it's going to stop at the first matching file (instead of merging configurations from multiple files).
It looks like I could use the compose API but that would require some extra manual work.
Just curious if there's an option I'm not seeing, or something I misunderstood?

It seems like something Hydra may already support? As an example, all of Hydra's plugins get their configs added by a SearchPathPlugin, example here: https://github.com/facebookresearch/hydra/blob/master/plugins/hydra_colorlog/hydra_plugins/hydra_colorlog/colorlog.py#L6-L9
I imagine in your case you can create multiple SearchPathPlugins pointing to all the configs you are interested in and have Hydra merge them all.
It would be helpful if you can provide a minimal example of what the config structure would look like, that way I may better help you :)

@odelalleau
Collaborator

I imagine in your case you can create multiple SearchPathPlugins pointing all the configs you are interested in and have Hydra merge them all.

Maybe -- I haven't looked closely at it, the reason why I thought it wouldn't work is because the doc says "When a config is requested, The first matching config in the search path is used", suggesting that there is no merging (just use whichever is found first). But I may not be understanding properly.

As an example I may want to have a directory structure like this:

├── my_app
│   └── config
│       └── config.yaml
└── user_code
    └── .my_app
        └── config.yaml

And I would like both config files to be merged (in my case the config found under "user_code" would only modify keys already existing in the "my_app" config, not add new keys).

I managed to make it work with the Compose API but (1) I had to drop @hydra.main(), and (2) it's a bit cumbersome because I need to declare my defaults list in the "user_code" config even if I would like to reuse the one from "my_app". So overall not so convenient...

@omry
Collaborator Author

omry commented Nov 3, 2020

Configuring the defaults list (the topic of this issue) has little to do with your question.

First of all, if you have two configs in the same name and the same group, only one will be visible.
This is not changing (in fact, it is what enables replacing an existing config by prepending a search path containing a config with the same name).
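
The first-match rule can be modeled in a few lines of plain Python. This is a toy stand-in, not Hydra's implementation: each search path entry is represented as a dictionary instead of a directory or package, and `find_config` and the sample values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each search path entry is (provider, {config name: contents}); a toy stand-in
# for a directory or package of config files.
SearchPath = List[Tuple[str, Dict[str, dict]]]


def find_config(search_path: SearchPath, name: str) -> Optional[Tuple[str, dict]]:
    """Return (provider, config) for the first entry that contains `name`."""
    for provider, configs in search_path:
        if name in configs:
            return provider, configs[name]
    return None


app_entry = ("my_app", {"config": {"host": "localhost", "port": 3306}})
user_entry = ("user_code", {"config": {"host": "mysql007.staging"}})

# With the app first, the user's config of the same name is never seen.
app_first = find_config([app_entry, user_entry], "config")

# Prepending the user's entry replaces the app config; nothing is merged.
user_first = find_config([user_entry, app_entry], "config")
```

Note that `user_first` has no `port` key: the lookup picks one file and stops, which is why merging has to happen through the defaults list instead.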

In 1.0, you only get one defaults list - in the primary config.
That primary config can refer to a file from your app:

library/conf
  app_config.yaml
  ...

user_app/conf
  user_overrides.yaml # contains things the user want to override in app_config.
  user_conf.yaml 
    defaults:
      - app_config
      - user_overrides  # needed because, by default, the composition order places this file (user_conf.yaml) before the items in its defaults list.

The clunkiness of this solution will be addressed once support for recursive defaults arrives in 1.1.
Refer to this design doc to see what's coming. Feedback is welcome.
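
A toy model of the ordering may help explain why the separate user_overrides file is needed. The sketch below is plain Python under simplifying assumptions: only top-level keys are merged, the `compose` helper and the sample values are hypothetical, and the `_self_` keyword is borrowed from the design discussion rather than from the 1.0 behavior.

```python
from typing import Dict, List

SELF = "_self_"


def compose(defaults: List[str], own_content: dict, library: Dict[str, dict]) -> dict:
    """Merge configs in defaults-list order; later entries win.

    If `_self_` is absent, it is treated as first, modeling the 1.0 ordering
    described above. Only top-level keys are merged to keep the sketch short.
    """
    order = defaults if SELF in defaults else [SELF] + defaults
    result: dict = {}
    for name in order:
        result.update(own_content if name == SELF else library[name])
    return result


library = {
    "app_config": {"host": "localhost", "port": 3306},
    "user_overrides": {"host": "mysql007.staging"},
}

# Overrides written directly in the primary config lose to app_config ...
inline = compose(["app_config"], {"host": "mysql007.staging"}, library)

# ... so they go in a separate file listed after app_config.
separate_file = compose(["app_config", "user_overrides"], {}, library)

# Placing _self_ last lets the primary config's own content win.
self_last = compose(["app_config", SELF], {"host": "mysql007.staging"}, library)
```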

@odelalleau
Collaborator

Configuring the defaults list (the topic of this issue) has little to do with your question.

It may not have been what you had in mind when you created this issue, but #274 (comment) is pretty close to what I want to achieve.

The clunkiness of this solution will be addressed once support for recursive defaults arrives in 1.1.
Refer to this design doc to see what's coming. Feedback is welcome.

I had a look, I may not understand it all, but I'm not sure it would work for my use case.

To make things more concrete, let's take this example from Hydra docs: https://hydra.cc/docs/next/tutorials/structured_config/static_schema
Let's say I have a packaged app with config files as shown in this example. I would like to let a user of my app create their own config file(s) to override only some of these settings. For instance they could create a file .my_app/config/db/staging.yaml that would contain:

host: mysql007.staging

And then by running python /path/to/my_app.py --config-dir .my_app/config, the app would run with the user-specified host when using the staging db, instead of the default one.

The reason why I'm doubtful about recursive defaults enabling this is because of this example in the doc you shared:

defaults:
 - d2/model: resnet50  # model from an installed framework
 - _self_
 - db: mysql

# This will override the value provided by the framework
d2:
  model:
    num_layers: 100

Here, num_layers will be set to 100 regardless of which model is used, but what if I want to set it to 100 for resnet50 and to 200 for resnet101?

@odelalleau
Collaborator

Just an additional thought: if there's a way to do it via config files it'd make sense to also be able to do it via a command line override. Something like:

python /path/to/my_app.py db/staging.host=mysql007.staging

@omry
Collaborator Author

omry commented Nov 3, 2020

The reason why I'm doubtful about recursive defaults enabling this is because of this example in the doc you shared:

That example is just one of many ways to use recursive defaults.

Here is another example from the docs of the next version (both the logic this is documenting and the docs themselves are work in progress):
https://hydra.cc/docs/next/advanced/defaults_list#config-inheritance-via-composition

@omry
Collaborator Author

omry commented Nov 3, 2020

Please move the discussion to a dedicated issue.
It has nothing at all to do with this issue.

@odelalleau
Collaborator

Please move the discussion to a dedicated issue.
It has nothing at all to do with this issue.

Sorry for the hijack :) Thanks for the pointers, I think I have a better understanding now of how things currently are and how they will be in the future. I believe I can solve my problem with a slightly different approach which makes more sense in my situation, so I won't open another issue.

@npuichigo

npuichigo commented Jan 16, 2021

@omry After reading the design doc, especially the new feature _self_ which changes the default semantics of composition order, I think it's more user-friendly. For example, DL researchers can easily configure their experiments using this pattern

My question related to this proposal is, how can we combine that with allowing users to provide another config file/group when hydra-based framework is installed.

For example, we have some configuration files installed with our hydra app, and the default configuration is:

defaults:
  - dataset: mnist
  - model: vanilla
  - trainer: basic_cv_trainer

When the user has installed our framework, he/she can follow the doc to run the framework with pre-defined configurations or providing override command line arguments.

Now the user wants to arrange his/her experiments in his own repo using overriding configs and the structure is like this:

.
├── cv
│   ├── alexnet
│   │   └── configs
│   │       ├── big_network.yaml
│   │       ├── medium_network.yaml
│   │       └── small_network.yaml
│   └── vgg
│       ├── big_network.yaml
│       ├── medium_network.yaml
│       └── small_network.yaml
└── nlp
    └── bert
        ├── version_1.yaml
        ├── version_2.yaml
        └── version_3.yaml

Here, each single YAML file could also be a folder of configs that override the default settings.

According to what I learned here, the overriding yaml file (nlp/bert/version_1.yaml) should be like this:

# @package _global_
defaults:
  - override /dataset: swag_dataset
  - override /model: bert
  - override /trainer: bert_trainer

model:
  bert:
    d_model: 384

How can I select different overriding configs when running different tasks? Maybe add that config to the search path?

@omry
Collaborator Author

omry commented Jan 16, 2021

Hi @npuichigo!

@omry After reading the design doc, especially the new feature _self_ which changes the default semantics of composition order, I think it's more user-friendly. For example, DL researchers can easily configure their experiments using this pattern

That design doc is not the latest one, this is. :).
Specifically, the design of _self_ did not change between the docs.
There is no need to refer to the design doc though, the new defaults list is documented here.

My question related to this proposal is, how can we combine that with allowing users to provide another config file/group when hydra-based framework is installed.

How can I select different overriding configs when running different tasks? Maybe add that config to the search path?

The Config search path is similar to the PYTHONPATH or the Java CLASSPATH.
There is some minimal (and insufficient) documentation of it here.

Manipulation of the config search path should be done in order to add new roots to the search path.

It seems like your question is more about how to use configs that are in arbitrary config groups.

Please go ahead and read the following pages:

I think between those pages you should have enough information to do what you want.

While you can use the config search path to add different config roots and thus slightly change the config structure, I suggest that you try to leverage the default package when organizing your config (the default package matches the config group).

@npuichigo

npuichigo commented Jan 16, 2021

Thanks for your reply @omry.

While you can use the config search path to add different config roots and thus slightly change the config structure, I suggest that you try to leverage the default package when organizing your config (the default package matches the config group).

Here I have already made the installed configs of my framework match the config group, for example:

framework
├── config.yaml
├── dataset
│   ├── mnist.yaml
│   └── swag.yaml
├── model
│   ├── bert.yaml
│   └── vanilla.yaml
└── trainer
    ├── basic_cv_trainer.yaml
    └── bert_trainer.yaml

These parts are aligned with the framework code, so users have no direct access to the configs. In order to override the pre-installed config, one way is to provide command line args:

python hydra-app.py model=bert model.d_model=384 trainer=bert_trainer trainer.max_steps=1000

It's a little strange to record different command line args for different experiments. So they may provide another config file to override them.

user_experiments
├── cv
│   ├── alexnet
│   │   └── configs
│   │       ├── version_1.yaml
│   │       ├── version_2.yaml

Here the user configs play a role as the command line arguments.

python hydra_app.py --user_config=cv/alexnet/configs/version_1.yaml

# when version_1.yaml contains the following content, the cmd is the same as
# python hydra-app.py model=bert model.d_model=384 trainer=bert_trainer trainer.max_steps=1000

# @package _global_
defaults:
  - override /model: bert
  - override /trainer: bert_trainer

model:
  d_model: 384
trainer:
  max_steps: 1000

So I still need to deal with the case of two config sources (pre-installed configs / user overrides).
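
The equivalence between such a user file and a list of command line arguments can be sketched in plain Python. This is an illustration only: it assumes the file has already been parsed into a dictionary, and `to_overrides` is a hypothetical helper, not a Hydra function.

```python
from typing import Any, Dict, List


def to_overrides(config: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten a nested dict into dotted `key=value` override strings."""
    overrides: List[str] = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            overrides.extend(to_overrides(value, dotted))
        else:
            overrides.append(f"{dotted}={value}")
    return overrides


# Value overrides from the hypothetical version_1.yaml, already parsed.
user_values = {"model": {"d_model": 384}, "trainer": {"max_steps": 1000}}

# Config group selections come first, then the value overrides.
argv = ["model=bert", "trainer=bert_trainer"] + to_overrides(user_values)
```

The resulting `argv` matches the arguments of the command shown earlier in this comment.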

@omry
Collaborator Author

omry commented Jan 16, 2021

  1. --user_config is not a Hydra flag.
  2. As far as I understand, your problem is not related to this issue.
    Using existing issues for support is bad practice, it sends notifications and emails to all the subscribers.
    If you have further questions please open a dedicated issue.

@npuichigo

Sorry. I just noticed that several people above were discussing the same thing in this thread.

@omry
Collaborator Author

omry commented Jan 16, 2021

No worries.
Feel free to continue the discussion in a different issue if you need help.
I need to ask clarifying questions to continue helping and I don't want to do it on this issue for the reasons I mentioned above.

@omry
Collaborator Author

omry commented Mar 3, 2021

I updated the description of the issue with a design doc.
Please check it out. Feel free to provide feedback on the document (comments are open).

@omry omry mentioned this issue Mar 4, 2021
@omry omry closed this as completed in #1450 Mar 5, 2021
@summelon

Regarding previous discussions:
I'm working on a case with highly interdependent configs.
I want to decide my YAML group according to my arch.

Is the case below possible?
/configs/common/base.yaml:

defaults:
  - arch: ???
  - _self_
  # - other configs in /configs/${arch}
hydra:
  searchpath:
    - pkg://configs/${arch} # The name of arch in defaults list
    # or: - pkg://configs/${arch.name} # The name in arch config group

/configs/common/arch/arch1.yaml:

name: foo
_target_: bar

In a nutshell, might interpolation in the search path (based on the defaults list or the final config) be added in the future?
Or is there already a workaround for this?

@omry
Collaborator Author

omry commented Apr 30, 2021

I don't think it's possible.
There is a circular dependency between deciding what the defaults list is and deciding what the config search path is.
The hydra.searchpath node is special. This is also the reason why it is only allowed in the primary config.

Try to leverage default list interpolation instead.

defaults:
  - arch: ???
  - _self_
  - foo/bar: ${arch}
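
As a rough illustration of what defaults list interpolation does with the example above, here is a plain-Python sketch. It is a simplified model, not Hydra's resolver: `_self_` is left out, a reference can only point at a group listed earlier, and `resolve_defaults` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

INTERPOLATION = re.compile(r"\$\{(\w+)\}")


def resolve_defaults(
    defaults: List[Tuple[str, str]], choices: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Resolve `${group}` references in defaults-list options.

    `defaults` is a list of (group, option) pairs; `choices` holds the
    selections made on the command line for options marked `???`.
    """
    selected: Dict[str, str] = {}
    resolved: List[Tuple[str, str]] = []
    for group, option in defaults:
        if option == "???":
            if group not in choices:
                raise ValueError(f"missing mandatory choice for '{group}'")
            option = choices[group]
        # Substitute references to groups whose option is already known.
        option = INTERPOLATION.sub(lambda m: selected[m.group(1)], option)
        selected[group] = option
        resolved.append((group, option))
    return resolved


defaults_list = [("arch", "???"), ("foo/bar", "${arch}")]
resolved = resolve_defaults(defaults_list, {"arch": "arch1"})
```

Because the selection of `foo/bar` follows `arch` inside the defaults list itself, no search path change is needed to pick an arch-specific config.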

@summelon

Thanks for your reply!

I am confused about the position of the config search path.

  • First of all, since it uses the interpolation from the final Config Object (its initial status?), it seems like a member of the final Config Object.
  • It is only used in the primary config and initialized at the very beginning. From this perspective, it looks like the one in the default list.

So why can't it use arguments from the command line in its interpolation?
For example:

defaults:
  - arch: ???
  # - other configs in /configs/${arch} may also be available since I passed arch=foo on the command line
hydra:
  searchpath:
    - pkg://configs/${arch}

python3 main.py arch=foo

@omry
Collaborator Author

omry commented Apr 30, 2021

First of all, since it uses the interpolation from the final Config Object (its initial status?), it seems like a member of the final Config Object.

My example does not have hydra.searchpath at all.

It is only used in the primary config and initialized at the very beginning. From this perspective, it looks like the one in the default list.

I have no idea what you are talking about.

If you have follow-up questions, please create a new issue. Closed issues are not a channel for getting support.

@summelon

Ok, thanks.

7 participants