From 52f0c0dcdb727a0db157fba27aa150a80cbfba14 Mon Sep 17 00:00:00 2001
From: dberenbaum
Date: Wed, 31 Jan 2024 17:43:22 -0500
Subject: [PATCH 1/4] hydra: plugins_path and advanced config

---
 .../hydra-composition.md | 105 +++++++++++++++++-
 .../project-structure/configuration.md | 7 ++
 2 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/content/docs/user-guide/experiment-management/hydra-composition.md b/content/docs/user-guide/experiment-management/hydra-composition.md
index 87afcefee3..001535799a 100644
--- a/content/docs/user-guide/experiment-management/hydra-composition.md
+++ b/content/docs/user-guide/experiment-management/hydra-composition.md
@@ -5,7 +5,7 @@ supports Hydra's [config composition] as a way to configure [experiment runs].
-At the moment you must explicitly enable this feature with:
+You must explicitly enable this feature with:

 ```cli
 $ dvc config hydra.enabled True
@@ -139,8 +139,9 @@ We parametrize the shell commands above (`mkdir`, `tar`, `wget`) as well as
-You can use `dvc.api.params_show()` to load params in Python code. For other
-languages, use [dictionary unpacking] or a YAML parsing library.
+You can load the params with any YAML parsing library. In Python, you can use
+DVC's `dvc.api.params_show()` or `OmegaConf.load("params.yaml")` (OmegaConf is
+installed along with Hydra).

 [dictionary unpacking]:
   /doc/user-guide/project-structure/dvcyaml-files#dictionary-unpacking
@@ -221,4 +222,102 @@ Stage 'train' didn't change, skipping
+`dvc exp run` will compose a new `params.yaml` each time you run it, so it is
+not a reliable way to reproduce past experiments. Instead, use `dvc repro` when
+you want to reproduce a previously run experiment.
+
 [debug]: /doc/user-guide/pipelines/running-pipelines#debugging-stages
+
+## Migrating Hydra Projects
+
+If you already have Hydra configured and want to start using DVC alongside it,
+you may need to refactor your code slightly. DVC will not pass the Hydra config
+to `@hydra.main()`, so remove that decorator from your code. Instead, DVC
+composes the Hydra config before your code runs and writes it to `params.yaml`.
+
+Using the example above, here's how the Python code in `train.py` might look
+using Hydra without DVC:
+
+```python
+import hydra
+from omegaconf import DictConfig
+
+@hydra.main(version_base=None, config_path="conf", config_name="config")
+def main(cfg: DictConfig) -> None:
+    ...  # train model using cfg parameters
+
+if __name__ == "__main__":
+    main()
+```
+
+To convert the same code to use DVC with Hydra composition enabled:
+
+```python
+from omegaconf import OmegaConf
+
+def main() -> None:
+    cfg = OmegaConf.load("params.yaml")
+    # train model using cfg parameters
+
+if __name__ == "__main__":
+    main()
+```
+
+You no longer need to import Hydra into your code. A `main()` method is included
+in this example because it is good practice, but it's not necessary. This
+separation between config and code can help debug because the entire config
+generated by Hydra gets written to `params.yaml` before the experiment starts.
+You can also reuse `params.yaml` across multiple scripts in different stages of
+a DVC pipeline.
+
+## Advanced Hydra config
+
+You can configure how DVC works with Hydra.
+
+By default, DVC will look for Hydra [config groups] in a `conf` directory, but
+you can set a different directory using `dvc config hydra.config_dir other_dir`.
+This is equivalent to the `config_path` argument in `@hydra.main()`.
+
+Within that directory, DVC will look for [defaults list] in `config.yaml`, but
+you can set a different path using `dvc config hydra.config_name other.yaml`.
+This is equivalent to the `config_name` argument in `@hydra.main()`.
+
+Hydra will automatically discover plugins in the `hydra_plugins` directory. By
+default, DVC will look for `hydra_plugins` in the root directory of the DVC
+repository, but you can set a different path with
+`dvc config hydra.plugins_path other_path`.
+
+### Custom resolvers
+
+You can register [OmegaConf custom resolvers] as plugins by writing them to a
+file inside `hydra_plugins`. DVC will use these custom resolvers when composing
+the Hydra config. For example, add a custom resolver to
+`hydra_plugins/my_resolver.py`:
+
+```python
+import os
+from omegaconf import OmegaConf
+
+OmegaConf.register_new_resolver("join", lambda x, y: os.path.join(x, y))
+```
+
+You can use that custom resolver inside the Hydra config:
+
+```yaml
+dir: raw/data
+relpath: dataset.csv
+fullpath: ${join:${dir},${relpath}}
+```
+
+The final `params.yaml` will look like:
+
+```yaml
+dir: raw/data
+relpath: dataset.csv
+fullpath: raw/data/dataset.csv
+```
+
+[plugins]:
+  https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process
+[OmegaConf custom resolvers]:
+  https://omegaconf.readthedocs.io/en/latest/custom_resolvers.html
diff --git a/content/docs/user-guide/project-structure/configuration.md b/content/docs/user-guide/project-structure/configuration.md
index 586d8ff4ee..d8c30501ac 100644
--- a/content/docs/user-guide/project-structure/configuration.md
+++ b/content/docs/user-guide/project-structure/configuration.md
@@ -258,12 +258,19 @@ Composition].
   groups]. Defaults to `conf`.
 - `hydra.config_name` - the name of the file containing the Hydra [defaults
   list] (located inside `hydra.config_dir`). Defaults to `config.yaml`.
+- `hydra.plugins_path` - location of the parent directory of `hydra_plugins`,
+  where Hydra will automatically discover [plugins]. Defaults to the root of the
+  DVC repository.
 [config composition]:
   https://hydra.cc/docs/tutorials/basic/your_first_app/composition/
 [config groups]:
   https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/
 [defaults list]: https://hydra.cc/docs/tutorials/basic/your_first_app/defaults/
+[config module]:
+  https://hydra.cc/docs/1.3/advanced/compose_api/#initialization-methods
+[plugins]:
+  https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process

From fea4d3be88e59b86a3383b05ae67624c0c8e7e05 Mon Sep 17 00:00:00 2001
From: dberenbaum
Date: Thu, 1 Feb 2024 11:30:11 -0500
Subject: [PATCH 2/4] drop unused link

---
 content/docs/user-guide/project-structure/configuration.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/content/docs/user-guide/project-structure/configuration.md b/content/docs/user-guide/project-structure/configuration.md
index d8c30501ac..19c96586f5 100644
--- a/content/docs/user-guide/project-structure/configuration.md
+++ b/content/docs/user-guide/project-structure/configuration.md
@@ -267,8 +267,6 @@ Composition].
 [config groups]:
   https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/
 [defaults list]: https://hydra.cc/docs/tutorials/basic/your_first_app/defaults/
-[config module]:
-  https://hydra.cc/docs/1.3/advanced/compose_api/#initialization-methods
 [plugins]:
   https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process

From 10c1b57129e26e0545a3037112b55e5ca249560b Mon Sep 17 00:00:00 2001
From: dberenbaum
Date: Fri, 2 Feb 2024 09:12:32 -0500
Subject: [PATCH 3/4] add link to hydra plugins

---
 .../docs/user-guide/experiment-management/hydra-composition.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/docs/user-guide/experiment-management/hydra-composition.md b/content/docs/user-guide/experiment-management/hydra-composition.md
index 001535799a..2402eeb4ba 100644
--- a/content/docs/user-guide/experiment-management/hydra-composition.md
+++ b/content/docs/user-guide/experiment-management/hydra-composition.md
@@ -282,7 +282,7 @@ Within that directory, DVC will look for [defaults list] in `config.yaml`, but
 you can set a different path using `dvc config hydra.config_name other.yaml`.
 This is equivalent to the `config_name` argument in `@hydra.main()`.

-Hydra will automatically discover plugins in the `hydra_plugins` directory. By
+Hydra will automatically discover [plugins] in the `hydra_plugins` directory. By
 default, DVC will look for `hydra_plugins` in the root directory of the DVC
 repository, but you can set a different path with
 `dvc config hydra.plugins_path other_path`.
From 6f8d92101418184c853f043852a0be69a6ffd637 Mon Sep 17 00:00:00 2001
From: dberenbaum
Date: Fri, 2 Feb 2024 09:37:40 -0500
Subject: [PATCH 4/4] explain you can run code with or without hydra

---
 .../user-guide/experiment-management/hydra-composition.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/content/docs/user-guide/experiment-management/hydra-composition.md b/content/docs/user-guide/experiment-management/hydra-composition.md
index 2402eeb4ba..e6a0a6ae60 100644
--- a/content/docs/user-guide/experiment-management/hydra-composition.md
+++ b/content/docs/user-guide/experiment-management/hydra-composition.md
@@ -267,8 +267,8 @@ You no longer need to import Hydra into your code. A `main()` method is included
 in this example because it is good practice, but it's not necessary. This
 separation between config and code can help debug because the entire config
 generated by Hydra gets written to `params.yaml` before the experiment starts.
-You can also reuse `params.yaml` across multiple scripts in different stages of
-a DVC pipeline.
+You can run the same code with or without Hydra (or DVC). You can also reuse
+`params.yaml` across multiple scripts in different stages of a DVC pipeline.

 ## Advanced Hydra config