hydra: plugins_path and advanced config #5097

Merged 5 commits on Feb 2, 2024
105 changes: 102 additions & 3 deletions content/docs/user-guide/experiment-management/hydra-composition.md
@@ -5,7 +5,7 @@ supports Hydra's [config composition] as a way to configure [experiment runs].

<admon type="info">

At the moment you must explicitly enable this feature with:
You must explicitly enable this feature with:

```cli
$ dvc config hydra.enabled True
@@ -139,8 +139,9 @@ We parametrize the shell commands above (`mkdir`, `tar`, `wget`) as well as

<admon type="tip">

You can use `dvc.api.params_show()` to load params in Python code. For other
languages, use [dictionary unpacking] or a YAML parsing library.
You can load the params with any YAML parsing library. In Python, you can use
the built-in `dvc.api.params_show()` or `OmegaConf.load("params.yaml")` (which
comes with Hydra).

[dictionary unpacking]:
/doc/user-guide/project-structure/dvcyaml-files#dictionary-unpacking
@@ -221,4 +222,102 @@ Stage 'train' didn't change, skipping

</admon>

`dvc exp run` will compose a new `params.yaml` each time you run it, so it is
not a reliable way to reproduce past experiments. Instead, use `dvc repro` when
you want to reproduce a previously run experiment.

[debug]: /doc/user-guide/pipelines/running-pipelines#debugging-stages

## Migrating Hydra Projects

If you already have Hydra configured and want to start using DVC alongside it,
you may need to refactor your code slightly. DVC will not pass the Hydra config
to `@hydra.main()`, so that decorator should be dropped from the code. Instead,
DVC composes the Hydra config before your code runs and dumps the result to
`params.yaml`.

Using the example above, here's how the Python code in `train.py` might look
using Hydra without DVC:

```python
import hydra
from omegaconf import DictConfig

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # train model using cfg parameters
    ...

if __name__ == "__main__":
    main()
```

To convert the same code to use DVC with Hydra composition enabled:

```python
from omegaconf import OmegaConf

def main() -> None:
    cfg = OmegaConf.load("params.yaml")
    # train model using cfg parameters

if __name__ == "__main__":
    main()
```

You no longer need to import Hydra into your code. A `main()` function is
included in this example because it is good practice, but it's not necessary.
This separation between config and code can help with debugging because the
entire config generated by Hydra gets written to `params.yaml` before the
experiment starts. You can also reuse `params.yaml` across multiple scripts in
different stages of a DVC pipeline.

## Advanced Hydra config

You can configure how DVC works with Hydra.

By default, DVC will look for Hydra [config groups] in a `conf` directory, but
you can set a different directory using `dvc config hydra.config_dir other_dir`.
This is equivalent to the `config_path` argument in `@hydra.main()`.

Within that directory, DVC will look for the [defaults list] in `config.yaml`,
but you can set a different file using
`dvc config hydra.config_name other.yaml`. This is equivalent to the
`config_name` argument in `@hydra.main()`.

Hydra will automatically discover [plugins] in the `hydra_plugins` directory. By
default, DVC will look for `hydra_plugins` in the root directory of the DVC
repository, but you can set a different path with
`dvc config hydra.plugins_path other_path`.
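
As a quick summary of the three settings and their defaults, the sketch below
computes where the config file and the `hydra_plugins` directory would be
expected. The `resolve_hydra_paths` helper and the `HydraPaths` type are
hypothetical and are not part of DVC. The sketch also assumes that relative
values are resolved against the repository root, which the text above does not
state explicitly.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class HydraPaths(NamedTuple):
    config_file: Path  # file containing the defaults list
    plugins_dir: Path  # directory scanned for Hydra plugins


def resolve_hydra_paths(
    repo_root: str,
    config_dir: str = "conf",            # hydra.config_dir default
    config_name: str = "config.yaml",    # hydra.config_name default
    plugins_path: Optional[str] = None,  # hydra.plugins_path default: repo root
) -> HydraPaths:
    """Illustrative only: mirrors the documented defaults, not DVC internals."""
    root = Path(repo_root)
    # Assumption: relative settings are interpreted relative to the repo root.
    plugins_parent = root if plugins_path is None else root / plugins_path
    return HydraPaths(
        config_file=root / config_dir / config_name,
        plugins_dir=plugins_parent / "hydra_plugins",
    )


defaults = resolve_hydra_paths("/repo")
custom = resolve_hydra_paths(
    "/repo",
    config_dir="other_dir",
    config_name="other.yaml",
    plugins_path="other_path",
)
print(defaults.config_file.as_posix(), defaults.plugins_dir.as_posix())
print(custom.config_file.as_posix(), custom.plugins_dir.as_posix())
```

Note that `hydra.plugins_path` points at the parent of `hydra_plugins`, not at
the `hydra_plugins` directory itself.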

### Custom resolvers

You can register [OmegaConf custom resolvers] as plugins by writing them to a
file inside `hydra_plugins`. DVC will use these custom resolvers when composing
the Hydra config. For example, add a custom resolver to
`hydra_plugins/my_resolver.py`:

```python
import os
from omegaconf import OmegaConf

OmegaConf.register_new_resolver('join', lambda x, y: os.path.join(x, y))
```

You can use that custom resolver inside the Hydra config:

```yaml
dir: raw/data
relpath: dataset.csv
fullpath: ${join:${dir},${relpath}}
```

The final `params.yaml` will look like:

```yaml
dir: raw/data
relpath: dataset.csv
fullpath: raw/data/dataset.csv
```

[plugins]:
https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process
[OmegaConf custom resolvers]:
https://omegaconf.readthedocs.io/en/latest/custom_resolvers.html
5 changes: 5 additions & 0 deletions content/docs/user-guide/project-structure/configuration.md
@@ -258,12 +258,17 @@ Composition].
groups]. Defaults to `conf`.
- `hydra.config_name` - the name of the file containing the Hydra [defaults
list] (located inside `hydra.config_dir`). Defaults to `config.yaml`.
- `hydra.plugins_path` - location of the parent directory of `hydra_plugins`,
where Hydra will automatically discover [plugins]. Defaults to the root of the
DVC repository.

[config composition]:
https://hydra.cc/docs/tutorials/basic/your_first_app/composition/
[config groups]:
https://hydra.cc/docs/tutorials/basic/your_first_app/config_groups/
[defaults list]: https://hydra.cc/docs/tutorials/basic/your_first_app/defaults/
[plugins]:
https://hydra.cc/docs/advanced/plugins/develop/#automatic-plugin-discovery-process

</details>
