hydra composition: Workflow discrepancies #8355

Open
daavoo opened this issue Sep 23, 2022 · 11 comments
Labels
A: experiments Related to dvc exp discussion requires active participation to reach a conclusion


@daavoo
Contributor

daavoo commented Sep 23, 2022

When using Hydra Composition to configure DVC experiments, there are a few discrepancies with respect to the "regular" params workflow. These could confuse existing users of dvc params who migrate to Hydra Composition:

  • The latest state is not preserved

Without Hydra Composition, if params are modified via --set-param and the experiment is then persisted, the next exp run with no arguments reuses those latest modifications.

With Hydra Composition, the next exp run with no arguments still runs the composition and dumps the result to params.yaml, overwriting the latest --set-param modifications.

Users would need to manually edit the files in hydra.config_dir and/or the default list values in hydra.config_name in order to reflect the latest modifications via --set-param.
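To make the first discrepancy concrete, here is a toy Python model of both workflows. It is not how DVC is implemented: dicts stand in for params.yaml and the files in hydra.config_dir, regular_exp_run and compose_and_dump are hypothetical helpers, and Hydra's real composition (defaults lists, config groups, override grammar) is reduced to a flat dict merge.

```python
# Toy model only: dicts stand in for files; the helpers are hypothetical.
CONF_DIR = {"train.lr": 0.001, "train.epochs": 10}  # flattened hydra.config_dir


def regular_exp_run(params_yaml, set_params=None):
    """Without Hydra: --set-param edits params.yaml in place, so edits persist."""
    params_yaml.update(set_params or {})
    return dict(params_yaml)


def compose_and_dump(conf_dir, set_params=None):
    """With Hydra: params.yaml is rebuilt from the config dir plus this run's
    overrides only, so earlier --set-param values are not remembered."""
    return {**conf_dir, **(set_params or {})}


# Regular workflow: the second run with no arguments reuses lr=0.01.
params_yaml = dict(CONF_DIR)
first_regular = regular_exp_run(params_yaml, {"train.lr": 0.01})
second_regular = regular_exp_run(params_yaml)

# Hydra workflow: the second run with no arguments falls back to the conf value.
first_hydra = compose_and_dump(CONF_DIR, {"train.lr": 0.01})
second_hydra = compose_and_dump(CONF_DIR)

print(second_regular["train.lr"], second_hydra["train.lr"])  # 0.01 0.001
```

In this model, the only way to make the second Hydra run keep lr=0.01 is to edit CONF_DIR itself, which mirrors the manual edits described above.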

  • Source of configuration needs to be tracked separately

Without Hydra Composition, tracking the params is enough: a change in the params file results in a new experiment.

With Hydra Composition, tracking only params.yaml could result in unexpected behavior, as manual modifications to files in hydra.config_dir would not be detected by DVC.

Users would also need to track hydra.config_dir.
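A similarly simplified model of the second point follows. It assumes that change detection compares only the param values recorded for params.yaml (DVC stores param values per stage in dvc.lock) and never looks at the config directory. The config contents and the is_up_to_date helper are hypothetical. An edit under the config dir goes unnoticed until a composition rewrites params.yaml.

```python
# Toy model only: dicts stand in for files, and no real DVC code is involved.
conf_dir = {"train.lr": 0.001}      # hypothetical contents of hydra.config_dir
params_yaml = dict(conf_dir)        # what the last composition dumped
locked_params = dict(params_yaml)   # param values recorded for the stage after the run


def is_up_to_date(current_params, recorded_params):
    """Compare only params.yaml values against the recorded ones; the config
    dir is never consulted, like a stage that depends only on params.yaml."""
    return current_params == recorded_params


conf_dir["train.lr"] = 0.1                     # manual edit inside the config dir
before_compose = is_up_to_date(params_yaml, locked_params)

params_yaml = dict(conf_dir)                   # the next composition dumps the edit
after_compose = is_up_to_date(params_yaml, locked_params)

print(before_compose, after_compose)  # True False
```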

@daavoo daavoo added discussion requires active participation to reach a conclusion A: experiments Related to dvc exp labels Sep 23, 2022
@daavoo
Contributor Author

daavoo commented Sep 23, 2022

Not sure how critical or relevant these 2 points are. Perhaps we only need to acknowledge them in the docs. WDYT, @dberenbaum?

@dberenbaum
Collaborator

  • The latest state is not preserved

Agree that this could be confusing and should be pointed out. However, IMHO this ephemeral state is more expected and useful for experimentation. It pretty much reflects how queued experiments work and how Hydra users work already. The way DVC handles workspace experiments is arguably an odd exception. I find it mostly annoying that I can't return to some default state after each experiment, so I personally wouldn't push to preserve state for Hydra experiments.

  • Source of configuration needs to be tracked separately

Can you think of an example where modifications would not be tracked by DVC? I thought they would all end up being tracked if they impact the experiment (either in params.yaml or in dvc.yaml vars).

@dberenbaum
Collaborator

Encountered a user confused by this behavior in https://discord.com/channels/485586884165107732/563406153334128681/1128718359878451360

@gromag

gromag commented Jul 15, 2023

I'm porting the last conversation from Discord here so that we can continue discussing it on this forum, as suggested by @dberenbaum.

I have hydra.enabled = true and I'm familiar with how the params.yaml file is populated from the hydra-defined values when dvc exp run is called. However, if an experiment was run with some parameters overridden via the --set-param flag and I then run dvc exp apply <exp-name>, the experiment's settings are not applied back to the hydra-defined config files. Therefore, if I git/dvc commit the workspace, later come back to the same branch, and run dvc exp run, the experiment that was applied to that branch is no longer reproduced; instead, an old experiment with the old hydra values is. If the experiment was run with --set-param, would I need to manually update the hydra-defined values after dvc exp apply and before committing to git/dvc? That seems odd.

@dberenbaum
Collaborator

However, after an experiment that was run with some parameters overwritten using the --set-param flag, if I run dvc exp apply <exp-name>, the experiments settings are not applied back to the hydra-defined config files.

Just to clarify, the experiment's settings are applied to your params.yaml file, but they are not populated in the conf directory of files used by the hydra "compose and dump." Therefore, it is possible to reproduce any applied experiment like this:

$ dvc exp apply <exp-name>
$ dvc config hydra.enabled false
$ dvc exp run

It seems odd that you need to disable hydra to reproduce the experiment, but it is consistent with how hydra command-line overrides work. We could at least document this behavior better so that it's clear how it differs from the typical dvc workflow.

@dberenbaum
Collaborator

See https://discord.com/channels/485586884165107732/485596304961962003/1183845968089726976. I suggested there that we could add an option to dvc exp run to temporarily disable hydra composition.

@Danila89

@dberenbaum are there any updates on this useful feature?

@dberenbaum
Collaborator

I did take a quick look, but it looks more involved to implement than I initially expected. Does the workaround mentioned above not work for you, or do you just want a simpler way to do it?

@Danila89

I've already implemented a workaround; I just thought I'd be able to replace it with a native way :)
Not critical for me

@dberenbaum
Collaborator

You can also use dvc repro instead of dvc exp run in this case, which will reproduce the experiment without doing any hydra composition. This is a key difference between repro (intended for reproduction) and exp run (intended to run some modified experiment). I'm not sure it's worth having another way to do this, but we should document these nuances of hydra composition and exp run so expectations are clear.
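The repro vs. exp run distinction described above, combined with the dvc exp apply behavior reported earlier in the thread, can be summarized in one more toy model. As before, this is a sketch under stated assumptions, not DVC's code: the workspace is a dict, exp_run, repro and exp_apply are hypothetical stand-ins for the real commands, and composition is a flat dict merge.

```python
# Toy model only: dicts stand in for files; the helpers are hypothetical.
CONF_DIR = {"train.lr": 0.001}  # hydra.config_dir; exp_apply never touches it


def exp_run(workspace, set_params=None, hydra_enabled=True):
    """Stand-in for dvc exp run: with Hydra enabled, params.yaml is recomposed."""
    if hydra_enabled:
        workspace["params.yaml"] = {**CONF_DIR, **(set_params or {})}
    else:
        workspace["params.yaml"].update(set_params or {})
    return dict(workspace["params.yaml"])


def repro(workspace):
    """Stand-in for dvc repro: uses params.yaml as-is, no composition."""
    return dict(workspace["params.yaml"])


def exp_apply(workspace, exp_params):
    """Stand-in for dvc exp apply: restores params.yaml only."""
    workspace["params.yaml"] = dict(exp_params)


ws = {"params.yaml": dict(CONF_DIR)}
exp = exp_run(ws, {"train.lr": 0.01})   # experiment run with --set-param
ws["params.yaml"] = dict(CONF_DIR)      # e.g. back on the branch at its baseline
exp_apply(ws, exp)

via_repro = repro(ws)["train.lr"]                                    # 0.01
via_exp_run_no_hydra = exp_run(ws, hydra_enabled=False)["train.lr"]  # 0.01
via_exp_run_hydra = exp_run(ws)["train.lr"]                          # 0.001
```

In this model, only the Hydra-enabled exp_run discards the applied value, which matches the two workarounds mentioned in the thread (setting hydra.enabled to false, or using dvc repro).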

@gregstarr

gregstarr commented Oct 4, 2024

I think of dvc repro as something like "make"; in fact, the docs make this same comparison: https://dvc.org/doc/start/data-pipelines, https://dvc.org/doc/start/data-pipelines/data-pipelines. It gives me peace of mind when I run dvc status and it says "all pipelines are up to date".

It feels counterintuitive to me that if you change parameters within the hydra config directory structure, dvc repro will say that everything is up to date. Also, applying an experiment to your workspace so that params.yaml ends up different from the hydra config sounds like an accident waiting to happen.

Development

No branches or pull requests

5 participants