
Support params in form of Gin Config files #7224

Open
macio232 opened this issue Jan 4, 2022 · 12 comments
Labels: A: params (Related to dvc params), feature request (Requesting a new feature)

Comments

@macio232

macio232 commented Jan 4, 2022

It would be nice to use DVC along with https://github.com/google/gin-config to handle parameters.

@efiop added the "feature request" label (Requesting a new feature) on Jan 5, 2022
@daavoo added the "A: params" label (Related to dvc params) on Feb 22, 2022
@slevang

slevang commented May 12, 2023

This would be amazing for my current project.

@dberenbaum
Collaborator

Hey, have you considered using Hydra? DVC has a Hydra integration that should handle many of the use cases for Gin. Usually we try not to be opinionated about tooling, but in this case it was too hard to provide meaningful support for complex configuration while staying framework-agnostic.

@slevang

slevang commented May 13, 2023

Gin is different enough from Hydra, through its direct binding of configs to code and its ability to pass configurable references, that moving between the two frameworks is a pretty significant lift. For the same reason, I fully understand the challenges of supporting multiple tools like this on the DVC side.

What I would really like to be able to do, at a minimum, is simply track certain gin config params associated with an experiment in Iterative Studio. This would allow filtering down to comparable experiments. My two thoughts on this were:

  1. extract a dict of relevant configurables from the gin config and dump to yaml as a params file as the first stage of a pipeline. Don't think this will work because DVC doesn't seem to support writing of params in the middle of the pipeline
  2. write out the same extracted params as metrics (which they aren't really, but would at least allow tracking). Don't think this will work either because DVC only supports numeric metrics (at least according to the docs, haven't tried it myself).

Otherwise, I think I'm left wrapping the whole pipeline with a script outside of the DVC CLI, where the params are parsed and dumped to a file prior to running dvc repro. Am I missing any other ways to do this?

@dberenbaum
Collaborator

Thanks for the clarification!

> 1. extract a dict of relevant configurables from the gin config and dump to yaml as a params file as the first stage of a pipeline. Don't think this will work because DVC doesn't seem to support writing of params in the middle of the pipeline

This should be possible using "top-level" params like this:

stages:
  dump_gin:
    cmd: python dump_gin.py
    deps:
      - dump_gin.py
    outs:
      - gin_params.yaml
  train:
    cmd: python train.py
    deps:
      - gin_params.yaml
    outs:
      - model.pkl
params:
  - gin_params.yaml

You could also log parameters directly from your code with dvclive.log_param().

> 2. write out the same extracted params as metrics (which they aren't really, but would at least allow tracking). Don't think this will work either because DVC only supports numeric metrics (at least according to the docs, haven't tried it myself).

Support for string metrics will be available in the next release, but it seems like option 1 is closer to what you want.

@slevang

slevang commented May 15, 2023

Thanks @dberenbaum, that's exactly the approach I tried out after posting the other day. It seems to work locally, but I'm not picking up anything besides the default params.yaml in Studio yet.

I'm wondering if this is just because these changes are in a feature branch that I haven't merged into the default branch yet? I can't tell exactly how Studio picks up the available params/metrics columns when it imports the project, but perhaps it relies on the default branch?

My dvc.yaml also specifies a directory of params instead of the single file example you give above, since I'm splitting out params by stage. Should I expect Studio to parse this correctly?

@dberenbaum
Collaborator

> My dvc.yaml also specifies a directory of params instead of the single file example you give above, since I'm splitting out params by stage. Should I expect Studio to parse this correctly?

No, a directory of params isn't supported right now, unfortunately. I can open an issue to track it. Are you able to try tracking the individual params files?
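A minimal sketch of what tracking the per-stage files individually could look like in dvc.yaml. The file names under params/ are hypothetical and only illustrate listing each file on its own line instead of the directory:

```yaml
# Top-level params: list each params file individually.
# A directory entry such as "- params/" is not supported here.
params:
  - params/dump_gin.yaml
  - params/train.yaml
```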

@slevang

slevang commented May 15, 2023

Ah, got it, I will try with individual files. Thanks for opening #9452. A little more documentation around these aspects of Studio integration would be nice. I also found by trial and error that a directory of plots is supported, but not stage level plots, only top level.

@dberenbaum
Collaborator

> Ah, got it, I will try with individual files. Thanks for opening #9452. A little more documentation around these aspects of Studio integration would be nice.

Yup, you are right about that. In this case top-level params directories aren't supported in DVC either, and we need to clarify that. If it's supported in DVC, it should work in Studio.

> I also found by trial and error that a directory of plots is supported, but not stage level plots, only top level.

Stage-level plots should be supported in Studio, so if you have more details, we can look into it.

@slevang

slevang commented May 15, 2023

> Stage-level plots should be supported in Studio, so if you have more details, we can look into it.

This was for a directory of .png plots. dvc plots show picked them up just fine when they were listed under a stage, but they never showed up in Studio until I moved them to a top-level plots entry.
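For reference, a sketch of the top-level placement described here, assuming train.py writes its .png files into a hypothetical images directory that the stage declares as a regular output:

```yaml
stages:
  train:
    cmd: python train.py
    deps:
      - gin_params.yaml
    outs:
      - model.pkl
      # hypothetical directory of .png images written by train.py
      - images
# Top-level plots entry pointing at the same image directory
plots:
  - images
```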

@dberenbaum
Collaborator

That looks to me to be working when trying out a simple example. If it's not working for you and you can reproduce it, could you please open an issue in https://github.com/iterative/studio-support/issues?

@slevang

slevang commented May 18, 2023

> This should be possible using "top-level" params like this:
> [...]

Finally figured out that I needed to add cache: false to the gin_params.yaml output and track the file with Git for this to work. dvc params diff doesn't seem to be able to find params files that live only in the DVC cache?
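A sketch of the earlier dvc.yaml with that change applied, assuming the same stage and file names. Setting cache: false on the output tells DVC not to store gin_params.yaml in its cache, so the file itself can be committed to Git, which is what this comment reports dvc params diff needed:

```yaml
stages:
  dump_gin:
    cmd: python dump_gin.py
    deps:
      - dump_gin.py
    outs:
      # Not stored in the DVC cache; commit this file to Git instead
      - gin_params.yaml:
          cache: false
  train:
    cmd: python train.py
    deps:
      - gin_params.yaml
    outs:
      - model.pkl
params:
  - gin_params.yaml
```

After dvc repro, gin_params.yaml is then committed to Git like any other tracked file.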

@dberenbaum
Collaborator

Thanks for your patience in debugging the problem. I can confirm that behavior. I'll open a separate bug report for it.
