Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support .cfg extension in parameters files #7122

Closed
gitdoluquita opened this issue Dec 10, 2021 · 4 comments
Closed

Support .cfg extension in parameters files #7122

gitdoluquita opened this issue Dec 10, 2021 · 4 comments
Labels
A: params Related to dvc params enhancement Enhances DVC p3-nice-to-have It should be done this or next sprint

Comments

@gitdoluquita
Copy link

Hey there folks!

I'm using DVC for a Spacy project and, since v3 of Spacy, all the params of the model are stored in a config.cfg file, as explained here and here.
I thought that would be nice to use the config.cfg as a parameters file, the same way as we can use a .toml or .yaml. Example:

  train:
    cmd: python -m spacy train config.cfg
    deps:
      - data/inter/train_legal_ner.spacy
      - data/inter/dev_legal_ner.spacy
    params:
      - config.cfg:
        - training.max_steps
        - training.dropout

This maybe would, even, affect the Spacy integration with DVC that right now doesn't use the params option.

I would be glad to help with any other information, and, with some tips, try to implement this .cfg parser.
Thank you in advance.

@skshetry
Copy link
Member

@gitdoluquita, does .cfg files in spacy have spacy-related interpolations? We did get a few requests before for supporting ini files, but we usually recommend to use yaml/json/toml files.

Anyway, please feel free to propose a PR. You may need to look at how we handle serialization/deserialization. As an example, you can see toml.

Also, the place where we read params:

def _read(self):

You may also need to coerce to certain types, that we need to save in yaml/json in .dvc and dvc.lock files to track.

@gitdoluquita
Copy link
Author

Thank you @skshetry

From what I know Spacy doesn't write anything in this config.cfg file. However interpolation (like variables) is possible inside of the file itself. I don't know if this could be a problem, I don't think so 🤔.

A point in Spacy documentation makes me believe that having this file compatible with DVC params is a good Idea, specially thinking in Experiments workflow and Studio's Run Experiment feature:

Reproducibility with no hidden defaults. The config file is the “single source of truth” and includes all settings.

I am not sure if I understood all the concerning points about Spacy's possible interpolation, I'm sorry.
If I missed anything besides: some possible writing in the file or some hidden value besides in-file variables (two points that by docs are discarded), please let me know.

I will take a look at those points that you said, and start to work in a PR proposal.
Thank you so much for the orientation and the attention!

@pared pared added enhancement Enhances DVC p3-nice-to-have It should be done this or next sprint labels Dec 14, 2021
@dberenbaum
Copy link
Collaborator

It seems like .cfg support would be realistic except that I'm not aware that there is a uniform standard. From what I can tell, Spacy is using https://thinc.ai/ as their configuration system. Integrating that into DVC might require pretty substantial changes, such as those proposed in #6506. We hope to look deeper into supporting complex parameter configuration like this, so it would be great to hear any thoughts you might have.

@daavoo daavoo added the A: params Related to dvc params label Feb 22, 2022
@percevalw
Copy link

Hi, if I may contribute to this issue, I think we don't need support for the full Thinc configuration system. Just being able to parse and update a .cfg file would be a big plus. Support for variable interpolation seems to me neither necessary nor desirable since it means making changes appear in two places, for instance when running dvc params diff.
Unlike other file parsers, python's standard .cfg parser (ConfigParser) skips comment lines which makes it impossible to restore them after a parameter update (with --set-param). This is not visible if you use the dvc api to view the parameters but I feel it is worth mentioning.
For my part, it works very well with spaCy's config files, I can submit a PR if you like.

@mattseddon mattseddon closed this as not planned Won't fix, can't repro, duplicate, stale Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: params Related to dvc params enhancement Enhances DVC p3-nice-to-have It should be done this or next sprint
Projects
None yet
Development

No branches or pull requests

7 participants