auto push on a repository basis #10681

Open
igordertigor opened this issue Feb 3, 2025 · 2 comments
Labels: A: data-sync (related to dvc get/fetch/import/pull/push), feature (is a feature), triage (needs to be triaged)

Comments

@igordertigor

igordertigor commented Feb 3, 2025

I run a lot of my ML workloads in short-lived containers in a dedicated ML cluster. My typical workflow looks like this:

  1. Prepare experiment locally, run a single, smaller epoch for testing
  2. git push && dvc push to repository
  3. Start container, git pull && dvc pull in the container
  4. Run either dvc repro or dvc exp run.
  5. git push && dvc push in the container

More often than I would like, I forget step 5 or run only the git push part of it. As a result, I am left with a corrupted cache and can't access the experiment's results using dvc metrics and similar commands.
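As a stopgap until something like this exists in DVC itself, steps 4 and 5 can be chained in a small wrapper so that the push cannot be forgotten. This is only a minimal sketch: `build_commands` and `run_and_push` are hypothetical names, not part of DVC, and it assumes that `git` and `dvc` are on the PATH and that any commit of changed files happens elsewhere.

```python
import subprocess
from typing import Callable, List, Sequence


def build_commands(stage_cmd: Sequence[str]) -> List[List[str]]:
    """Return the stage command followed by the two pushes from step 5."""
    return [list(stage_cmd), ["git", "push"], ["dvc", "push"]]


def run_and_push(
    stage_cmd: Sequence[str] = ("dvc", "repro"),
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Run the stage, then push Git and DVC state (hypothetical helper)."""
    for cmd in build_commands(stage_cmd):
        # check=True makes subprocess.run raise on a non-zero exit code,
        # so nothing is pushed after a failed run.
        runner(cmd, check=True)
```

Calling `run_and_push()` in the container would run `dvc repro`, `git push`, and `dvc push` in that order; passing `("dvc", "exp", "run")` covers the other variant of step 4. It still has to be shipped with the repository and remembered by collaborators, which is why a setting in .dvc/config would be preferable.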

I am aware that there are Git hooks that I can set up using dvc install. However, because the containers are typically rather short-lived, I tend not to install those either, and there is no guarantee that collaborators will remember to install the hooks. I would therefore appreciate a repository-level setting in .dvc/config. I know that there is such a setting for experiments (exp.auto_push), but it doesn't seem to apply to cases where I run dvc repro.

Also, in a perfect world, this feature would be configurable on a per-host basis, so that I can specify hostname patterns (such as ml-container-.*) on which autostage/auto_push are active.
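To illustrate what I mean by per-host patterns, here is a rough sketch of the check. Nothing here is an existing DVC option: `auto_push_enabled` is a hypothetical name, and the assumption is that the patterns are regular expressions matched against the full hostname.

```python
import re
import socket
from typing import Iterable, Optional


def auto_push_enabled(patterns: Iterable[str], hostname: Optional[str] = None) -> bool:
    """Return True if the hostname fully matches any configured pattern."""
    if hostname is None:
        # Fall back to the name of the machine the code is running on.
        hostname = socket.gethostname()
    return any(re.fullmatch(pattern, hostname) for pattern in patterns)
```

With the pattern from above, `auto_push_enabled(["ml-container-.*"], "ml-container-17")` would be true, while a laptop hostname would not match, so auto push would stay off for local work.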

@shcheklein
Member

Can it be part of the container config (I mean hooks setup)?

@shcheklein added the feature, A: data-sync, p2-medium, and triage labels and removed the p2-medium label on Feb 3, 2025
@igordertigor
Author

That is not necessarily my preferred way. In many cases, I don't have direct access to the container config, so it would be an additional step to remember. It also generally isn't possible to modify the container config in a persistent way, so the collaborator issue would largely remain too.
