Push running checkpoint to remote #6332
looks good, just a couple small comments
@@ -464,6 +466,20 @@ def checkpoint_callback(
exp_rev = cls.commit(
    scm, exp_hash, exp_name=name, force=force, checkpoint=True
)

git_remote = os.environ.get("DVC_EXP_AUTO_PUSH", None)
Please put this env var into dvc/env.py and use it as a var.
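A minimal sketch of what this request could look like, assuming the env var name is kept as a module-level constant and callers read it through that constant instead of repeating the string literal. The helper `get_auto_push_remote` is hypothetical and only illustrates the pattern; in the PR the constant would live in dvc/env.py.

```python
import os

# Stand-in for a constant that would be defined in dvc/env.py (assumption).
DVC_EXP_AUTO_PUSH = "DVC_EXP_AUTO_PUSH"


def get_auto_push_remote(environ=os.environ):
    """Return the Git remote configured for auto push, or None if unset.

    Hypothetical helper, not part of DVC.
    """
    # An empty value is treated the same as "not set".
    return environ.get(DVC_EXP_AUTO_PUSH) or None
```

Referencing the constant keeps the name in one place, so a later rename only touches the env module.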
Also, what about a different meaning, e.g. DVC_EXP_AUTO_PUSH=10 meaning "push every 10 checkpoints"?
@casperdcl DVC_EXP_AUTO_PUSH is the remote git repo for auto push. Maybe this name needs to be modified. But it is better to discuss this problem in the PR for dvc.org
> PR for dvc.org
I don't follow. I understand what you're using DVC_EXP_AUTO_PUSH for. I'm asking for:
- DVC_EXP_AUTO_PUSH to be changed to mean something else (push every N epochs)
- or a new variable DVC_EXP_AUTO_PUSH_EVERY added (push every N epochs)
- or 2 different variables with less ambiguous names DVC_EXP_AUTOPUSH_CHECKPOINT=0, DVC_EXP_AUTOPUSH_REMOTE=""
Basically I think a "push every N epochs" config should be supported in this PR rather than in a follow-up so that we can keep API consistent. WDYT @efiop @pmrowla?
@casperdcl Is "push every N commits" actually a requirement for CML? This seems like an additional feature request that's unrelated to being able to recover checkpoints from a dead CI runner.
And it seems like there's still some behavior discussion that would need to be worked out there, i.e. what happens if I have it set to 10 and my experiment generates fewer than 10 commits, should DVC not push anything?
It also seems like something that should maybe be tied to dvclive and not DVC itself, since DVC doesn't even have a real concept of epoch count/iteration awareness (related dvclive ticket: iterative/dvclive#113). Since dvclive is aware of actual epoch counts, it can just call experiments.push at the appropriate time without the need for adding more env vars.
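To make the open question concrete, here is a small sketch of a "push every N" rule. It is not code from the PR: `checkpoints_pushed` and its `flush_at_end` flag are hypothetical names, used only to show that a plain modulo rule pushes nothing when the run produces fewer than N checkpoints, unless a final flush is added.

```python
def checkpoints_pushed(total_checkpoints, push_every, flush_at_end=False):
    """Return the 1-based checkpoint numbers after which a push would happen.

    Hypothetical model of a "push every N checkpoints" setting.
    """
    if push_every < 1:
        raise ValueError("push_every must be >= 1")
    # Plain rule: push whenever the checkpoint count is a multiple of N.
    pushed = [
        n for n in range(1, total_checkpoints + 1) if n % push_every == 0
    ]
    # Optional variant: also push the last checkpoint if the plain rule
    # did not already cover it.
    if flush_at_end and total_checkpoints:
        if not pushed or pushed[-1] != total_checkpoints:
            pushed.append(total_checkpoints)
    return pushed
```

With `push_every=10` and a run of 7 checkpoints, the plain rule returns an empty list, which is exactly the case raised above; whether a final flush is the right answer is a design decision the thread leaves open.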
DVC_EXP_GIT_REMOTE is used for the default exp push git remote. Both dvc exp push and auto push can use it (but auto push can't work without it, while dvc exp push can take git remote arguments).
But it is a totally new feature; changes to dvc exp push should not be mixed into this PR.
> Configuring multiple git remotes is a pretty typical use case
ok in that case I'd go with
- Split into two env vars
i.e. DVC_EXP_CHECKPOINT_PUSH:bool and DVC_EXP_GIT_REMOTE:str?
Regarding:
> I think that the behavior of auto push should go with dvc exp push
This also makes sense to me
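A rough sketch of the two-variable split proposed here, using the names as written in this comment (they were still under discussion at this point). The set of accepted truthy strings and the `AutoPushSettings` container are assumptions for illustration, not DVC behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed set of values that switch the boolean flag on.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AutoPushSettings:
    """Hypothetical container for the two proposed settings."""

    enabled: bool
    git_remote: Optional[str]


def read_auto_push_settings(
    environ: Mapping[str, str] = os.environ,
) -> AutoPushSettings:
    # Boolean switch: anything outside _TRUTHY (including unset) means off.
    flag = environ.get("DVC_EXP_CHECKPOINT_PUSH", "").strip().lower()
    # String setting: an empty value is treated as "no remote configured".
    git_remote = environ.get("DVC_EXP_GIT_REMOTE") or None
    return AutoPushSettings(enabled=flag in _TRUTHY, git_remote=git_remote)
```

Separating the switch from the remote lets the remote setting be shared with dvc exp push, at the cost of one more variable for users to learn.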
This is also not specific to checkpoints; regular experiments will end up being pushed the same way if the env var is set.
I think we need to keep CHECKPOINT out of the names of these env vars for this reason.
Splitting into two env vars makes sense, but it does make the UX more complex. I think it's worth making sure this is really necessary before creating extra complexity for hypothetical future needs, so I have a couple follow-ups:
Why not rename the env var to DVC_EXP_GIT_REMOTE and make no other changes? That provides future flexibility but also keeps the UX concise.
Alternatively, I think we could reconsider having a default remote.
> Configuring multiple git remotes is a pretty typical use case (original/upstream repo vs my fork/origin). If I'm working on a fork and want my run experiments pushed to the original repo but not my fork, I need to be able to configure the remote.
- Is this likely in CI?
- Even locally, are users likely to do git push -u upstream mybranch? My typical workflow would be to pull from upstream and push to origin by default.
- If we aren't following Git default remote configuration, why not document that origin is the default instead of requiring it to be explicit?
> Is this likely in CI?
in CI probably not
> Even locally, are users likely to do git push -u upstream mybranch? My typical workflow would be to pull from upstream and push to origin by default.
I do this regularly with DVC - we have CI configured to run a different set of actions on pull requests vs branches pushed directly to iterative/dvc, so there are times when I want to push a certain branch for specific features (that will require the stricter branch tests) to upstream rather than my personal fork
> If we aren't following Git default remote configuration, why not document that origin is the default instead of requiring it to be explicit?
This is an option, although as someone who is going to have to answer support questions regarding "why is the default for this experiment origin instead of the default I set on my git branch", my preference would be to continue requiring the explicit argument
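The two positions in this exchange can be sketched as one function with an optional fallback. This is only an illustration: `resolve_git_remote` and `MissingGitRemoteError` are hypothetical names, and the thread does not settle which behavior DVC should adopt.

```python
from typing import Optional


class MissingGitRemoteError(ValueError):
    """Hypothetical error for the case where no Git remote is available."""


def resolve_git_remote(
    explicit: Optional[str], default: Optional[str] = None
) -> str:
    """Pick the Git remote to push experiments to.

    With default=None this models "always require an explicit remote";
    with default="origin" it models "document origin as the default".
    """
    if explicit:
        return explicit
    if default:
        return default
    raise MissingGitRemoteError("no Git remote given; pass one explicitly")
```

Calling `resolve_git_remote(None)` raises, which matches the preference for an explicit argument, while `resolve_git_remote(None, default="origin")` models the documented-default alternative.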
> Alternatively, I think we could reconsider having a default remote.
I think having a default remote for dvc exp push is reasonable; both dvc push and git push have default remotes.
How about adding it to the dvc.config? It is more manageable.
And if a default remote for dvc exp push is set, the only env variable needed here to control the auto push process is DVC_EXP_CHECKPOINT_PUSH:bool.
I don't think users will use auto push directly without having used dvc exp push previously. If they are familiar with dvc exp push and know how to set up a default remote, the auto push configuration is quite simple.
Co-authored-by: Peter Rowlands (변기호) <[email protected]>
1. move env name to dvc/env.py. 2. add some tests for it.
1. change the behaviour of self remote 2. do not use string DVC_EXP_AUTO_PUSH 3. downgrade the logger level in auto push 4. use full branch ref
f"try to auto checkpoints to {git_remote} which is the "
"running repository dvc cache will be pushed to the "
"default remote while git references will not be pushed"
I'm not sure I understand this message. Maybe something like
Try to save the experiment status to {git_remote} after every checkpoint.
Its data will be uploaded to the default remote (if any).
Not sure what Git references we're talking about. Meaning the experiments themselves? If so, what gets saved to {git_remote}?
We are git pushing the currently running experiment ref. In this condition, git_remote points to the current git repo itself, so nothing actually happens on the git side. However, when this env var is set, we also automate dvc pushing used cache objects and run cache for the currently running experiment as well.
So the side effect here is that nothing will be "pushed" anywhere on the git side of things, but DVC outputs and run cache will still be pushed to the default configured DVC remote.
The warning message should probably just be something like
'{git_remote}' points to the current Git repo, experiment Git refs will not be pushed. DVC cache and run cache will automatically be pushed to the default DVC remote (if any) on each experiment commit.
In summary,
had I missed something?
@iterative/cml
Looks good for the most part, I think this should be the last set of changes to do
for more information, see https://pre-commit.ci
@classmethod
def _validate_remotes(cls, dvc: "Repo", git_remote: Optional[str]):

    if git_remote == dvc.root_dir:
Is there a reason to have this explicit check? Would this qualify as a valid git remote? Is there some reason to think a user might set DVC_EXP_GIT_REMOTE to their local project's root dir?
Also, why fail for an invalid Git remote but only warn for this scenario?
git_remote is just a variable that is either a path/URL to a git repo (including local ones) or a remote name. If this check is hit it means it's a local path that points directly to the current DVC root directory, which in most cases is a valid git repo path.
This behavior can be useful - if you do DVC_EXP_GIT_REMOTE=. the end result would be that we auto-push DVC cache for your local experiment runs. (The git push step becomes a no-op, the same thing that happens if you do a CLI git push to your own current repo directory)
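For illustration, a sketch of how a "remote points at the current repo" check might treat relative paths such as `.`. This is not the PR's implementation, which compares `git_remote` to `dvc.root_dir` directly; `points_to_current_repo` is a hypothetical helper, and it assumes relative paths are resolved against the process's working directory.

```python
import os


def points_to_current_repo(git_remote: str, root_dir: str) -> bool:
    """Return True if git_remote is a local path to the repo root itself.

    Hypothetical helper. Remote names ("origin") and URLs normally do not
    exist as local directories, so they are reported as not matching.
    """
    if not os.path.isdir(git_remote):
        return False
    # realpath normalizes ".", trailing slashes and symlinks before comparing.
    return os.path.realpath(git_remote) == os.path.realpath(root_dir)
```

One caveat of this sketch: a remote name that happens to match a directory in the working directory would be treated as a path, so a real check would need to look up configured remote names first.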

if git_remote == dvc.root_dir:
    logger.warning(
        f"'{git_remote}' points to the current Git repo, experiment "
> f"'{git_remote}' points to the current Git repo, experiment "
I'm not sure this is helpful since a valid remote would also point to the current Git repo, right? Maybe something like "local workspace" makes more sense than "current Git repo"?
The local workspace is your current Git repo. https://github.com/my/repo.git is not your current Git repo - it's a remote Git repo, and potentially where your current Git repo is cloned from, but "current repo" would always be your current local workspace repo
thanks @karajan1001 - is there a follow-up PR in https://github.com/iterative/dvc.org/pulls?
@casperdcl related iterative/dvc.org#2672
* 'python310' of github.com:skshetry/dvc:
  Unpin networkx
  Remove a special queued experiments (iterative#6393)
  build(deps): bump google-cloud-storage from 1.41.1 to 1.42.0 (iterative#6415)
  Push running checkpoint to remote (iterative#6332)
Fix #6182
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Add checkpoint backup dvc.org#2672
Thank you for the contribution - we'll try to review it as soon as possible.