Add the option for imports to be backed up to the remote #4581
@charlesbaynham thanks for the PR!! @efiop @charlesbaynham two questions to clarify:
|
Hi! I also have some questions (left in the issue: #4527 (comment)), mostly arguing against having an explicit |
@jorgeorpinel from your comments in the linked issue:
|
In a way yes, we are still having to go to In terms of clutter in the
Again, |
You're welcome!
Yes, I'm not totally sold on the naming yet. We discussed Very open to suggestions for better names!
That's not something I've thought about at all I'm afraid. It might "just work" since the alterations to |
OK, but it's too low level IMO. Implementation should probably not define requirements. The question is how should it work so it provides the desired utility, without complicating the other 95% of use cases (that may not need it)? I still think the
Sorry, I'm confused. I thought that "The backup entry in outs should be added for all outs" (from #4527 (comment))
It's not about not letting users do something hypothetical. It's just that applying this globally seems out of scope for #4527. It's also speculative to assume that such change would be welcome generally. We don't even know if there's a need for
So this has been tested with dvc.yaml files? Thanks |
I don't like |
Oh and finally, from iterative/dvc.org#1788 (review):
Could the error message be less cryptic and provide a hint on what's the cause? (backup: false) But see, this change can get complicated very quickly |
Ah, I realise I've been unclear. When I wrote that issue, I was expecting to go through adding What I mean by "added to all outs" is that it's part of the schema, and will be understood by dvc if it's manually added to an |
Can do if you think it's worth it. I didn't write that error message, it's coming from I feel like if a power user wants to alter the dvc files under-the-hood then they shouldn't expect finely crafted error messages. But if that's something you'd usually support, I'm sure we can add some code to |
OK, name poll! @efiop, @shcheklein, @jorgeorpinel could you please vote with emojis for the name of the flag to be added to
|
Good point. The point is that we do have tickets/requests for this. Something like: "I want to push everything except my models ... or except data." Usually because of some regulations/security. Another potentially relevant issue: supporting a remote per output (type), i.e. push models into one remote, data into another, etc. Before we introduce any flags into DVC files (that we'll have to support, document, etc.), I would def spend a few hours trying to see if we can generalize this. |
@shcheklein Do you have any examples of these requests? I had a browse through the issues but I couldn't find any. I did find this one (#4040) which is the separate-remotes-by-data-type issue. |
@charlesbaynham sure, some of them that I was able to find within a few minutes: #2095 A lot of requests come from Discord or "use-case" discussions, so it's not that easy to find them usually. |
Thanks for the clarifications @charlesbaynham !
It won't be that obscure after we document it (merging iterative/dvc.org/pull/1788)
Got it. I think it would be worth handling the specific case (if we do implement backup mode for all outs), if it's not too hard (implementation can dictate this req.)
True @shcheklein! But like you mentioned, most such requests are for configuring default remotes per output (#4519 is another example). If that supports I still incline to only affect imports, but I no longer have a very strong opinion. Again, up to you guys. |
@jorgeorpinel You wrote in #2095 (comment):
I completely agree about the doubling options in and in the associated CLI. That's exactly what I had to do here, although I didn't add For the |
(Ivan wrote that, not me) My take on this is the scope keeps growing and getting complicated. Probably just implement backup mode for imports as a first step, then decide about the rest in #2095? |
So he did, sorry @shcheklein! The true / false setting for "backup mode" in the dvc file (name tbc) could later be extended to also accept (a list of) named remotes for #2095. If that's the plan, we should choose the name accordingly now. |
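The extension floated above (a boolean now, optionally a list of named remotes later) could be normalized along these lines. This is only a sketch of the idea under discussion: the function name `remotes_to_push`, its arguments, and the remote names in the usage note are all hypothetical, and nothing like this is part of the PR.

```python
def remotes_to_push(setting, configured_remotes):
    """Resolve a hypothetical per-output setting into a list of remote names.

    `setting` may be True (push everywhere), False (push nowhere),
    a single remote name, or a list of remote names.
    """
    if setting is True:
        return list(configured_remotes)
    if setting is False:
        return []
    # Accept a single name as shorthand for a one-element list.
    names = [setting] if isinstance(setting, str) else list(setting)
    unknown = [name for name in names if name not in configured_remotes]
    if unknown:
        raise ValueError(f"unknown remotes: {', '.join(unknown)}")
    return names
```

With configured remotes `["models", "data"]`, a setting of `True` resolves to both remotes, `False` to none, and `"models"` to that single remote. A name chosen today for the boolean flag would need to still read naturally once it holds remote names.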
yep, those are exactly my concerns/points! Expanding the YAML schema is one of those things that we should be doing carefully, since we'll have to support it, it will complicate code potentially (e.g. Also, even if we don't cover different remote names here (it can be stretching it too far), we should at least consider covering a very close case: optionally disable push/save for regular outputs. |
I think the name |
BTW if/when |
That's how I originally wanted to do it, but this causes a problem for backwards compatibility. You need a way to migrate existing import |
Force-pushed from 9b1cd0d to e20f67c.
I've renamed
@jorgeorpinel I've also added a hint to the error message. You now get e.g.:
|
Force-pushed from d021cc4 to 2fe25b1.
dvc/exceptions.py
Outdated
prepend = (
    "The following files are marked as `store: false` so the "
    "remote was not searched, but are not present in local "
    "cache:\n{}".format("\n".join(failed_not_stored))
)
m = prepend + "\n\n" + m
- prepend = (
-     "The following files are marked as `store: false` so the "
-     "remote was not searched, but are not present in local "
-     "cache:\n{}".format("\n".join(failed_not_stored))
- )
- m = prepend + "\n\n" + m
+ m = (
+     "The following outputs are marked as `store: false` so "
+     "remote storage was disabled, and are not present in the "
+     "cache:\n{}\n\n".format("\n".join(failed_not_stored))
+ ) + m
May need wrapping?
And how about also introducing a link to somewhere in the docs where this will be explained? (In anticipation of iterative/dvc.org/pull/1788)
From #4581 (comment):
ERROR: failed to pull data from the cloud - The following files... no_store.txt Checkout failed for following targets: no_store.txt
Those 2 new lines and repeated list items may be confusing though. Not sure how to improve the UI here and it's not a blocking problem but would be nice to try
I've added a docs link like this:
The following outputs are marked as `store: false` so remote storage was disabled, and are not present in the cache:
foo
<https://error.dvc.org/checkout-failed-no-store>
Checkout failed for following targets:
foo
Is your cache up to date?
<https://error.dvc.org/missing-files>
and a section to the docs in iterative/dvc.org#1788.
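As a rough illustration of how the message above could be assembled, here is a minimal sketch. The function `build_checkout_error` and its parameters are hypothetical and do not reflect the actual code in `dvc/exceptions.py`; only the message text and the two URLs are taken from the output quoted above.

```python
def build_checkout_error(failed_not_stored, failed_targets):
    """Compose the checkout error text, adding the `store: false` hint if needed."""
    lines = []
    if failed_not_stored:
        # Hint shown only when some missing outputs were excluded from remote storage.
        lines.append(
            "The following outputs are marked as `store: false` so "
            "remote storage was disabled, and are not present in the "
            "cache:"
        )
        lines.extend(failed_not_stored)
        lines.append("<https://error.dvc.org/checkout-failed-no-store>")
    # Generic part of the message, shown for every failed checkout.
    lines.append("Checkout failed for following targets:")
    lines.extend(failed_targets)
    lines.append("Is your cache up to date?")
    lines.append("<https://error.dvc.org/missing-files>")
    return "\n".join(lines)
```

Keeping the hint as a separate, optional prefix means the generic message stays unchanged for outputs that never use `store: false`.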
dvc/output/base.py
Outdated
msg = f"""Incompatible options for output '{path}'.
The dvc file specifies an output with both "cache == False" and
"store == True" which is not possible to satisfy.
"""
Looks like the indentation is off but not sure.
I think that's right: I copied the other examples I saw. That'll produce:
Incompatible options for output 'path'.
The dvc file specifies an output with both "cache == False" and
"store == True" which is not possible to satisfy.
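The check that produces this message might look roughly like the sketch below. The exception class `IncompatibleOptionsError` and the function `check_cache_store` are hypothetical names chosen for illustration; only the rejected combination and the message wording come from the snippet under review.

```python
class IncompatibleOptionsError(ValueError):
    """Hypothetical exception type, used only in this sketch."""


def check_cache_store(path, cache, store):
    """Reject the one combination the message above describes."""
    # cache=False together with store=True is the combination flagged as unsatisfiable.
    if not cache and store:
        raise IncompatibleOptionsError(
            f"Incompatible options for output '{path}'.\n"
            'The dvc file specifies an output with both "cache == False" and\n'
            '"store == True" which is not possible to satisfy.'
        )
```

Building the message from explicitly joined lines, rather than a triple-quoted string inside an indented block, sidesteps the indentation question raised in the review.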
@@ -311,6 +325,9 @@ def dumpd(self):
        if self.persist:
            ret[self.PARAM_PERSIST] = self.persist

        if self.store == self.stage.is_repo_import:
probably
- if self.store == self.stage.is_repo_import:
+ if self.stage.is_repo_import and self.store:
?
No, actually that is right (although unclear): the purpose of this is to only add `store: <something>` to the DVC file if it differs from the default. The default is `store: true` for most outputs and `store: false` for repo imports. So this code writes the value of `store` to the DVC file only if
- it's a repo import and `store: true`
- or it's not a repo import and `store: false`
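The rule in the two bullets above can be spelled out in plain Python. This is a minimal sketch rather than the PR's code: `default_store` and `dump_store_field` are hypothetical helpers, and only the comparison `store == is_repo_import` is taken from the diff.

```python
def default_store(is_repo_import):
    """Default described above: repo imports are not stored, everything else is."""
    return not is_repo_import


def dump_store_field(store, is_repo_import):
    """Return {'store': value} only when the value differs from the default."""
    ret = {}
    # `store == is_repo_import` is the same test as `store != default_store(...)`,
    # because the default is the negation of `is_repo_import`.
    if store == is_repo_import:
        ret["store"] = store
    return ret
```

Writing the condition as `store != default_store(is_repo_import)` would be equivalent and arguably easier to read than the terse equality in the diff.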
@charlesbaynham Thanks a lot for your patience. Really sorry this is taking so long. This PR looks great, just a few minor details to adjust. Let me see if we can take it from here... |
@charlesbaynham One more thing, could you please check the committer email in your commits and add it to https://github.com/settings/emails so that github is able to associate the great work you've done with your github account? |
Force-pushed from 12f4f88 to 95eeb25.
Not at all, and sorry I haven't got around to this: it's on my list! Although I'm very happy for you to take it from here if you want to, of course. |
SCHEMA = output.SCHEMA.copy()
del SCHEMA[BaseOutput.PARAM_CACHE]
del SCHEMA[BaseOutput.PARAM_METRIC]
del SCHEMA[BaseOutput.PARAM_STORE]
These lines are not coupling outputs with dependencies. Maybe some refactoring is needed. But this is not related to this PR.
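The copy-then-delete pattern in the snippet above can be illustrated in isolation. This is a sketch with made-up data: the schema contents, key names, and the helper `dependency_schema` are assumptions for demonstration and do not mirror DVC's real schema.

```python
# Hypothetical output schema; the real one is defined elsewhere in DVC.
OUTPUT_SCHEMA = {
    "path": str,
    "md5": str,
    "cache": bool,
    "metric": bool,
    "store": bool,
}

# Keys that only make sense for outputs and must be dropped for dependencies.
OUTPUT_ONLY_KEYS = ("cache", "metric", "store")


def dependency_schema(output_schema):
    """Derive a dependency schema by copying the output schema and removing output-only keys."""
    schema = output_schema.copy()  # shallow copy, the original stays untouched
    for key in OUTPUT_ONLY_KEYS:
        del schema[key]
    return schema
```

The sketch also shows the maintenance cost of this pattern: every new output-only key has to be added to the deletion list, or it silently becomes valid for dependencies too.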
Force-pushed from 95eeb25 to 5131af5.
@charlesbaynham Thanks again for the PR and all the effort you've put into it! Even though the functionality is pretty simple and nicely implemented by you, the main reason why this hasn't been merged yet is that We are currently starting to work on 2.0 changes and there are some plans to at least reconsider dvc file format as well as internal Stage class split, which might help set this |
Hey @efiop, thanks for letting me know! Do say if there's anything I can do to help |
It has been a while, but we now have more proper mechanisms to handle multiple remotes, as well as
and "backups" could mean just setting |
Closing as stale. Thanks @charlesbaynham. |
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here: Document --backup import behaviour dvc.org#1788
Closes #4527 - see this and Discord at https://discordapp.com/channels/485586884165107732/563406153334128681/751199400931491900 and https://discordapp.com/channels/485586884165107732/565699007037571084/751203534351106119 for some discussion.