data files disappearing from file tree after editing .dvc file(s) by hand and attempting dvc repro #1800
Comments
Hi @brbarkley! Looks like you simply deleted it from your workspace. What is the dvc file where …
Thanks,
Hi @efiop, yes, I had attempted …
I surmised this meant that it would replace the existing file with the one in the cache, but the warning language was not clear, so I aborted because I did not want to lose the file. Perhaps DVC should revise its warning language: "file is going to be removed" communicates something different to me than "local file is going to be updated/replaced with cache file". After working up the courage to proceed with …
Thanks for your help!
UPDATE:
Issue: …
Step 2: …
Step 3: …
Note: File folders …
Step 4: …
Solution: … All is updated correctly. I then …
Suggestion: I can easily restore the …
Hi @brbarkley! Thanks for the investigation and all the suggestions! We will revisit those error messages to make them more informative. 🙂
I don't think DVC should restore those automatically, because checking out files for a command that clearly failed in your script is even more dangerous and might break your pipeline. It is better to have it error out like that, so at least you are aware that your previous command failed and did not create the dependencies needed by the new stage. Btw, could you talk a little bit more about your …
Thanks @efiop! [replying from mobile] My dvc_pipeline.sh file is kind of a wrapper. As I integrated DVC into my workflow, I found it necessary to keep a record of the dvc run commands for each stage file I created. The bash script accommodates this need by storing the commands, gives me an easy way to edit them in case a stage needs to be modified, and, depending on the bash parameters specified, lets me re-run specific stages. The name of the file is a bit misleading, I suppose, since it mostly contains dvc run commands. However, I do have an option at the end of the file that calls dvc pipeline to create and export my DAG to SVG.
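For illustration only, a wrapper of the kind described above could be sketched in Python instead of bash. This is a hypothetical sketch, not the author's dvc_pipeline.sh: the stage names, paths and scripts are placeholders, and it uses only the -d (dependency) and -o (output) options of dvc run. By default it just prints the commands (a dry run).

```python
"""Hypothetical Python take on a dvc_pipeline.sh-style wrapper (sketch only)."""
import shlex
import subprocess
import sys

# Hypothetical stage registry: stage name -> (deps, outs, command).
# All paths and scripts below are illustrative placeholders.
STAGES = {
    "pca": (["data/clean.csv", "src/pca.py"],
            ["output/figures/pca_plots"],
            "python src/pca.py"),
    "kmeans": (["data/clean.csv", "src/kmeans.py"],
               ["output/figures/kmeans_plots"],
               "python src/kmeans.py"),
}


def build_run_command(deps, outs, cmd):
    """Assemble a `dvc run` argument list using only the -d/-o options."""
    args = ["dvc", "run"]
    for dep in deps:
        args += ["-d", dep]
    for out in outs:
        args += ["-o", out]
    return args + [cmd]


def main(selected, dry_run=True):
    # With no stage names given, fall back to every registered stage.
    for name in selected or list(STAGES):
        args = build_run_command(*STAGES[name])
        print(shlex.join(args))
        if not dry_run:
            # check=True raises CalledProcessError, so a failing stage stops the loop.
            subprocess.run(args, check=True)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Passing stage names on the command line (e.g. `pca`) limits the run to those stages. With `dry_run=False`, `check=True` makes the wrapper stop at the first failing stage rather than carry on, in the spirit of erroring out discussed above.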
@brbarkley
And why don't you just use …?
Thanks,
[reply by mobile] @efiop I do use dvc repro, but in some cases I find it more efficient to redefine a stage by editing the original dvc run command and re-executing dvc run. I had in fact attempted to edit one of my dvc files by hand (because I had changed the file paths of some of my dependencies and outputs) and subsequently call dvc repro to update the pipeline. However, dvc repro would not run successfully, which is the reason I opened this issue. So it's not clear to me how editing dvc files by hand and running dvc repro is a robust solution. (Note: dvc move did not work for moving/editing the file paths in question, because it said the dvc stage was associated with an image/chart output as opposed to a data file; see the thread above.) In addition, the DVC documentation and usage guide do not clearly show how to go about editing dvc files by hand, whereas dvc run has better documentation and seems to be a more programmatic way to update and version-control my pipeline.

Also, as DVC is still in development with command syntax still in flux (e.g., see the recent changes to dvc run, which I'm not complaining about; I liked the changes), I find it much easier to keep my project up to date if I have a record of the dvc run commands that can simply be edited instead of completely rewritten.

So, generally speaking, I use dvc repro if I have edited the contents of one of my code dependencies. But if I want/need to move/rename files or add/delete dependencies or outputs, I edit my dvc run commands and re-execute dvc run. If there's a better or more approved way of doing things, I'm open to suggestions.

Thanks,
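The rule of thumb above can be written down as a tiny decision function. This only illustrates the heuristic described in this comment and is not DVC behaviour: `StageSpec` and `choose_command` are hypothetical names, and a stage is reduced to its sets of dependency and output paths.

```python
"""Sketch of the rule of thumb: repro for content edits, re-run for path changes."""
from typing import FrozenSet, NamedTuple


class StageSpec(NamedTuple):
    # Hypothetical minimal description of a stage: dependency and output paths.
    deps: FrozenSet[str]
    outs: FrozenSet[str]


def choose_command(old: StageSpec, new: StageSpec) -> str:
    # Same paths: only file contents may have changed, so reproduce the stage.
    if old == new:
        return "dvc repro"
    # A dependency or output was added, removed, moved or renamed:
    # redefine the stage by editing and re-executing its dvc run command.
    return "dvc run"
```

Comparing path sets rather than lists means reordering dependencies alone does not count as a change.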
Hi @brbarkley! Thanks a lot for explaining! That feature for …
Thanks,
Hi @efiop, thanks for the insight! Yes, I think #1489 would improve …

As an additional note, one that should have been mentioned in Step 3 of iterative/dvc#1800 (comment), another factor contributing to my problems with …

This may seem obvious, but if file folders are listed as outputs in a particular stage, users need to ensure that the command for that stage checks whether each folder exists and creates it if it doesn't. Otherwise …

Closing this issue.

Brett
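The point about output folders can be illustrated with a minimal Python stage script. This is a sketch under stated assumptions: the function name `write_plots`, the file names and the placeholder contents are hypothetical; the only technique it shows is calling `os.makedirs(..., exist_ok=True)` before writing into a folder that the stage declares as an output.

```python
"""Hypothetical stage script that creates its output folder before writing into it."""
import os


def write_plots(out_dir, names):
    # Per the observation in this thread, a folder listed as a stage output may
    # not exist when the command starts, so the command creates it itself.
    # exist_ok=True makes this a no-op when the folder is already there.
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name in names:
        path = os.path.join(out_dir, name)
        with open(path, "w") as fh:
            fh.write("placeholder plot data\n")  # stand-in for real figure output
        written.append(path)
    return written
```

Because `exist_ok=True` suppresses the error for an existing folder, the same command works whether or not the folder was removed before the stage ran.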
@brbarkley That is a really good point! Created iterative/dvc.org#233. We'll take a look if you could clarify it a bit. Thank you for all the feedback! 🙂
Please provide information about your setup
DVC version (i.e. dvc --version)
Platform and method of installation (pip, homebrew, pkg Mac, exe (Windows), DEB (Linux), RPM (Linux))
Issue:
Data files tracked by DVC are disappearing from my local file tree before I have pushed them to my remote. I'm not sure how it happened and I did not want them to disappear.
However, DVC still detects the files when running dvc status -c.
But notice that the output/figures/pca_plots and output/figures/kmeans_plots folders and their files are not in my file tree.
Is there a way to get the files back into my local file tree? Would dvc push followed by dvc pull do the trick, since the "missing" files are still detected by dvc status -c? Also, is it possible to pinpoint how/when the files disappeared?
Thanks,
@brbarkley