dvc push doesn't recognise that files are missing in remote storage #4164
Comments
@dldx Could you try removing Did someone run garbage collection on your remote by any chance? |
@efiop Hmm, nope, that didn't change anything... It is easy for me to replicate this. I just need to delete a file in the remote storage after I have pushed it. I didn't run gc but someone else may have (I have warned everyone about running gc on the remote). I'm not really sure yet what caused it but I'm wondering if there are other missing files as well. |
@dldx That messes up with our index, hence why I've asked to remove |
Sorry for the confusion. I did remove the index, but it didn't trigger a check.
$ rm -rf ../.dvc/tmp/index
$ dvc status -c PSScene4Band.dvc
Data and pipelines are up to date.
$ dvc push -R PSScene4Band
Everything is up to date.
No push triggered even though there are remote files missing. |
@dldx Are you sure you've deleted a file that is used in PSScene4Band.dvc ? |
@dldx Ah, sorry, I missed that we actually trust the remotes with regard to the files in directories in order to make dvc operations faster. If your colleagues are deleting stuff randomly from the cloud, you might consider making your remote untrusted with:
that will make it paranoid again. |
@dldx But usually even during gc, we delete the |
Thinking about it, we could consider throwing a warning if we assume that a file is there but are not able to pull it. |
@dldx Any updates? 🙂 |
Sorry to reopen this, but I have a nearly identical problem. In my case, though, it is not caused by deleting files in the remote, but by adding a new remote to which I want to push. I have tried both suggestions, removing tmp/index and setting verify to true, but neither solves the problem. I only receive "Everything is up to date", which is incorrect since the files do not appear in the remote storage. |
@raharth Could you elaborate on how you are detecting that they do not appear? Also, please show |
@efiop Thanks for your fast reply! The remote is an Azure container to which I have access, hence I can see that there are no files appearing.
|
@raharth Are any dvc files gitignored? Could you show |
@vladimircape Could you show the contents of From our side, it would be handy to add an option to ignore the index, of course. |
@vladimircape Could you try deleting |
Hello! First of all, I wanted to thank you for how well you always answer; it's really nice and trustworthy. We have had the same problem: a dvc push of a folder with 41 files ended correctly (apparently, because there were no error messages), and yet only 40 files were uploaded. Doing dvc pull on another server gave us the error. So far there has been no way to force the push of the missing files, because dvc only checks that the folder.dvc (f9c8def4b2a1a6b783209d933e26a6.dir) exists on the remote and not the files that are inside the folder. Is there a way to do dvc push --recursive or maybe dvc push --force to make it try to upload the files again? By the way, we deleted the file f9c8def4b2a1a6b783209d933e26a6.dir from the remote and this time the missing files were uploaded, but now we are left wondering whether this has happened to us in other projects. |
More info:
and, of course, dvc doctor:
|
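One way to check whether other projects are affected is to compare a directory's .dir manifest against what is actually stored in the remote. The following is a minimal sketch, not a DVC feature. It assumes the remote is a local or mounted path, that the .dir file is a JSON list of entries with an "md5" key, and that each object is stored as <remote>/<first two hex characters>/<remaining characters>. The function name list_missing is hypothetical, and other DVC versions or cloud remotes may use a different layout.

```shell
# list_missing MANIFEST REMOTE_DIR
# Print the md5 of every entry in MANIFEST that has no object file in REMOTE_DIR.
# Assumed layout (may differ between DVC versions): REMOTE_DIR/ab/cdef...
list_missing() {
    manifest=$1
    remote=$2
    # Extract each 32-hex-digit md5 value from the JSON manifest.
    grep -oE '"md5": *"[0-9a-f]{32}"' "$manifest" |
    grep -oE '[0-9a-f]{32}' |
    while read -r md5; do
        prefix=$(printf '%s' "$md5" | cut -c1-2)
        rest=$(printf '%s' "$md5" | cut -c3-)
        # Report the hash only when the expected object file is absent.
        [ -f "$remote/$prefix/$rest" ] || printf '%s\n' "$md5"
    done
}
```

Calling list_missing with the path to a .dir manifest and the remote directory prints one hash per missing object. Empty output means every listed file is present under the assumed layout.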
@atekoa this is not currently planned, but as @efiop noted previously it would be good for us to have this flag to allow force pushing directories. I've created a separate issue that you can follow to keep track for further updates on this |
Ran into the same issue atekoa describes well.
Resorted to doing this as well for our s3 remote and then pushing the files again worked. Something like a force push with a |
For some reason, I was missing a number of files in my remote storage on GCS. When I run dvc pull, it fails with the following error:
In order to fix this, I decided to run dvc add and dvc push on a machine which still has these files. The commands ran fine, and dvc push reported that everything was fine. However, in reality, it did not upload the missing files to the remote cache. In the end, I had to solve this by manually uploading these files to my remote storage in GCS.
Bug Report
Please provide information about your setup
Output of dvc version:
dvc_push_log.txt