Force push option #7268
Comments
@jpaasen Do you know how the corruption happened? We verify local files before uploading them, so it is quite unusual for files to get corrupted on remote. Alternative/additional approach here would be to use existing
Answered in Discord:
Thank you @daavoo
I recently ran into a similar problem. Is there any update on this?
No updates yet. @iterative/dvc Any estimate on the level of effort?
I also ran into this issue. Any update on this?
Related Discord request: https://discord.com/channels/485586884165107732/563406153334128681/1141858693806293132
Similar problem (on a Google Cloud remote). Somehow not all files are sent to the remote, and pulling to another machine doesn't work properly.
I'd like to contribute to this issue. I've spent some time searching for potential solutions, and I'd love some feedback on whether I'm heading in the right direction. My understanding of the issue is that network problems can occasionally lead to file corruption when performing dvc push. However, since the filename has the same MD5 value as the local file, rather than the actual MD5 value of the uploaded file, DVC is unable to detect the corruption. As a result, DVC performs operations on the corrupted file as it would on an uncorrupted file. For instance, it doesn't replace the file during subsequent dvc push commands. My high-level solution is to
Discussed this issue today. Takeaways:
I recently experienced corrupted data when transferring large files to Google Cloud Storage (GCS).
See discussion on Discord here.
In short, the md5 of the files at the remote was different from the md5 filename given to them by DVC. And since the md5 filenames at the remote were the correct ones, it was not possible to push the data one more time to get it right.
Right now there are no commands in DVC that can resolve an issue like this without losing data history.
You could do:
But this will delete all history for all files.
To solve this issue, our team had to trace back the md5 values of the corrupted files and delete them manually from GCS.
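The manual check described above could be partly automated. The following is a minimal sketch, not part of DVC: it assumes a local copy of the remote's object directory in which each object is stored as a two-character prefix directory plus the remaining 30 hex characters of its md5, and it reports files whose content no longer hashes to that name. The helper names `md5_of_file` and `find_corrupted_objects` are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted_objects(root: Path) -> list[Path]:
    """List files whose content md5 differs from the md5 encoded in their path.

    Assumption: objects are laid out as <root>/.../<2 hex chars>/<30 hex chars>.
    Anything that does not match this pattern (for example names with a
    suffix such as ".dir") is skipped rather than reported.
    """
    corrupted = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        expected = path.parent.name + path.name
        if len(expected) != 32 or not set(expected) <= HEX_DIGITS:
            continue
        if md5_of_file(path) != expected:
            corrupted.append(path)
    return corrupted
```

Calling `find_corrupted_objects(Path("remote_copy"))` on such a copy (the directory name is a placeholder) returns the paths that would need to be deleted from the remote before pushing again. It only detects mismatches; it does not modify anything.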
A "simple" solution would be to have a force option on dvc push (-f) that copies the data even if the md5 hash values are equal.