push: Unnotified error when pushing data into HTTP remote #7564
Comments
cc @dtrifiro could you take a look? |
Hello! We have finally got the necessary permissions and we have put a version of the HTTP Remote here https://github.com/atekoa/dvc-http-remote |
Thanks! I will look into it asap |
Hi @atekoa, I tried following your instructions but it seems I cannot get the remote (simple case) to work:
As a note, it seems it's trying to write to |
Also, note that DVC |
I have re-launched everything from the beginning and it has worked correctly for me. I have corrected some paths in the Readme that were wrong, but that would not be the problem. |
sequenceDiagram
participant Terminal
participant http_remote
participant StorageSite
Terminal->>Terminal: dvc add/git add/git commit
Terminal->>http_remote: dvc push (http://localhost:8080/remote?remote=0/87/8750719fec346635c5beb4c2132a46)
http_remote->>StorageSite: (local or azure) io.Copy
The HTTP URL contains the remote, so dvc appends the folder/file to the URL:
cat .dvc/config
[core]
remote = localhost
['remote "localhost"']
url = http://localhost:8080/remote?remote=0
ssl_verify = false
The URL that we receive is parsed with gorilla.mux.
So, in this case with DVC2, the URL will be parsed as |
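As a rough illustration of the URL shape shown in the diagram above, the following Python sketch appends the two-character folder and the remaining file name to the configured URL, then splits the result the way a generic URL parser would. The helper name `object_url` is hypothetical, and the full hash is simply the folder and file name from the example URL joined together; this is not DVC's or the remote's actual code.

```python
from urllib.parse import parse_qs, urlsplit


def object_url(base_url: str, md5: str) -> str:
    """Hypothetical helper: append '<first 2 chars>/<remaining chars>' to the remote URL."""
    return f"{base_url.rstrip('/')}/{md5[:2]}/{md5[2:]}"


# URL taken from the .dvc/config shown above.
base = "http://localhost:8080/remote?remote=0"

# Folder "87" and file "8750719fec346635c5beb4c2132a46" from the example, joined.
url = object_url(base, "878750719fec346635c5beb4c2132a46")
parts = urlsplit(url)

print(url)                    # the URL that dvc push would request
print(parts.path)             # the route path seen by the server
print(parse_qs(parts.query))  # the query parameters seen by the server
```

Because the appended folder/file lands after the `?`, a standard parser reports it as part of the `remote` query value rather than as part of the path, so the server has to split that value itself.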
The folder is created when you launch the docker-compose, this should not be the problem https://github.com/atekoa/dvc-http-remote/blob/main/main.go#L20 |
Hello @dtrifiro , |
Hi @atekoa, |
I could not reproduce the issue. Closing this as it's likely related to the custom remote being used. |
Possibly related: #8100 |
Hey @atekoa, would you mind trying the fix suggested in the above issue to see if it solves your issue? |
I have tried the fix on a multi-folder dataset against our HTTP remote and we no longer get the error. We will check this further next week and provide additional info. |
It works! |
Bug Report
Issue name
push: Unnotified error when pushing data into HTTP remote
Description
This issue happens when pushing a large batch of files to an HTTP dvc remote. dvc push reports that everything is correct. However, when downloading the files, some of them turn out not to have been uploaded correctly, so they do not exist on the remote.
Reproduce
More detailed:
// Download random dataset
// Try with HTTP remote:
// Add and push the data
// Download the data and check
// Try to push again the data
Expected
All the data in the remote, of course ;)
Environment information
Output of dvc doctor:
Additional Information (if any):
We had the "Session is Closed" problem previously:
pull: Using jobs>1 fails with RuntimeError: Session is closed in http remote #7421
Solved with:
fs.http: prevent hangs under some network conditions #7460
And we have proposed this:
dvc push doesn't recognise that files are missing in remote storage #4164
Force push option #7268
push: add --force option to force push without .dir optimization #7532
but the problem is more serious because you don't really know that the push has failed (we would have to ask the users to try it at least twice to ensure that the data has been uploaded correctly...)
Additionally, when you try to push the files again, the .dir optimization prevents the files from being uploaded again and dvc thinks that everything is uploaded. If the dataset has subfolders, the problem is even worse, as re-adding the files does not correct the issue due to the .dir optimization.
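Until push failures are reported reliably, one possible workaround is to check the remote independently after a push. The sketch below is only an assumption-laden illustration: `object_url`, `head_exists` and `find_missing` are hypothetical helpers, it assumes the remote answers HEAD requests for stored objects, and it assumes the list of expected hashes is already known. The second hash is a placeholder for a file that never reached the remote.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def object_url(base_url: str, md5: str) -> str:
    """Hypothetical helper: '<remote url>/<first 2 chars>/<remaining chars>'."""
    return f"{base_url.rstrip('/')}/{md5[:2]}/{md5[2:]}"


def head_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request succeeds (assumes the remote supports HEAD)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # 4xx/5xx responses are treated as "object missing";
        # connection errors (URLError) are deliberately left to propagate.
        return False


def find_missing(
    base_url: str,
    hashes: Iterable[str],
    exists: Callable[[str], bool] = head_exists,
) -> List[str]:
    """Return the hashes whose objects the remote does not report as present."""
    return [h for h in hashes if not exists(object_url(base_url, h))]


# Offline demonstration: an in-memory set stands in for the remote.
base = "http://localhost:8080/remote?remote=0"
uploaded = {object_url(base, "878750719fec346635c5beb4c2132a46")}
expected = [
    "878750719fec346635c5beb4c2132a46",
    "d41d8cd98f00b204e9800998ecf8427e",  # placeholder hash, never "uploaded"
]
missing = find_missing(base, expected, exists=lambda u: u in uploaded)
print(missing)
```

Passing the existence check as a parameter keeps the comparison logic testable without a running remote; against a real server the default `head_exists` would be used instead.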