Wrong file name for download from a FigShare URL #760
Comments
What filename does curl give?
Apologies if this is dumb, but I'm not sure how to answer that! 😅
Don't worry, I got this.
❱ curl -Li "https://ndownloader.figshare.com/files/6294558"
HTTP/1.1 302 Found
Server: nginx
Date: Sat, 18 Sep 2021 10:28:44 GMT
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Content-Length: 0
Connection: keep-alive
...
Location: https://s3-eu-west-1.amazonaws.com/pstorage-rs-4828782598/6294558/rsta20150293_si_001.xlsx?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-
...
Content-Disposition: attachment;filename=rsta20150293_si_001.xlsx
...
That Content-Disposition header is what we should (and normally do) use to determine the file name. I wonder if we are getting tricked by the
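As a rough illustration of the point above, the file name can be read out of a Content-Disposition value with a small parser. This is only a sketch in Julia: `filename_from_disposition` is a hypothetical helper, not an HTTP.jl function, and it deliberately ignores the `filename*=` extended form and quoted names that contain semicolons.

```julia
# Hypothetical helper (not part of HTTP.jl): extract the filename from a
# Content-Disposition value such as "attachment;filename=foo.xlsx".
function filename_from_disposition(value::AbstractString)
    # Case-insensitive match on `filename=` up to the next `;` (or the end).
    m = match(r"filename\s*=\s*([^;]+)"i, value)
    m === nothing && return nothing
    # Trim surrounding whitespace, then optional double quotes.
    return String(strip(strip(m.captures[1]), '"'))
end

# Header value taken from the curl output above.
println(filename_from_disposition("attachment;filename=rsta20150293_si_001.xlsx"))
```

A value without a `filename` parameter (for example a bare `inline`) yields `nothing`, so a caller can fall back to the last segment of the URL.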
Yeah, looks like it is the redirect confusing things:
julia> resp = HTTP.headers(HTTP.request("GET", "https://ndownloader.figshare.com/files/6294558"))
12-element Vector{Pair{SubString{String}, SubString{String}}}:
"x-amz-id-2" => "t5e96lUgqwlnTS65M5hrdcLtnZ/K3vhlDScYBehbxxFL85CqPMfrqsc8nMbXy4KG1FL8nB/3NCw="
"x-amz-request-id" => "PQM7QC3VMVZPTGFZ"
"Date" => "Sat, 18 Sep 2021 10:45:02 GMT"
"x-amz-replication-status" => "COMPLETED"
"Last-Modified" => "Fri, 03 Sep 2021 08:47:47 GMT"
"ETag" => "\"bf518a09be3cf14d4d7abb47489cbae8\""
"x-amz-tagging-count" => "1"
"x-amz-version-id" => "U6trxKUd0lhNhFEHAmnrHQsVVofp9yxk"
"Accept-Ranges" => "bytes"
"Content-Type" => "binary/octet-stream"
"Server" => "AmazonS3"
"Content-Length" => "463645"
fredrikekre added a commit that referenced this issue on Sep 26, 2021:
in HTTP.download, fixes #760. Co-authored-by: Lyndon White, Fredrik Ekre
fredrikekre added a commit that referenced this issue on Sep 26, 2021:
Use Content-Disposition for 3xx requests for filename detection in HTTP.download, fixes #760. Co-authored-by: Lyndon White, Fredrik Ekre
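The idea named in the commit message can be sketched without any network access: if the final response carries no Content-Disposition header, look at the earlier (3xx) responses in the redirect chain. The Julia snippet below is an illustration only and is not the code from the commit; `find_header` and `disposition_from_chain` are hypothetical helpers, and the header lists are abridged from the outputs quoted in this thread.

```julia
# Hypothetical helper: case-insensitive lookup in a list of header pairs,
# shaped like the Vector{Pair} returned by HTTP.headers.
function find_header(headers, name::AbstractString)
    for (k, v) in headers
        lowercase(k) == lowercase(name) && return String(v)
    end
    return nothing
end

# Hypothetical helper: `chain` lists the headers of each response in request
# order. Check the final response first, then fall back to earlier redirects.
function disposition_from_chain(chain)
    for headers in reverse(chain)
        value = find_header(headers, "Content-Disposition")
        value !== nothing && return value
    end
    return nothing
end

# Abridged headers of the 302 response (from the curl output in this thread).
redirect_headers = [
    "Content-Type" => "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "Content-Disposition" => "attachment;filename=rsta20150293_si_001.xlsx",
]
# Abridged headers of the final S3 response (from the Julia output in this thread).
final_headers = [
    "Content-Type" => "binary/octet-stream",
    "Content-Length" => "463645",
]

println(disposition_from_chain([redirect_headers, final_headers]))
```

Checking the final response first keeps the usual behaviour for servers that set the header on the file itself, and only consults the redirect when nothing better is available.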
This minimal example,
downloads a table from FigShare (doi: https://doi.org/10.6084/m9.figshare.3980064.v1) and tries to read it (using DataDeps.jl), but fails with "No such file or directory". It fails because, instead of the correct name ("rsta20150293_si_001.xlsx"), the file is saved as 6294558 (which is the last "word" of the URL). However, this exact same snippet worked at some point in the past year, so something has changed since then. When I talked briefly with @oxinabox, he suggested that this could be an issue in HTTP.jl or an issue on FigShare's side. I tried to find when and where something changed using blame, but I could not figure it out, and pinning earlier package versions did not help either.
I don't come with just a problem: FWIW, a workaround (thanks to Lyndon as well) is to rename the file in post-processing after download. So at this stage this is no longer an issue for me, but hopefully posting all these details will help someone here find a fix! 😃
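The rename workaround mentioned above could look roughly like the Julia sketch below. `rename_download` is a hypothetical helper written for this illustration; it assumes the download step hands back the local path of the misnamed file. With DataDeps.jl, a function of this shape could presumably be used as the `post_fetch_method` of a `DataDep`, which is an assumption about how the workaround was wired up rather than something stated in the report.

```julia
# Hypothetical helper: move a downloaded file to the name it should have had,
# keeping it in the same directory. Returns the new path.
function rename_download(path::AbstractString, correct_name::AbstractString)
    target = joinpath(dirname(path), correct_name)
    mv(path, target; force=true)  # overwrite a stale copy if one exists
    return target
end

# Self-contained demonstration with a dummy file in a temporary directory.
dir = mktempdir()
misnamed = joinpath(dir, "6294558")
write(misnamed, "dummy contents")
fixed = rename_download(misnamed, "rsta20150293_si_001.xlsx")
println(basename(fixed))
```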