404 errors in vignette - get_file() #33
Whilst I was looking around @wibeasley I noticed that line 104 in get_file.R never gets called because The relevant code is copied below.
if (length(query)) {
  r <- httr::GET(u, httr::add_headers("X-Dataverse-key" = key), query = query, ...)
} else {
  r <- httr::GET(u, httr::add_headers("X-Dataverse-key" = key), ...)
}
Never! Are you coming to the 2020 Dataverse Community Meeting? 😄
My money is on a change in Dataverse, not curl. 😄 I'll try to dig in more on this during the work week. Have a good weekend!
Is this related to the fact that passing "format=original" only works for tabular files? Please see IQSS/dataverse#6408. @wibeasley, to be honest, I'm a little lost in this issue, probably because I'm not much of an R hacker. Please keep the questions coming. Please let me know how I can help. 😄
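For readers less familiar with the R side, the role of `format=original` can be sketched as a small URL builder. This is only an illustration, not the package's actual code: `build_access_url()` and its `ingested` argument are hypothetical names, and the URL shape is taken from the `api/access/datafile/<id>` address quoted elsewhere in this thread. It assumes, following IQSS/dataverse#6408, that `format=original` should be sent only for ingested tabular files.

```r
# Hypothetical helper (not part of the dataverse package).
# Builds the file-access URL and appends format=original only when the
# caller says the file is an ingested tabular file.
build_access_url <- function(file_id, ingested = FALSE,
                             server = "dataverse.harvard.edu") {
  # format() avoids scientific notation for large numeric ids
  id <- format(file_id, scientific = FALSE, trim = TRUE)
  u <- paste0("https://", server, "/api/access/datafile/", id)
  if (isTRUE(ingested)) {
    u <- paste0(u, "?format=original")
  }
  u
}

# The tab file mentioned in this thread, requested in its original form
build_access_url(2692294, ingested = TRUE)
# A non-tabular file: no format parameter at all
build_access_url(2692233)
```

If the 404s really do come from sending `format=original` for non-tabular files, the second call shows the request shape that should avoid them.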
I would appreciate a fix or workaround for this. I currently cannot read non-ingested datasets, or ingested Stata datasets that originate from Stata v14+ files. Here are three examples from the CCES, where the first one works but the other two do not.
library(dataverse)
# hide my key
# tab files in CCES 2017 (Stata v12 dataset) WORKS
# https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/3STEZY
cc17 <- get_file("Common Content Data.tab", "doi:10.7910/DVN/3STEZY")
writeBin(cc17, "Common Content Data.dta")
cc17_dta <- foreign::read.dta("Common Content Data.dta")
cc17_dta <- haven::read_dta("Common Content Data.dta")
# tab files in CCES 2018 (Stata v14+dataset) DOES NOT WORK
# possibly because of Stata version issue
# https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZSBZ7K
cc18 <- get_file("cces18_common_vv.tab", "doi:10.7910/DVN/ZSBZ7K")
writeBin(cc18, "cces18_common_vv.dta")
cc18_dta <- foreign::read.dta("cces18_common_vv.dta")
#> Error in foreign::read.dta("cces18_common_vv.dta"): not a Stata version 5-12 .dta file
cc18_dta <- haven::read_dta("cces18_common_vv.dta")
#> Error in df_parse_dta_file(spec, encoding, cols_skip, n_max, skip, name_repair = .name_repair):
#> Failed to parse /private/var/folders/gy/sd6ddp895s7dyqbdh2432fwm0000gn/T/
#> RtmpoYWXRS/reprex14f97d9c5c8b/cces18_common_vv.dta:
#> This version of the file format is not supported.
# Cumulative common content dta, not tabulated, DOES NOT WORK
# https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/II2DB6
ccc_d <- get_file("cumulative_2006_2018.dta", "doi:10.7910/DVN/II2DB6")
#> Error in get_file("cumulative_2006_2018.dta", "doi:10.7910/DVN/II2DB6"):
#> Not Found (HTTP 404).
ccc_r <- get_file("cumulative_2006_2018.Rds", "doi:10.7910/DVN/II2DB6")
#> Error in get_file("cumulative_2006_2018.Rds", "doi:10.7910/DVN/II2DB6"):
#> Not Found (HTTP 404).
Created on 2019-12-09 by the reprex package (v0.3.0)
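One way to narrow down the second failure is to look at the first bytes of what `get_file()` returned before handing it to `foreign` or `haven`. The sketch below is a rough diagnostic, not part of any package: `sniff_dta()` is a hypothetical name, and it relies on my understanding of the .dta layout (format 117 and later start with the text `<stata_dta>`, while older formats store the format number in the first byte and a byte-order flag of 1 or 2 in the second).

```r
# Hypothetical diagnostic helper (not part of dataverse, foreign, or haven).
# Guesses what a raw vector contains from its first few bytes.
sniff_dta <- function(bytes) {
  stopifnot(is.raw(bytes))
  if (length(bytes) < 2L) {
    return("too short to tell")
  }
  # Newer .dta files open with an XML-like tag
  tag <- charToRaw("<stata_dta>")
  if (length(bytes) >= length(tag) &&
      identical(bytes[seq_along(tag)], tag)) {
    return("dta, format 117 or later")
  }
  # Older .dta files: byte 1 is the format number, byte 2 the byte order
  first <- as.integer(bytes[1])
  second <- as.integer(bytes[2])
  if (first >= 102L && first <= 115L && second %in% c(1L, 2L)) {
    return(paste("dta, format", first))
  }
  "not recognised as dta (possibly plain text)"
}

# Usage with the objects from the reprex above (left commented out because
# it needs the downloaded bytes):
# sniff_dta(cc17)
# sniff_dta(cc18)
```

If `cc18` comes back as "not recognised as dta", the download is probably the ingested tab-delimited version rather than the original Stata file, which would explain why both readers reject it.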
Thanks. It's good to know the data is there. As was originally pointed out, when I remove the Since
@wibeasley: the changes I've made in the fork seem to fix this. For example, here are the results from @EdJeeOnGitHub's #31:
library(dataverse) # devtools::install_github("kuriwaki/dataverse-client-r")
dv_files <- get_dataset("doi:10.7910/DVN/JGLOZF")$files
# for each file in data
for (f in seq_len(nrow(dv_files))) {
data_bytes <- as.integer(object.size(dataverse::get_file(dv_files$id[f])))
data_mb <- round(measurements::conv_unit(data_bytes, "byte", "MB"), 3)
metadata_mb <- round(measurements::conv_unit(dv_files$filesize[f], "byte", "MB"), 3)
print(glue::glue("{dv_files$filename[f]}, {metadata_mb} MB in metadata, {data_mb} MB when downloaded"))
}
#> finalusingindices_anon.tab, 11.251 MB in metadata, 11.263 MB when downloaded
#> ReadMe with Codebook.docx, 0.036 MB in metadata, 0.036 MB when downloaded
#> The Hunger Project Dataverse Files.zip, 1.98 MB in metadata, 1.98 MB when downloaded
#> THPawareness_HH_anon.tab, 0.585 MB in metadata, 0.586 MB when downloaded
Created on 2019-12-16 by the reprex package (v0.3.0)
@kuriwaki the examples I posted initially are now working. Thanks so much for figuring it out and fixing it. I made one small addition in the commit referenced above. Basically, it catches the case where the file is already specified as a number/id. Was that your intention, or am I misunderstanding something?
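To make the "already a number/id" case concrete, here is a minimal sketch of the idea. It is not the code from the commit: `resolve_file_id()` is a hypothetical name, and the `files` argument is assumed to be a data frame with `id` and `filename` columns, like the `$files` element of `get_dataset()` used earlier in this thread.

```r
# Hypothetical sketch; the real logic lives inside the package.
# Returns a numeric file id, whether `file` is an id or a file name.
resolve_file_id <- function(file, files) {
  # Already a number, or a string made only of digits: treat it as an id
  if (is.numeric(file) || grepl("^[0-9]+$", file)) {
    return(as.numeric(file))
  }
  # Otherwise look the name up in the dataset's file listing
  hit <- files$id[files$filename == file]
  if (length(hit) != 1L) {
    stop("Expected exactly one file named '", file, "', found ", length(hit))
  }
  hit
}

# Made-up listing for illustration only
files <- data.frame(
  id = c(101, 102),
  filename = c("a.tab", "b.docx"),
  stringsAsFactors = FALSE
)
resolve_file_id(102, files)      # id passed straight through
resolve_file_id("a.tab", files)  # name looked up in the listing
```

Handling the id case first means no dataset lookup is needed when the caller already knows the id.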
also correct `create_dataverse()` ref #33
…bular data in their original form. Otherwise, do not specify `format`
You guys might start regretting inviting me to be a maintainer. I'm having trouble reproducing the vignettes, even easy parts like retrieving plain-text R & CSVs.
Part 1: out of the box
Created on 2019-12-06 by the reprex package (v0.3.0)
Part 2: digging.
Using debug(dataverse::get_file), the error-throwing line is in get_file():
To make things a tad more direct, I called dataverse::get_file(2692233). The two relevant parameters to httr::GET() are
The r value returned is
That u value is fine when pasted into Chrome. I saw several Dataverse discussions about a trailing /. When I added that, the response appears good.
Part 3: Questions
I assume this error is fairly new. Some change with Dataverse? If not, maybe it's related to a change in curl that was released 4 days ago?
Why are csv & R files affected, but not tab files? As I step through a tab file (e.g., dataverse::get_file(2692294)), it appears the exact same lines are executed. And that u value doesn't have a trailing slash (https://dataverse.harvard.edu/api/access/datafile/2692294). I see two differences: (a) the content type and (b) this one doesn't go through AWS/S3.
This is probably related to @EdJeeOnGitHub's recent issue, More verbose httr error message for get_file() #31. Notice he mentions problems with certain file formats.
Is this related at all to any of these Dataverse issues: Dataverse URL - Page Not Found (404 Error) w/ Trailing Forward Slash (dataverse#3130); Dataverse URL - Validation Error for Bad URL with "." (dataverse#2559); or Error message for wrong/missing dataverse is not clear (dataverse#4196)?
You can see that my knowledge of the web side of this is limited; I don't understand those issues that well.
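The trailing-slash observation from Part 2 can be checked without stepping through the debugger. The following is a rough sketch rather than anything in the package: `add_trailing_slash()` and `compare_trailing_slash()` are hypothetical names, and reading the API key from the `DATAVERSE_KEY` environment variable is an assumption about the local setup. It requests the same URL with and without the trailing `/` and returns both HTTP status codes.

```r
# Hypothetical helpers for comparing a URL with and without a trailing slash.
add_trailing_slash <- function(u) {
  if (grepl("/$", u)) u else paste0(u, "/")
}

compare_trailing_slash <- function(u, key = Sys.getenv("DATAVERSE_KEY")) {
  variants <- c(
    plain = sub("/+$", "", u),     # strip any trailing slash
    slash = add_trailing_slash(u)  # make sure there is exactly one
  )
  # Send the key header only when a key is actually set
  cfg <- if (nzchar(key)) {
    httr::add_headers("X-Dataverse-key" = key)
  } else {
    httr::config()
  }
  vapply(variants, function(v) {
    httr::status_code(httr::GET(v, cfg))
  }, numeric(1))
}

# Needs network access, so it is left commented out here:
# compare_trailing_slash("https://dataverse.harvard.edu/api/access/datafile/2692233")
```

Running it against both an affected file and a tab file would show whether the slash alone accounts for the difference.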
devtools::session_info()
Session info ---------------------------------------------------------------------------
setting value
version R version 3.6.1 Patched (2019-08-12 r76979)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/Chicago
date 2019-12-06
Packages -------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1)
callr 3.3.2 2019-09-22 [1] CRAN (R 3.6.1)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
clipr 0.7.0 2019-07-23 [1] CRAN (R 3.6.1)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
curl 4.3 2019-12-02 [1] CRAN (R 3.6.1)
dataverse * 0.2.1 2019-12-07 [1] Github (bac89f4)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.1)
digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.1)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1)
evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1)
httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.1)
jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.0)
knitr 1.26 2019-11-12 [1] CRAN (R 3.6.1)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
packrat 0.5.0 2018-11-14 [1] CRAN (R 3.6.0)
pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.1)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.1)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1)
Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.1)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
reprex 0.3.0 2019-05-16 [1] CRAN (R 3.6.0)
rlang 0.4.2 2019-11-23 [1] CRAN (R 3.6.1)
rmarkdown 1.18 2019-11-27 [1] CRAN (R 3.6.1)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.1)
usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.1)
whisker 0.4 2019-08-28 [1] CRAN (R 3.6.1)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
xfun 0.11 2019-11-12 [1] CRAN (R 3.6.1)
xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.1)
screenshot of postman