replace download option "RData" with RDS #6678
I think an rds file is preferable to an RData file, for the reasons that @kuriwaki and I have discussed in the R client repo. Please let us know if you'd like to discuss it further.
@wibeasley, thanks for jumping in. @reikoch, thanks for opening this issue. I'm not a very good R developer and I'm not familiar with these formats, but my first thought is: are you sure you want to replace the ability to download RData format with RDS format? I'm concerned about scripts that may rely on the older format (I assume it's older) for reproducibility. I would think adding RDS support would be safer and more backward compatible. In other words, I'm suggesting we offer both formats.
Backward compatibility is probably necessary. By the way, for this particular Dataverse file, I think downloading the original .csv file and reading it in as CSV is preferable to converting it to RData/RDS.
Generally, I think it is bad to use a mechanism like the RData format, where the caller cannot choose the name of the object being loaded. True, R can read pretty much any file format, but RDS and RData are type-safe (dates are stored as dates, etc.), whereas CSV is not. In addition, a plain CSV file does not specify its character encoding. http://frictionlessdata.io/ might be a way out, since data packages store this metadata; xlsx does so too. As a consumer, I love type-safe data formats with a specified encoding!
OK, that means the RData file is derived from the CSV file under some assumptions about its encoding. Looking at the variable VSORRESU in CSC305ABC_VS, it seems that the CSV file was encoded in Latin-1, which the conversion did not pick up (see the unit for the temperature measurements). Maybe just provide the original file, plus a quick analysis of the encoding and CSV dialect, for data uploaded as CSV?
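To make the Latin-1 problem concrete, here is a minimal Python sketch (not Dataverse's actual ingest code) of what goes wrong when a converter assumes UTF-8, and of a simple "try UTF-8, fall back to Latin-1" check. The sample rows, the columns other than VSORRESU, and the helper name `decode_csv_bytes` are hypothetical.

```python
import csv
import io

# Hypothetical CSV bytes: a temperature unit written with a Latin-1 degree sign.
# In Latin-1, the degree sign is the single byte 0xB0, which is not valid UTF-8 on its own.
LATIN1_CSV = "VSTESTCD,VSORRES,VSORRESU\nTEMP,36.8,\u00b0C\n".encode("latin-1")


def decode_csv_bytes(raw: bytes) -> tuple[str, str]:
    """Decode raw CSV bytes, trying UTF-8 first and falling back to Latin-1.

    Returns (text, encoding_used). Latin-1 maps every byte to some character,
    so the fallback never raises, but it can silently mis-decode other
    encodings; a real ingest step would want a more careful detection step.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


# A converter that assumes UTF-8 and replaces undecodable bytes garbles the unit:
# the degree sign becomes U+FFFD, the Unicode replacement character.
naive = LATIN1_CSV.decode("utf-8", errors="replace")

# Detecting the encoding first recovers the intended unit.
text, used = decode_csv_bytes(LATIN1_CSV)
rows = list(csv.DictReader(io.StringIO(text)))
unit = rows[0]["VSORRESU"]  # the degree sign followed by "C"
```

The fallback order matters: because Latin-1 accepts any byte sequence, it has to be tried last, otherwise valid UTF-8 files would be mis-decoded.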
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment. |
I feel the benefits of an rds file (over an RData file) are as relevant as ever. However, I'd rather have a Parquet file than an rds file (see #9897), because it has the benefits of an rds file and is also language-agnostic. @kuriwaki, you're more in touch with how people use R with Dataverse. Is there a community that would substantially benefit from having both rds and Parquet files? Or would Parquet files satisfy their needs adequately?
Reopening as per @wibeasley's request.
@cmbz, I'm not sure it needs to be reopened. If @kuriwaki and others agree that all the benefits of an rds file are provided by a Parquet file (for R + Dataverse users), I think we'd rather the rds effort be conserved and redirected towards a Parquet option, which would benefit other languages too, such as Python. Like rds, Parquet files are compressed and strongly typed. There should also be packages to handle the hard work, so the Dataverse software doesn't need to get entangled in the problems of RData files and the messiness of Rserve described by @landreev.
Got it, thanks @wibeasley. Please chime in here, @kuriwaki, and let me know if you're okay with closing it again.
My views now are closer to #7249, where I suggest getting rid of RData/RDS exports of ingested files altogether. I think having the RData export format (again, for ingested files, which I think is the question here) is not necessary for R users, and might only confuse R beginners. R users should read ingested files as plain-text files, not as a custom R format.
edit: added a missing negation, typo |
In https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QPHMKX each of the 4 data files provides the option to download the individual file in RData format.
This is convenient for R users, but in this example the downloaded data are loaded into R as an object called "x", so loading several of these files will repeatedly overwrite object x.
As an alternative to the RData format, I would suggest using R's RDS format (https://stat.ethz.ch/R-manual/R-devel/library/base/html/readRDS.html), which