Write/read simple list-columns? #303

Closed
jennybc opened this issue Nov 2, 2015 · 12 comments
Labels
feature a feature request or enhancement

Comments

@jennybc
Member

jennybc commented Nov 2, 2015

I realize this is probably a non-starter.

But if a dplyr workflow leaves you with a simple list-column and you want to write the data frame out to a text file, you're stuck with dput(). By simple, I mean the elements are atomic vectors. A human-readable and GitHub/Excel-viewable tsv or csv would be so much nicer.

The way this data frame is displayed by View() in RStudio is sort of tantalizing. Could readr ever support this sort of write/read?

library(dplyr)
library(readr) # write_tsv() comes from readr
n <- 3
set.seed(4)
df <- data_frame(x = seq_len(n)) %>% 
  mutate(y = replicate(n, sample(seq_len(n), 3), simplify = FALSE))
df
#> Source: local data frame [3 x 2]
#> 
#>       x        y
#>   (int)   (list)
#> 1     1 <int[3]>
#> 2     2 <int[3]>
#> 3     3 <int[3]>
df %>% View()
write_tsv(df, "foo.tsv") # :(
#> Error: Don't know how to handle vector of type list.
@jennybc
Member Author

jennybc commented Nov 2, 2015

[Image: view-list-column]

@hadley
Member

hadley commented Nov 2, 2015

Hmmmmmmmmmmmm, it wouldn't be too hard - I'd just have to deparse() anything in a list column. I don't think we could make import automatic, but I could add a col_deparsed() or similar so you could manually specify.
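To make the deparse() idea concrete, here is a minimal base R sketch of what such a round trip could look like. It is not readr code: col_deparsed() was only floated in this thread, and the helper names deparse_col() and reparse_col() are made up for illustration. Note that reading back relies on eval(parse()), which runs whatever code is in the cell.

```r
# Toy data frame with a list-column of integer vectors (illustrative data).
df <- data.frame(x = 1:3)
df$y <- list(c(3L, 1L, 2L), c(1L, 3L, 2L), c(2L, 3L, 1L))

# Hypothetical helper: turn each list element into one string of R code.
deparse_col <- function(col) {
  vapply(col, function(el) paste(deparse(el), collapse = ""), character(1))
}

# Hypothetical helper: rebuild the list by parsing AND evaluating each string.
# This executes the cell contents, so it is only safe on trusted files.
reparse_col <- function(chr) {
  lapply(chr, function(s) eval(parse(text = s)))
}

out <- df
out$y <- deparse_col(df$y)   # now a plain character column, writable as text

restored <- reparse_col(out$y)
identical(restored, df$y)
```

The character column in `out` is something a delimited writer can handle; the cost is that import has to be requested explicitly and has to evaluate code.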

@hadley added the "feature" (a feature request or enhancement) and "ready" labels on Jun 2, 2016
@hadley
Member

hadley commented Jun 9, 2016

On consideration, I now think this is probably a bad idea - to go from the output of dput() to an R object, you have to evaluate the parsed code. Evaluating parts of a csv (even if you explicitly opt in to it) seems too surprising, and hence creates a security risk.
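A small base R sketch of the concern, using a made-up cell value: the string parses and evaluates to an ordinary integer vector, but it also does something else along the way. Here the side effect is just an assignment in a scratch environment; in a real file it could be any R code.

```r
# A scratch environment so the side effect is easy to observe.
env <- new.env()

# Hypothetical cell contents: looks like data, but carries extra code.
cell <- "{ touched <- TRUE; c(1L, 2L) }"

# Evaluating the parsed text returns the vector...
value <- eval(parse(text = cell), envir = env)
value

# ...and also ran the assignment hidden in the cell.
exists("touched", envir = env, inherits = FALSE)
```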

@jennybc
Member Author

jennybc commented Dec 7, 2016

If this ever gets reconsidered, here's a related discussion from Bioconductor.

https://support.bioconductor.org/p/83911/

... which also has DataFrames that can hold non-standard vectors, e.g. a CharacterList column. As with readr, writing/reading these columns to/from a plain-text delimited file is NOT currently possible. @lawremi describes a concrete workaround for "compound cells" based on comma-delimited strings and strsplit() + unstrsplit(), and mentions that data.table::fread() might support something like this.
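For the simple case in this issue, the "compound cells" workaround might look roughly like the sketch below. It assumes an integer list-column whose elements contain no commas, and it uses paste(collapse = ",") as a base R stand-in for Bioconductor's unstrsplit(). The data and the temp file are illustrative.

```r
library(readr)

# Toy data frame with a list-column of integer vectors (illustrative data).
df <- data.frame(x = 1:3)
df$y <- list(c(3L, 1L, 2L), c(1L, 3L, 2L), c(2L, 3L, 1L))

# Collapse each element into one comma-delimited string, e.g. "3,1,2".
flat <- df
flat$y <- vapply(df$y, paste, character(1), collapse = ",")

# The flattened column is plain character, so write_tsv() accepts it.
path <- tempfile(fileext = ".tsv")
write_tsv(flat, path)

# Read back as integer + character, then split the strings into vectors again.
back <- read_tsv(path, col_types = "ic")
back$y <- lapply(strsplit(back$y, ",", fixed = TRUE), as.integer)

identical(back$y, df$y)
```

No code is evaluated on import, but the element type has to be supplied by hand (as.integer here), and elements that are empty, NA, or contain the inner delimiter would need extra care.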

@lawremi

lawremi commented Dec 7, 2016

There is a lot of precedent for representing at least a single level of nesting by embedding CSV inside of a CSV column. One example is Solr: https://wiki.apache.org/solr/CSVResponseWriter.

@hadley
Member

hadley commented Dec 7, 2016

I'll reopen just so we think about it again in the future.

@hadley reopened this on Dec 7, 2016
@crazyhottommy

I had the same problem today, my last column of the df is a list of characters (separated by commas). I want to write it to file and got this error:

Error in stream_delim_(df, path, ...) : 
  Don't know how to handle vector of type list.

Thanks.

@noamross
Contributor

noamross commented Aug 4, 2018

Per discussion here, one could represent a list-col as a character vector of JSON strings.
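A minimal sketch of the JSON-string idea, assuming the jsonlite package is installed. This is a manual pre/post-processing step, not something readr does for you; the data is illustrative.

```r
library(jsonlite)

# Toy data frame with a list-column of integer vectors (illustrative data).
df <- data.frame(x = 1:3)
df$y <- list(c(3L, 1L, 2L), c(1L, 3L, 2L), c(2L, 3L, 1L))

# Encode each element as a JSON array string, e.g. "[3,1,2]".
flat <- df
flat$y <- vapply(df$y, function(el) as.character(toJSON(el)), character(1))

# flat is now all atomic columns and could be written with write_tsv()/write_csv().

# Decode after reading: each string is parsed as JSON, not evaluated as R code.
restored <- lapply(flat$y, fromJSON)
all.equal(restored, df$y, check.attributes = FALSE)
```

Unlike the deparse() route, decoding only parses data, which sidesteps the evaluation concern raised earlier in the thread.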

@prosoitos

I am very ignorant when it comes to JSON, so please excuse my naivety if this comment turns out to be silly.

One challenge I have had when trying to work with JSON is unsupported classes (see jeroen/jsonlite#62 for the general issue and rstudio/DT#537 and jrowen/rhandsontable#242 for the particular case of class difftime). There is the workaround of converting to character first, and this might not be relevant to the conversation here (if so, apologies). But I thought I would bring it up in case it turns out to be something to take into account in the development of this approach.

@HedvigS

HedvigS commented Nov 8, 2018

(I'm also having the same issue, watching this thread.)

@vlurgio

vlurgio commented Nov 29, 2018

Would like to add here that the lack of quotes in actual strings when printed to console makes this almost impossible to detect as well. ['my', 'first', 'list'] looks like a list but is actually not. I was using unnest() on my csv trying to figure out why it wasn't working for almost an hour.
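One way to catch this early is to check the column's type instead of relying on how it prints. A base R sketch with a made-up one-row csv:

```r
# Illustrative csv where one cell merely looks like a list.
csv_text <- "id,tags\n1,\"['my', 'first', 'list']\"\n"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The column is an ordinary character vector, not a list-column.
is.list(df$tags)
is.character(df$tags)

# Type of every column at a glance.
vapply(df, typeof, character(1))
```

A true list-column reports typeof "list"; a column of strings that happen to contain brackets reports "character", which is why unnest() has nothing to unnest.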

@jimhester
Collaborator

While I can see the appeal, I think ultimately this would be better done as a post-processing step than something built into readr.

9 participants