Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Version of dplyr Breaks tbl_json dplyr verbs #97

Closed
colearendt opened this issue May 6, 2017 · 10 comments
Closed

New Version of dplyr Breaks tbl_json dplyr verbs #97

colearendt opened this issue May 6, 2017 · 10 comments

Comments

@colearendt
Copy link
Owner

The standard evaluation versions of dplyr verbs will be deprecated in a coming release of dplyr, it seems. See doc for reference. Lines 474 to 496.

If so, the method of implementing the dplyr verbs should be thought through - probably ahead of or in step with the dplyr release.

Thoughts are to use the regular forms (filter, mutate, etc.) since the SE forms will no longer be called, and then deprecate the SE forms consistently with dplyr. I have not tested yet, though.

A simple example to show the JSON is not filtered. The problems will pervade through many of the other functions, though, too. I.e. spread_all fails, likely due to different behavior.

devtools::install_github('tidyverse/dplyr')
library('dplyr')

devtools::load_all()

x <- commits %>% gather_array()

length(x$array.index) ## 30
length(attr(x,'JSON')) ## 30


## The JSON is not filtered as it should be
xf <- x %>% filter(array.index==1)

length(xf$array.index) ## 1
length(attr(xf,'JSON')) ## 30

## Fails 
## x %>% enter_object('commit') %>% spread_all
@colearendt
Copy link
Owner Author

Started working on this, but a change of attributes in the dev version of filter is causing trouble for passing tests. See tidyverse/dplyr#2772 for discussion / resolution of this issue.

@colearendt
Copy link
Owner Author

Since the SE forms are no longer called behind the scenes, I simply added the NSE forms to wrap_dplyr_verb.

Have not tested this with the tidyeval framework, yet.

@colearendt
Copy link
Owner Author

Note that tbl_df is soft deprecated in the new release of dplyr per the NEWS.md, so this should be changed to as_tibble. I am not a fan of the extra typing or the inconsistency with the attr(.,'class')... but it seems the decision has already been made.

@colearendt
Copy link
Owner Author

colearendt commented Jun 16, 2017

@MarkEdmondson1234 can you test your issue with devtools::install_github('jeremystan/tidyjson',ref='f6f13f4'). That is the development version where I believe I have a fix, and it would be helpful to know if it fixes before going through the CRAN release.

@clesiemo3
Copy link

@colearendt for what it's worth I ran into a similar issue caused by https://github.com/jeremystan/tidyjson/blob/master/R/spread_all.R#L77 where it turned it into a data frame instead of json_tbl and later had trouble down the line. Installing the ref above fixed my issue at work. We have been using the master branch for spread_all() support because some internal R packages utilize it. +1 from me!

@gaiusjaugustus
Copy link

Continuing discussion from sailthru/tidyjson#58

I have a json I downloaded from GDC, and am trying to flatten it to a dataframe. Was hoping to "simply" use tidyjson's verbs (gather_array, enter_object, and spread_values) to manually go through all the levels of the json and flatten them to a dataframe. Got the dplyr error...and it persisted even when I tried downgrading to dplyr 0.5.0 (which I can't stay at because other pipelines require dplyr 0.7.1).

@colearendt
Copy link
Owner Author

colearendt commented Dec 16, 2017

Hey @gaiusjaugustus. Unfortunately, I will be unable to help without a direct example (best practices for sharing minimal reproducible examples here ) that I can test. My only recommendation would be to try out my PR with the ref above or with the latest devtools::install_github("jeremystan/tidyjson","#103"). Unfortunately, until I am able to get in touch with the package maintainer, I cannot do a whole lot besides updating that PR. I would love to know if it is not working! The verbs you are working with should "simply" work (in an ideal world).

If you are struggling to get a reprex working (and if the data need not be secured), you could also try your hand at creating a RStudio cloud project that I can fork. (Note that RStudio Cloud is currently in alpha)

@gaiusjaugustus
Copy link

Just as an update, because the data is pretty complex (several layers deep) and I'm unsure where the problem is, and because the data is not allowed to be shared, creating a reproducible example was going to take some time. In the meantime, I was able to find a way to get the data I need in a flat file (from the provider of the data). If I run across this problem again in the future, I'll definitely update with more info.

Thanks for your time.

@colearendt
Copy link
Owner Author

Sure thing! Thanks for the update

@colearendt
Copy link
Owner Author

Should be resolved in #108 and #109

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants