mid-pipe assertions #41

Open
r2evans opened this issue Jun 26, 2017 · 3 comments

@r2evans

r2evans commented Jun 26, 2017

When troubleshooting a long %>% pipe, testing an assertion on the data means either interrupting the pipe (which only works when no grouping is present) or using a do() block (which works with or without grouping).

library(dplyr)
cyls <- 6
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  summarize(z = max(density(mpg)$y))

This works as one might expect. If you run it with cyls <- 4, though, the vs == 0 group contains only one row, so density() errors out:

cyls <- 4
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  mutate(z = max(density(mpg)$y))
# Error in mutate_impl(.data, dots) : 
#   need at least 2 points to select a bandwidth automatically

In order to assert that sufficient data is present, you either need to use a do() block or break up the pipe:

library(assertthat)
cyls <- 4
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  do({
    assert_that(
      length(na.omit(.$mpg)) > 1,
      msg = "I cannot grok the data"
    )
    .
  }) %>%
  summarize(z = max(density(mpg)$y))
# Error: I cannot grok the data

It would be nice to be able to test the assertion mid-pipe:

cyls <- 4
mtcars %>%
  filter(cyl == cyls) %>%
  group_by(vs) %>%
  assert_pipe_stop(
    length(na.omit(mpg)) > 1,
    .msg = "I cannot grok the data"
  ) %>%
  summarize(z = max(density(mpg)$y))

Granted, in this contrived example the error from density() is sufficient, but it's not hard to imagine longer pipelines where the calculation should not continue unless certain conditions have been verified.

I think assertion-companion functions such as assert_pipe_stop and perhaps assert_pipe_warning might be useful. I think it makes more sense to extend assertthat to be pipe-aware rather than to add assertions to dplyr or another of the tidyverse packages.
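
For illustration, here is a minimal sketch of what such a pipe-aware helper might look like. assert_pipe_stop() is hypothetical (it is the name proposed above, not an existing assertthat function), and checking each group separately via dplyr::group_split() is only an assumption about how grouping could be honored.

```r
# Hypothetical helper: not part of assertthat or dplyr.
assert_pipe_stop <- function(.data, ..., .msg = NULL) {
  conds <- eval(substitute(alist(...)))  # capture the conditions unevaluated
  env <- parent.frame()                  # where non-column names are looked up

  # Assumption: check each group separately when the input is grouped.
  pieces <- if (inherits(.data, "grouped_df")) {
    dplyr::group_split(.data)
  } else {
    list(.data)
  }

  for (piece in pieces) {
    for (cond in conds) {
      ok <- eval(cond, piece, env)  # columns of `piece` shadow `env`
      if (!isTRUE(all(ok))) {
        if (is.null(.msg)) {
          .msg <- paste(paste(deparse(cond), collapse = " "), "is not TRUE")
        }
        stop(.msg, call. = FALSE)
      }
    }
  }
  .data  # hand the data on unchanged so the pipe can continue
}

# Ungrouped usage with base R only: the check passes and the data come back.
checked <- assert_pipe_stop(mtcars, length(na.omit(mpg)) > 1)
```

In the pipeline above it would take the place of the do() block: with cyls <- 4 the vs == 0 group has a single row, so the sketch would stop with the supplied message before summarize() runs.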

Thoughts?

I'm willing to work on a PR, though admittedly I'm not that proficient with non-standard evaluation (NSE), which these functions would rely on heavily.

@Zedseayou

I'd find this really useful! I honestly still have trouble using the do() workaround as well, so being able to stop a pipe's execution when a condition isn't met would be great. I tried using assert_that() directly with the %T>% operator from magrittr in the hope that it would allow the pipe to continue, but wasn't able to get it working. Will poke around some more.

@ArtemSokolov

Consider using assertr for this functionality: https://cran.r-project.org/web/packages/assertr/vignettes/assertr.html

@PhilvanKleur

Isn't this what the env argument of assert_that() is for?

Below, I assign to the global variable d. I then show three pipes, each with an assert_that() call embedded via the tee operator and a mutate() on the tibble at the end. The first assert_that() should return TRUE; the second tests a column of the tibble and should fail; the third tests d and should fail. They seem to work as intended.

> d <- 2
> tibble( a=1, b=2 ) %T>% assert_that( a==1 & d==2, env= . ) %>% mutate( b=3 )
# A tibble: 1 x 2
      a     b
  <dbl> <dbl>
1     1     3
> tibble( a=1, b=2 ) %T>% assert_that( a==11 & d==2, env= . ) %>% mutate( b=3 )
Error: a == 11 & d == 2 is not TRUE
> tibble( a=1, b=2 ) %T>% assert_that( a==1 & d==22, env= . ) %>% mutate( b=3 )
Error: a == 1 & d == 22 is not TRUE
>
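
The mechanism behind env = . can be seen with base R alone. The sketch below assumes that assert_that() passes env on to eval(), which accepts a data frame (or list) as its evaluation environment; df and the quoted expressions are illustrative stand-ins for the tibble and assertions above.

```r
d <- 2
df <- data.frame(a = 1, b = 2)

# Names are resolved in the data frame's columns first; anything not found
# there (here, d) is looked up in the calling environment.
both_ok <- eval(quote(a == 1 & d == 2), df)   # TRUE
bad_col <- eval(quote(a == 11 & d == 2), df)  # FALSE: column a is 1
bad_var <- eval(quote(a == 1 & d == 22), df)  # FALSE: d is 2
```

Note that this covers the ungrouped case only: the expression sees whole columns, so a per-group check would still need something like the do() block from the original post.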
