Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite case_when() using vec_case_when() #6300

Merged
merged 13 commits into from
Jul 1, 2022

Conversation

DavisVaughan
Copy link
Member

Closes #6261
Closes #6206
Closes #5106
Closes #6145
Closes #6225
Supersedes #6286

This is a more conservative update of case_when() vs what was done in #6286. There shouldn't be any breaking changes here (but ill run revdeps), and it should make everyone pretty happy.

The intention is to move vec_case_when() to vctrs and rewrite in C, but getting the semantics right in R is more important for now.

vec_case_when() will also be used to back if_else() and coalesce() for sure. I think it could also back na_if().


Important notes:

  • I have decided to keep the formula interface. I think my main complaint about case_when() is less about the formula interface, and more about TRUE ~. It also doesn't seem like we are going to add a multi-condition mutate_when() that would use an fcase() like interface, so there is less pressure to try and be consistent with something like that.

  • I have added a .default argument, with the intention of being a direct replacement for TRUE ~. In a future release, I would like to start deprecating usage of TRUE ~ so we can stop recycling all of the inputs against each other. Providing the .default argument and switching all our docs over to it is the first step towards that.

  • New .ptype and .size arguments, useful for predictable output

  • vec_case_when() now takes conditions and values lists, rather than ... with an fcase() interface. Providing two parallel lists has actually proven to be much more useful when programming with vec_case_when() because typically you have a user facing interface that collects those two lists.

  • After I move vec_case_when() to vctrs, I will come back and mention that if you want to program with case_when(), then you might want to look at vctrs::vec_case_when() instead.


The next section is a reference for future us if we ever think about changing how .default works with regards to NA.

An extremely important point is that .default works exactly the same as TRUE ~. We debated for a long time if .default should only be applied where all conditions are FALSE (rather than for FALSE or NA, as is currently done), and we also considered adding a .missing argument for handling the case where at least one condition returned NA and none returned TRUE. This ended up being more surprising in some cases than just handling the missing values manually. Ultimately there are too many ways that missing values can appear / disappear from the conditions, so I'm convinced that you have to have an explicit way to handle them that is specific to your usage of case_when()

For example, here is where changing .default and adding .missing would have worked well

x <- c(1, 2, NA)

# Here is a case where it is useful:
# NA propagated and `.default` doesn't handle it
# .missing would handle that NA
vec_case_when(
  list(x %% 2 == 0),
  list("odd"),
  .default = "even"
)
#> [1] "even" "odd"  NA

But it gives confusing results when you use %in% or complex conditions with & - these cases remove the missing values from the original input:

x <- c("a", "b", NA, "e")

# Here is an unexpected case with `%in%`
# NA turned into FALSE in the condition because of how match() works
# unexpected "consonant" in the output
vec_case_when(
  list(x %in% c("a", "e", "i", "o", "u")),
  list("vowel"),
  .default = "consonant",
  .missing = "missing"
)
#> [1] "vowel"     "consonant" "consonant" "vowel"
x <- c(1, 6)
y <- c(1, NA)

# Here is an unexpected case with `&`:
# unexpected `"high"` in the second position
case_when(
  x <= 2 & y <= 2 ~ "low",
  x <= 4 & y <= 4 ~ "med",
  .default = "high",
  .missing = "unknown"
)
#> [1] "low"  "high"

R/case_when.R Outdated Show resolved Hide resolved
R/case_when.R Show resolved Hide resolved
R/case_when.R Show resolved Hide resolved
R/case_when.R Show resolved Hide resolved
R/case_when.R Outdated Show resolved Hide resolved
R/vec-case-when.R Outdated Show resolved Hide resolved
tests/testthat/_snaps/case-when.md Show resolved Hide resolved
tests/testthat/test-vec-case-when.R Show resolved Hide resolved
@DavisVaughan DavisVaughan requested review from lionel- and hadley June 17, 2022 16:23
NEWS.md Outdated Show resolved Hide resolved
R/case_when.R Outdated Show resolved Hide resolved
#' `.default` participates in the computation of the common type with the RHS
#' inputs.
#'
#' If `NULL`, the default, a missing value will be used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use .default = NA as default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think of it as "you didn't supply a .default" so nothing fills in the locations where a default value would otherwise be used. This makes sense to me if you think of the output as being the result of starting with vec_init(ptype, size) and then filling in the values of the cases. When you don't supply a .default, you just get the missing values that resulted from the vec_init() call

R/case_when.R Outdated Show resolved Hide resolved
R/case_when.R Outdated Show resolved Hide resolved
R/vec-case-when.R Outdated Show resolved Hide resolved
R/vec-case-when.R Outdated Show resolved Hide resolved
R/vec-case-when.R Outdated Show resolved Hide resolved
R/vec-case-when.R Show resolved Hide resolved
R/vec-case-when.R Outdated Show resolved Hide resolved
R/case_when.R Show resolved Hide resolved
R/vec-case-when.R Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants