case_when() doesn't return NA for NA by default #6773

Closed
Pozdniakov opened this issue Mar 3, 2023 · 1 comment

Comments

@Pozdniakov

The problem with case_when()'s .default argument.
I think the current logic of case_when() is flawed in how it handles NA. case_when() is positioned as "a general vectorised if-else" that "allows you to vectorise multiple if_else() statements". So I expect case_when() to behave the same way as if_else()/ifelse() when I use only the minimal options:

> vec <- c(1L, 2L, NA_integer_)
> dplyr::if_else(vec == 1, "one", "not one")
[1] "one"     "not one" NA       

This behaviour is the same for base R ifelse():

> ifelse(vec == 1, "one", "not one")
[1] "one"     "not one" NA       

Non-vectorised if-else returns an error for NA:

> if(vec[3] == 1) {"one"} else {"not one"}
Error in if (vec[3] == 1) { : missing value where TRUE/FALSE needed

However, a naive attempt to reproduce the same results as if_else()/ifelse() using case_when() fails:

> dplyr::case_when(
    vec == 1 ~ "one", # first condition is analog of "if"
    .default = "not one" # .default value is analog of "else"
)
[1] "one"     "not one" "not one"

...because NA conditions are treated as FALSE, so no condition matches and the .default value is returned.
Thus, NA values need special treatment. For example, to mimic if_else()/ifelse() behaviour we need to add is.na(vec) ~ NA as a condition.
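A minimal sketch of that explicit NA handling, assuming dplyr 1.1.0 or later (where the .default argument is available). The typed NA_character_ is used here instead of a bare NA only to keep the output type unambiguous; the variable name res is just for illustration.

```r
library(dplyr)

vec <- c(1L, 2L, NA_integer_)

# Handle missing values explicitly, before any other condition,
# so that NA inputs do not fall through to .default
res <- case_when(
  is.na(vec) ~ NA_character_,
  vec == 1   ~ "one",
  .default   = "not one"
)

res
# [1] "one"     "not one" NA
```

With the is.na() branch listed first, the result matches what if_else(vec == 1, "one", "not one") returns for the same input.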

There is a note about this trick in the docs for case_when(), but it is not prominent: it appears only in the description of the .default argument and in one code example. So the function does not work the way many users (including me) expect, and it is not consistent with if_else()/ifelse(). Finally, it contradicts the general logic of NA handling: for a missing value we don't know which group the underlying value would fall into, so we cannot say whether it belongs to the .default group.

I propose this possible solution:

In addition to the .default argument, add a .na argument (the value returned in case of NA in the condition), defaulting to .na = NA. It could be overridden either by setting .na explicitly or by adding is.na(vec) ~ value, with the latter taking priority over the .na argument.
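To make the proposed semantics concrete, here is a rough base R sketch. The helper case_when_na() and its arguments are hypothetical and not part of dplyr; it assumes one particular reading of the proposal, namely that an element gets the na value if it reaches an NA condition before any TRUE one, and that all values are length-one and of the same type.

```r
# Hypothetical helper, NOT a dplyr function: first match wins, and an NA
# condition reached before any TRUE condition yields `na`.
case_when_na <- function(conditions, values, default, na = NA) {
  n <- length(conditions[[1]])
  out <- rep(default, n)
  decided <- rep(FALSE, n)
  for (i in seq_along(conditions)) {
    cond <- conditions[[i]]
    hit <- !decided & !is.na(cond) & cond   # condition is TRUE
    unknown <- !decided & is.na(cond)       # condition is NA
    out[hit] <- values[[i]]
    out[unknown] <- na
    decided <- decided | hit | unknown
  }
  out
}

vec <- c(1L, 2L, NA_integer_)

default_na <- case_when_na(list(vec == 1), list("one"), default = "not one")
custom_na  <- case_when_na(list(vec == 1), list("one"), default = "not one",
                           na = "missing")
# An explicit is.na() branch placed first takes priority over `na`
explicit   <- case_when_na(list(is.na(vec), vec == 1), list("unknown", "one"),
                           default = "not one")
```

Under these assumptions, default_na ends with NA, custom_na ends with "missing", and explicit ends with "unknown", because is.na(vec) is TRUE rather than NA for the missing element and is matched first.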

@DavisVaughan
Member

We considered adding a .missing argument for dplyr 1.1.0, but I ultimately found that it was too confusing and too hard to define in a way that always made sense. I have discussed this in detail here #6300 (comment), but we don't currently have any plans to add this argument back in. Ultimately explicitly handling missing values ends up being the clearest way to handle them.
