The problem with case_when()'s .default = argument
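I think the current logic of case_when() is flawed in how it handles NA. case_when() is positioned as "a general vectorised if-else" that "allows you to vectorise multiple if_else() statements". So I expect case_when() to behave the same way as if_else()/ifelse() when I supply only the minimal options. With if_else(), an NA in the condition gives NA in the result, and this behaviour is the same for the base R ifelse(). Non-vectorised if-else returns an error for NA: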
> if(vec[3] == 1) {"one"} else {"not one"}
Error in if (vec[3] == 1) { : missing value where TRUE/FALSE needed
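For comparison, the vectorised forms propagate the NA instead of failing. This is a minimal sketch: the definition of vec is not shown in this post, so c(1, 2, NA) is an assumed value chosen only to be consistent with the outputs quoted here.

```r
library(dplyr)

# Assumption: `vec` is not defined in the post; this value is illustrative.
vec <- c(1, 2, NA)

# dplyr's if_else(): an NA condition yields NA (the `missing` argument is unset).
res_dplyr <- if_else(vec == 1, "one", "not one")

# Base R ifelse(): an NA condition also yields NA.
res_base <- ifelse(vec == 1, "one", "not one")

print(res_dplyr)
print(res_base)
```

Both calls leave the third element as NA rather than assigning it to either branch.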
However, a naive attempt to reproduce the same results as if_else()/ifelse() using case_when() will fail:
> dplyr::case_when(
vec == 1 ~ "one", # first condition is analog of "if"
.default = "not one" # .default value is analog of "else"
)
[1] "one" "not one" "not one"
...because NA is treated as FALSE by every condition, so the .default = value is returned.
Thus, NA values need special treatment. For example, to mimic if_else()/ifelse() behaviour we need to add is.na(vec) ~ NA as a condition.
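A minimal sketch of that workaround, assuming dplyr 1.1.0 or later (for .default =) and the same illustrative vec as elsewhere in this post, which is an assumed value because the original definition is not shown:

```r
library(dplyr)

# Assumption: `vec` is not defined in the post; this value is illustrative.
vec <- c(1, 2, NA)

res <- case_when(
  is.na(vec) ~ NA,       # explicit handling of missing values
  vec == 1   ~ "one",    # analog of "if"
  .default = "not one"   # analog of "else"
)

print(res)
```

With the explicit is.na() condition, the missing element stays NA instead of falling through to .default.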
There is a note on this trick in the docs for case_when(), but it is not prominent: it appears only in the description of the .default = argument and in one code example. So the function does not work the way many users (including me) expect, and it is not consistent with if_else()/ifelse(). Finally, it contradicts the logic of NA handling in general: for a missing value we don't know which group the value behind the NA would fall into, so we cannot say whether it belongs to the .default group or not.
I propose this possible solution:
In addition to the .default = argument, add a .na = argument (the value that will be returned when a condition is NA) with default .na = NA. It can be overridden either by setting the .na = argument or by an explicit is.na(vec) ~ value condition. The latter takes priority over the .na = argument.
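To make the proposal concrete, here is a rough base R sketch of the intended semantics. case_when_na() is a hypothetical helper, not part of dplyr, and it takes plain lists of conditions and values instead of formulas. It also assumes one particular reading of "NA in condition": an element gets .na only if no condition is TRUE for it and at least one condition is NA.

```r
# Hypothetical helper illustrating the proposed `.na =` semantics.
# Not a dplyr function; `conditions` and `values` are parallel lists.
case_when_na <- function(conditions, values, .default = NA, .na = NA) {
  stopifnot(length(conditions) == length(values))
  n <- length(conditions[[1]])
  out <- rep(.default, n)
  matched <- rep(FALSE, n)  # has an earlier condition already been TRUE?
  saw_na <- rep(FALSE, n)   # was any condition NA for this element?

  for (i in seq_along(conditions)) {
    cond <- conditions[[i]]
    hit <- !matched & !is.na(cond) & cond
    out[hit] <- rep(values[[i]], length.out = n)[hit]
    matched <- matched | hit
    saw_na <- saw_na | is.na(cond)
  }

  # Assumed rule: unmatched elements with an NA condition get `.na`,
  # not `.default`.
  out[!matched & saw_na] <- .na
  out
}

# Assumption: `vec` is not defined in the post; this value is illustrative.
vec <- c(1, 2, NA)

res_default_na <- case_when_na(list(vec == 1), list("one"), .default = "not one")
res_explicit <- case_when_na(
  list(is.na(vec), vec == 1),
  list("missing", "one"),
  .default = "not one"
)

print(res_default_na)
print(res_explicit)
```

In the second call the explicit is.na(vec) condition matches first, so it takes priority over .na, as the proposal describes. Other readings of the rule are possible once several conditions are involved.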
We considered adding a .missing argument for dplyr 1.1.0, but I ultimately found that it was too confusing and too hard to define in a way that always made sense. I have discussed this in detail here #6300 (comment), but we don't currently have any plans to add this argument back in. Ultimately explicitly handling missing values ends up being the clearest way to handle them.