NA handling in slice_min / slice_max #6177
Reprex:
library(dplyr, warn.conflicts = FALSE)
df <- tibble(a = 1:3, b = c(10, NA, 12))
df |> slice_min(b, n = 3)
#> # A tibble: 2 × 2
#>       a     b
#>   <int> <dbl>
#> 1     1    10
#> 2     3    12
Created on 2022-04-15 by the reprex package (v2.0.1)
I think this is just a documentation issue as an
Combining with #6347: Currently we have an inconsistency depending on the
library(dplyr, warn.conflicts = FALSE)
df <- tribble(
  ~id, ~x,
  1, NA,
  1, NA,
  2, 1,
  2, NA,
  3, 1,
  3, 1,
  3, NA
)
df |> group_by(id) |> slice_min(x, n = 1, with_ties = TRUE)
#> # A tibble: 3 × 2
#> # Groups:   id [2]
#>      id     x
#>   <dbl> <dbl>
#> 1     2     1
#> 2     3     1
#> 3     3     1
df |> group_by(id) |> slice_min(x, n = 1, with_ties = FALSE)
#> # A tibble: 3 × 2
#> # Groups:   id [3]
#>      id     x
#>   <dbl> <dbl>
#> 1     1    NA
#> 2     2     1
#> 3     3     1
Created on 2022-07-21 by the reprex package (v2.0.1)
What should happen when there are some
At a minimum, it feels like we also need an

If we think about these functions strictly, if there's even a single
But that behaviour seems pretty annoying/useless, and it would be inconsistent with

That makes sense to me. So the output of
The next question is, are rows with a missing value considered ties? i.e., what should this return?
Yeah, I think we'd consider
I think these are the main cases:
# Sufficient unique values; get n rows in output
df <- tibble(x = c(1, 2, 3, NA, NA))
df %>% slice_min(x, n = 2)
# Insufficient values; get fewer than n rows in output
df <- tibble(x = c(1))
df %>% slice_min(x, n = 2)
# Insufficient non-missing values, get NAs in output
df <- tibble(x = c(1, NA))
df %>% slice_min(x, n = 2, na.rm = TRUE)
# Ties; get more than n rows in output
df <- tibble(x = c(1, 2, 2))
df %>% slice_min(x, n = 2)
df <- tibble(x = c(1, NA, NA))
df %>% slice_min(x, n = 2)
# Ties and insufficient values happen to cancel out
df <- tibble(x = c(1, 1))
df %>% slice_min(x, n = 2)
Ooops, I think I messed that up. I meant:
# Insufficient values; get fewer than n rows in output
df <- tibble(x = c(1))
df %>% slice_min(x, n = 2)
# Insufficient non-missing values, get fewer than n rows in output
df <- tibble(x = c(1, NA))
df %>% slice_min(x, n = 2, na.rm = TRUE)
# Insufficient non-missing values, get NAs in output
df <- tibble(x = c(1, NA))
df %>% slice_min(x, n = 2)
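The last two cases in the corrected list can be sketched in base R. This is only an illustration of the behaviour proposed in this thread, not dplyr's implementation: `slice_min_sketch()` and its `na_rm` argument are hypothetical names, and ties are deliberately ignored to keep the example short. It relies on `order()` placing missing values last, so NA rows are returned only once the non-missing values run out.

```r
# Hypothetical helper, not part of dplyr: return the n rows with the smallest
# values in `col`, treating NA as larger than every non-missing value.
slice_min_sketch <- function(df, col, n, na_rm = FALSE) {
  x <- df[[col]]
  # order() sorts ascending; na.last = TRUE puts rows with NA at the end
  idx <- order(x, na.last = TRUE)
  if (na_rm) {
    # Assumption: removing NAs simply drops those rows before slicing
    idx <- idx[!is.na(x[idx])]
  }
  # head() returns fewer than n indices when fewer are available
  df[head(idx, n), , drop = FALSE]
}

df <- data.frame(x = c(1, NA))
slice_min_sketch(df, "x", n = 2)                # two rows: 1, then NA
slice_min_sketch(df, "x", n = 2, na_rm = TRUE)  # one row: 1
```

With missing values sorted last, the default call fills the output up to n rows with NAs, while dropping them first yields fewer than n rows, matching the two outcomes described above.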
slice_min and slice_max drop rows where the value in the order_by column is NA, returning fewer than the requested number of rows if n is large enough.
It's unclear to me whether this is a bug or just intended behavior that requires documenting.