Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement vec_arg_min and vec_arg_max #86

Open
hadley opened this issue Sep 19, 2018 · 6 comments
Open

Implement vec_arg_min and vec_arg_max #86

hadley opened this issue Sep 19, 2018 · 6 comments
Labels
feature a feature request or enhancement numeric-arith

Comments

@hadley
Copy link
Member

hadley commented Sep 19, 2018

And use in min() and max() methods

@lionel- lionel- added feature a feature request or enhancement numeric-arith labels Apr 20, 2020
@DavisVaughan
Copy link
Member

DavisVaughan commented Jan 22, 2022

I now think it would be useful to have:

# error if size 0, requires size >0 to be able to return a single location back
vec_arg_min(x)
vec_arg_max(x)

# Returns a size 1 value that represents the maximum/minimum value for a particular ptype
# - generic
# - default would cast Inf/-Inf to the type of x
# - clock would override this to return the max/min possible date-time values
# - data frames would map(x, vec_ptype_maximum) over the columns to get a 1 row result
vec_ptype_maximum(x)
vec_ptype_minimum(x)

vec_min(x)
vec_max(x) 

# the above two are implemented as:
vec_min <- function(x) {
  if (vec_size(x)) {
    vec_slice(x, vec_arg_min(x))
  } else {
    vec_ptype_minimum(x)
  }
}

I'm currently at a place where I'm using min() and max() in a "generic" way and would need them to even be generic over data frames, so vec_min/max() would be useful.

I'm not sure what the vec_ptype_minimum() of a character vector is. min(character()) returns NA_character_ and is documented to do so, but even the docs admit that that is strange. Maybe that should be an error in our vec_ptype_minimum.character method.

@DavisVaughan
Copy link
Member

I also like that Python's min() function has a default argument
https://docs.python.org/3/library/functions.html#min

So maybe its something like:

vec_min <- function(x, ..., na_rm = FALSE, empty = NULL) {
  check_dots_empty0(...)
  
  if (na_rm) {
    x <- vec_slice(x, !vec_equal_na(x))
  }
  
  if (is.null(empty)) {
    empty <- vec_ptype_maximum(x)
  } else {
    empty <- vec_cast(empty, x)
    vec_assert(empty, size = 1L)
  }
  
  if (vec_is_empty(x)) {
    empty
  } else {
    vec_slice(x, vec_arg_min(x))
  }
}

I think the empty argument would really come in handy with group_by() + summarize(). My wife just found herself in a situation where she wanted to compute the min date per group, but many of her dates were missing. She wanted to remove the missing dates and still compute the min on any data that was left, but she wanted to retain an NA if the group was entirely composed of NAs (rather than getting an Inf date).

suppressPackageStartupMessages({
  library(dplyr)
  library(vctrs)
})
#> Warning: package 'dplyr' was built under R version 4.1.2

df <- tibble(g = c(1, 1, 2), date = new_date(c(0, NA, NA)))

df_min <- df %>%
  group_by(g) %>%
  summarise(date = min(date, na.rm = TRUE))
#> Warning in min.default(structure(NA_real_, class = "Date"), na.rm = TRUE): no
#> non-missing arguments to min; returning Inf

# the print method lies!
df_min
#> # A tibble: 2 × 2
#>       g date      
#>   <dbl> <date>    
#> 1     1 1970-01-01
#> 2     2 NA

# EW!
unclass(df_min$date)
#> [1]   0 Inf
is.na(df_min$date)
#> [1] FALSE FALSE

# this was actually what she wanted:
df_min <- df %>%
  group_by(g) %>%
  summarise(date = vec_min(date, na_rm = TRUE, empty = NA))

@DavisVaughan
Copy link
Member

vec_min(x, empty = 1L) would probably be useful for tidyverse/dplyr#6167 (comment)

@lionel-
Copy link
Member

lionel- commented Sep 12, 2022

Should vec_arg_min() be vec_min_loc()?

@DavisVaughan
Copy link
Member

Probably vec_locate_min() but yea i dont like the python style argmin naming scheme

@DavisVaughan
Copy link
Member

Another example where this would be helpful tidyverse/dplyr#6703

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement numeric-arith
Projects
None yet
Development

No branches or pull requests

3 participants