Skip to content

Commit

Permalink
Rewrite nth(), first(), and last() using vctrs (#6331)
Browse files Browse the repository at this point in the history
* Rewrite `nth()`, `first()`, and `last()` using vctrs

* NEWS bullet

* Switch to a `vec_slice2()`-like behavior

Because using `vec_slice()` breaks too many revdeps that use these functions on lists and expect to get list elements back

* NEWS bullet tweaks

* Apply suggestions from code review

* Typo
  • Loading branch information
DavisVaughan authored Jul 19, 2022
1 parent eaac641 commit 1b52611
Show file tree
Hide file tree
Showing 6 changed files with 379 additions and 91 deletions.
2 changes: 0 additions & 2 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -36,8 +36,6 @@ S3method(compute,data.frame)
S3method(copy_to,DBIConnection)
S3method(copy_to,src_local)
S3method(count,data.frame)
S3method(default_missing,data.frame)
S3method(default_missing,default)
S3method(distinct,data.frame)
S3method(distinct_,data.frame)
S3method(distinct_,grouped_df)
Expand Down
14 changes: 14 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,19 @@
# dplyr (development version)

* `first()`, `last()`, and `nth()` have been rewritten to use vctrs. This comes
with the following improvements (#6331):

* When used on a data frame, these functions now return a single row rather
than a single column. This is more consistent with the vctrs principle that
a data frame is generally treated as a vector of rows.

* The `default` is no longer "guessed", and will always automatically be set
to a missing value appropriate for the type of `x`.

* Fractional values of `n` are no longer truncated to integers, and will now
cause an error. For example, `nth(x, n = 2)` is fine, but
`nth(x, n = 2.5)` is now an error.

* `lag()` and `lead()` now cast `default` to the type of `x`, rather than taking
the common type. This ensures that these functions are type stable on `x`
(#6330).
Expand Down
152 changes: 108 additions & 44 deletions R/nth-value.R
Original file line number Diff line number Diff line change
@@ -1,25 +1,37 @@
#' Extract the first, last or nth value from a vector
#' Extract the first, last, or nth value from a vector
#'
#' These are straightforward wrappers around \code{\link{[[}}. The main
#' advantage is that you can provide an optional secondary vector that defines
#' the ordering, and provide a default value to use when the input is shorter
#' than expected.
#' These are useful helpers for extracting a single value from a vector. They
#' are guaranteed to return a meaningful value, even when the input is shorter
#' than expected. You can also provide an optional secondary vector that defines
#' the ordering.
#'
#' @details
#' For most vector types, `first(x)`, `last(x)`, and `nth(x, n)` work like
#' `x[[1]]`, `x[[length(x)]`, and `x[[n]]`, respectively. The primary exception
#' is data frames, where they instead retrieve rows, i.e. `x[1, ]`, `x[nrow(x),
#' ]`, and `x[n, ]`. This is consistent with the tidyverse/vctrs principle which
#' treats data frames as a vector of rows, rather than a vector of columns.
#'
#' @param x A vector
#' @param n For `nth()`, a single integer specifying the position.
#' Negative integers index from the end (i.e. `-1L` will return the
#' last value in the vector).
#' @param order_by An optional vector the same size as `x` used to determine the
#' order.
#' @param default A default value to use if the position does not exist in `x`.
#'
#' If `NULL`, the default, a missing value is used.
#'
#' If supplied, this must be a single value, which will be cast to the type of
#' `x`.
#'
#' When `x` is a list , `default` is allowed to be any value. There are no
#' type or size restrictions in this case.
#'
#' If a double is supplied, it will be silently truncated.
#' @param order_by An optional vector used to determine the order
#' @param default A default value to use if the position does not exist in
#' the input. This is guessed by default for base vectors, where a
#' missing value of the appropriate type is returned, and for lists, where
#' a `NULL` is return.
#' @return
#' If `x` is a list, a single element from that list. Otherwise, a vector the
#' same type as `x` with size 1.
#'
#' For more complicated objects, you'll need to supply this value.
#' Make sure it is the same type as `x`.
#' @return A single value. `[[` is used to do the subsetting.
#' @export
#' @examples
#' x <- 1:10
Expand All @@ -31,62 +43,114 @@
#' nth(x, 1)
#' nth(x, 5)
#' nth(x, -2)
#'
#' # `first()` and `last()` are often useful in `summarise()`
#' df <- tibble(x = x, y = y)
#' df %>%
#' summarise(
#' across(x:y, first, .names = "{col}_first"),
#' y_last = last(y)
#' )
#'
#' # Selecting a position that is out of bounds returns a default value
#' nth(x, 11)
#' nth(x, 0)
#'
#' # This out of bounds behavior also applies to empty vectors
#' first(integer())
#'
#' # You can customize the default value with `default`
#' nth(x, 11, default = -1L)
#' first(integer(), default = 0L)
#'
#' # `order_by` provides optional ordering
#' last(x)
#' # Second argument provides optional ordering
#' last(x, y)
#' last(x, order_by = y)
#'
#' # These functions always return a single value
#' first(integer())
nth <- function(x, n, order_by = NULL, default = default_missing(x)) {
if (length(n) != 1 || !is.numeric(n)) {
abort("`n` must be a single integer.")
#' # For data frames, these select entire rows
#' df <- tibble(a = 1:5, b = 6:10)
#' first(df)
#' nth(df, 4)
nth <- function(x, n, order_by = NULL, default = NULL) {
size <- vec_size(x)

vec_assert(n, size = 1L, arg = "n")
n <- vec_cast(n, to = integer(), x_arg = "n")

if (!is.null(order_by)) {
vec_assert(order_by, size = size, arg = "order_by")
}
n <- trunc(n)

if (n == 0 || n > length(x) || n < -length(x)) {
return(default)
default <- check_nth_default(default, x = x)

if (n < 0L) {
# Negative values index from RHS
n <- size + n + 1L
}

# Negative values index from RHS
if (n < 0) {
n <- length(x) + n + 1
if (n <= 0L || n > size) {
return(default)
}

if (is.null(order_by)) {
x[[n]]
} else {
x[[ order(order_by)[[n]] ]]
if (!is.null(order_by)) {
order <- vec_order_base(order_by)
n <- order[[n]]
}

vec_slice2(x, n)
}

#' @export
#' @rdname nth
first <- function(x, order_by = NULL, default = default_missing(x)) {
first <- function(x, order_by = NULL, default = NULL) {
nth(x, 1L, order_by = order_by, default = default)
}

#' @export
#' @rdname nth
last <- function(x, order_by = NULL, default = default_missing(x)) {
last <- function(x, order_by = NULL, default = NULL) {
nth(x, -1L, order_by = order_by, default = default)
}

default_missing <- function(x) {
UseMethod("default_missing")
check_nth_default <- function(default, x, ..., error_call = caller_env()) {
check_dots_empty0(...)

if (vec_is_list(x)) {
# Very special behavior for lists, since we use `[[` on them.
# Valid to use any `default` here (even non-vectors).
# And `default = NULL` is the correct default `default` for lists.
return(default)
}

if (is.null(default)) {
return(vec_init(x))
}

vec_assert(default, size = 1L, arg = "default", call = error_call)

default <- vec_cast(
x = default,
to = x,
x_arg = "default",
to_arg = "x",
call = error_call
)

default
}

#' @export
default_missing.default <- function(x) {
if (!is.object(x) && is.list(x)) {
NULL
vec_slice2 <- function(x, i) {
# Our unimplemented vctrs equivalent of `[[`
# https://github.com/r-lib/vctrs/pull/1228/

i <- vec_as_location2(i, vec_size(x))

if (vec_is_list(x)) {
out <- .subset2(x, i)
} else {
x[NA_real_]
out <- vec_slice(x, i)
out <- vec_set_names(out, NULL)
}
}

#' @export
default_missing.data.frame <- function(x) {
rep(NA, nrow(x))
out
}
72 changes: 50 additions & 22 deletions man/nth.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

57 changes: 52 additions & 5 deletions tests/testthat/_snaps/nth-value.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,56 @@
# nth() gives meaningful error message (#5466)
# `default` must be size 1 (when not used with lists)

Code
(expect_error(nth(1:10, "x")))
Output
<error/rlang_error>
nth(1L, n = 2L, default = 1:2)
Condition
Error in `nth()`:
! `n` must be a single integer.
! `default` must have size 1, not size 2.

# `default` is cast to the type of `x` (when not used with lists)

Code
nth("x", 2, default = 2)
Condition
Error in `nth()`:
! Can't convert `default` <double> to match type of `x` <character>.

# `n` is validated (#5466)

Code
nth(1:10, n = "x")
Condition
Error in `nth()`:
! Can't convert `n` <character> to <integer>.

---

Code
nth(1:10, n = 1:2)
Condition
Error in `nth()`:
! `n` must have size 1, not size 2.

# `x` must be a vector

Code
nth(environment(), 1L)
Condition
Error in `vec_size()`:
! `x` must be a vector, not an environment.

# `order_by` must be the same size as `x`

Code
nth(1:5, n = 1L, order_by = 1:2)
Condition
Error in `nth()`:
! `order_by` must have size 5, not size 2.

---

Code
nth(1:5, n = 6L, order_by = 1:2)
Condition
Error in `nth()`:
! `order_by` must have size 5, not size 2.

Loading

0 comments on commit 1b52611

Please sign in to comment.