Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite nth(), first(), and last() using vctrs #6331

Merged
merged 6 commits into from
Jul 19, 2022

Conversation

DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Jul 13, 2022

Revival of #4636
Technically closes tidyverse/funs#35

I am now quite confident that first() and friends should use [ / vec_slice() rather than [[ / vec_get(). I think the invariant for these functions should be that they return an object with:

  • Size 1
  • The same type as x

This guarantees they will be useful with summarise(), which is good because ?summarise already mentions them as "Useful Functions"

This change mainly affects lists, which now return a list of size 1 rather than the actual element of the list. But I think that change is correct - i.e. it feels weird for "first" to return something that could be size >1. It also affects data frames, which previously returned a single column but now return a single row (which again, feels more consistent).

I'll check revdeps to ensure this doesn't cause a big uproar, but I think these are relatively uncommon edge cases
"252be158-3a4a-4e70-a233-4a9a0a858637"

@DavisVaughan
Copy link
Member Author

DavisVaughan commented Jul 14, 2022

This one might actually be too extreme. It seems like a lot of people use first() and last() in a programmatic way outside of data frames. In particular, a number of people use it on lists and expect it to extract the first/last element of that list.

14 broken revdeps:

  • 7 due to first() or last() on a list, expecting to get list element
  • 5 due to first() or last() on a data frame, expecting to get a column. Should use pull() instead.
  • 1 due to stricter casting rules (can't cast character to double)
  • 1 unknown

See #6262 for details


We might be able to make it use vec_slice2()-like behavior instead. i.e.:

  • For most types, uses vec_slice() and then removes any names
  • For lists (i.e. vec_is_list() is TRUE), uses [[

It depends on how strongly we feel about the invariants of these functions (i.e. should they always return an object the same type as x with size 1? or are we ok with lists being different here?)

I guess first(list()) would continue to return NULL

@hadley
Copy link
Member

hadley commented Jul 17, 2022

Worth incorporating #6257?

@DavisVaughan
Copy link
Member Author

I thought about it but I think it'll be easier to me to keep it straight if I do that in a separate PR

This one is ready for review now though

@DavisVaughan DavisVaughan requested a review from hadley July 18, 2022 15:19
R/nth-value.R Outdated Show resolved Hide resolved
R/nth-value.R Outdated Show resolved Hide resolved
R/nth-value.R Outdated Show resolved Hide resolved
Because using `vec_slice()` breaks too many revdeps that use these functions on lists and expect to get list elements back
@DavisVaughan DavisVaughan merged commit 1b52611 into tidyverse:main Jul 19, 2022
@DavisVaughan DavisVaughan deleted the feature/nth-update branch July 19, 2022 16:14
lionel- added a commit to lionel-/confoundr that referenced this pull request Dec 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement first, last, nth
2 participants