Release dplyr 1.1.1 · tidyverse/dplyr

Mutating joins now warn about multiple matches much less often. At a high
level, a warning was previously being thrown when a one-to-many or
many-to-many relationship was detected between the keys of x and y, but is
now only thrown for a many-to-many relationship, which is much rarer and much
more dangerous than one-to-many because it can result in a Cartesian explosion
in the number of rows returned from the join (#6731, #6717).

We've accomplished this in two steps:
- multiple now defaults to "all", and the options of "error" and
  "warning" are now deprecated in favor of using relationship (see below).
  We are using an accelerated deprecation process for these two options
  because they've only been available for a few weeks, and relationship is
  a clearly superior alternative.
- The mutating joins gain a new relationship argument, allowing you to
  optionally enforce one of the following relationship constraints between the
  keys of x and y: "one-to-one", "one-to-many", "many-to-one", or
  "many-to-many".
  
  For example, "many-to-one" enforces that each row in x can match at
  most 1 row in y. If a row in x matches >1 rows in y, an error is
  thrown. This option serves as the replacement for multiple = "error".
  
  The default behavior of relationship doesn't assume that there is any
  relationship between x and y. However, for equality joins it will check
  for the presence of a many-to-many relationship, and will warn if it detects
  one.

Unfortunately, this change means that if you set multiple = "all" to avoid a
warning and were in fact doing a many-to-many join, you will need to replace
multiple = "all" with relationship = "many-to-many" to silence the new
warning. We believe this should be rare, since many-to-many relationships are
fairly uncommon.
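
As a rough illustration of the relationship argument described above, the sketch below joins two small made-up tables. The names orders, customers, and customers_dup are hypothetical and not part of the release; the code assumes dplyr 1.1.1 or later is installed.

```r
library(dplyr)

# Hypothetical example data: several orders can belong to one customer.
orders <- tibble(order_id = 1:3, customer_id = c(1L, 1L, 2L))
customers <- tibble(customer_id = c(1L, 2L), name = c("Ada", "Grace"))

# "many-to-one": each row in x (orders) may match at most 1 row in y.
joined <- left_join(
  orders, customers,
  by = "customer_id",
  relationship = "many-to-one"
)

# A duplicated key in y breaks the many-to-one constraint, so the join errors.
customers_dup <- tibble(
  customer_id = c(1L, 1L, 2L),
  name = c("Ada", "Ada L.", "Grace")
)
violation <- try(
  left_join(
    orders, customers_dup,
    by = "customer_id",
    relationship = "many-to-one"
  ),
  silent = TRUE
)

# With no relationship set, the same many-to-many join only warns.
warned <- tryCatch(
  {
    left_join(orders, customers_dup, by = "customer_id")
    FALSE
  },
  warning = function(w) TRUE
)

# Declaring the relationship explicitly silences that warning.
many <- left_join(
  orders, customers_dup,
  by = "customer_id",
  relationship = "many-to-many"
)
```

Setting relationship explicitly also documents the expected shape of the data, so a later violation of that expectation surfaces as an error rather than as silently duplicated rows.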

Fixed a major performance regression in case_when(). It is still a little
slower than in dplyr 1.0.10, but we plan to improve this further in the future
(#6674).

Fixed a performance regression related to nth(), first(), and last()
(#6682).

Fixed an issue where expressions involving infix operators had an abnormally
large amount of overhead (#6681).

group_data() on ungrouped data frames is faster (#6736).

n() is a little faster when there are many groups (#6727).

pick() now returns a 1 row, 0 column tibble when ... evaluates to an
empty selection. This makes it more compatible with tidyverse recycling
rules in some edge cases (#6685).

if_else() and case_when() again accept logical conditions that have
attributes (#6678).

arrange() can once again sort the numeric_version type from base R
(#6680).

slice_sample() now works when the input has a column named replace.
slice_min() and slice_max() now work when the input has columns named
na_rm or with_ties (#6725).
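
A small sketch of the slice_*() fix, using a hypothetical tibble whose column names deliberately collide with argument names of these functions. It assumes dplyr 1.1.1 or later; the data values are made up for illustration.

```r
library(dplyr)

# Hypothetical data whose column names match slice_*() argument names.
df <- tibble(
  replace = c(10, 20, 30),
  na_rm = c(3, 1, 2),
  with_ties = c("a", "b", "c")
)

# Sampling works even though a column is called `replace`.
sampled <- slice_sample(df, n = 2)

# Ordering by the `na_rm` column works even though it shares its name
# with the `na_rm` argument.
smallest <- slice_min(df, order_by = na_rm, n = 1)
largest <- slice_max(df, order_by = na_rm, n = 1)
```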

nth() now errors informatively if n is NA (#6682).

Joins now throw a more informative error when y doesn't have the same
source as x (#6798).

All major dplyr verbs now throw an informative error message if the input
data frame contains a column named NA or "" (#6758).

Deprecation warnings thrown by filter() now mention the correct package
where the problem originated from (#6679).

Fixed an issue where using <- within a grouped mutate() or summarise()
could cross contaminate other groups (#6666).

The compatibility vignette has been replaced with a more general vignette on
using dplyr in packages, vignette("in-packages") (#6702).

The developer documentation in ?dplyr_extending has been refreshed and
brought up to date with all changes made in 1.1.0 (#6695).

rename_with() now includes an example of using paste0(recycle0 = TRUE) to
correctly handle empty selections (#6688).
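
The empty-selection case for rename_with() can be sketched roughly as follows. The tibble df and the "prefix_" string are made up for illustration, and the recycle0 argument of paste0() requires R 4.0.1 or later.

```r
library(dplyr)

df <- tibble(a = 1:2, b = 3:4)

# paste0("prefix_", character()) returns "prefix_" (length 1), which does not
# match the 0 selected columns, so rename_with() errors.
without_recycle0 <- try(
  rename_with(df, ~ paste0("prefix_", .x), starts_with("nonexistent")),
  silent = TRUE
)

# With recycle0 = TRUE, the zero-length input yields a zero-length result,
# so the empty selection is handled and no column is renamed.
with_recycle0 <- rename_with(
  df,
  ~ paste0("prefix_", .x, recycle0 = TRUE),
  starts_with("nonexistent")
)
names(with_recycle0)
```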

R >= 3.5.0 is now explicitly required. This is in line with the tidyverse
policy of supporting the 5 most recent versions of R.
