dplyr 1.1.1

- Mutating joins now warn about multiple matches much less often. At a high
  level, a warning was previously being thrown when a one-to-many or
  many-to-many relationship was detected between the keys of `x` and `y`, but is
  now only thrown for a many-to-many relationship, which is much rarer and much
  more dangerous than one-to-many because it can result in a Cartesian explosion
  in the number of rows returned from the join (#6731, #6717).

  We've accomplished this in two steps:

  - `multiple` now defaults to `"all"`, and the options of `"error"` and
    `"warning"` are now deprecated in favor of using `relationship` (see below).
    We are using an accelerated deprecation process for these two options
    because they've only been available for a few weeks, and `relationship` is
    a clearly superior alternative.

  - The mutating joins gain a new `relationship` argument, allowing you to
    optionally enforce one of the following relationship constraints between the
    keys of `x` and `y`: `"one-to-one"`, `"one-to-many"`, `"many-to-one"`, or
    `"many-to-many"`.

    For example, `"many-to-one"` enforces that each row in `x` can match at
    most 1 row in `y`. If a row in `x` matches >1 rows in `y`, an error is
    thrown. This option serves as the replacement for `multiple = "error"`.

    The default behavior of `relationship` doesn't assume that there is any
    relationship between `x` and `y`. However, for equality joins it will check
    for the presence of a many-to-many relationship, and will warn if it detects
    one.

  This change unfortunately does mean that if you have set `multiple = "all"` to
  avoid a warning and you happened to be doing a many-to-many style join, then
  you will need to replace `multiple = "all"` with
  `relationship = "many-to-many"` to silence the new warning, but we believe
  this should be rare since many-to-many relationships are fairly uncommon.
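
  The sketch below shows the `relationship` argument in use. It assumes dplyr >= 1.1.1, and the tibbles `x`, `y`, and `y_dup` are made-up toy data. A `"many-to-one"` constraint passes at first. The same constraint errors once `y` gains a duplicate key. Without a constraint, that many-to-many match only produces a warning, and `relationship = "many-to-many"` declares the match as expected:

  ```r
  library(dplyr)

  # `x` repeats key 1; `y` has one row per key, so x -> y is many-to-one.
  x <- tibble(key = c(1, 1, 2), val_x = c("a", "b", "c"))
  y <- tibble(key = c(1, 2), val_y = c("A", "B"))

  # Passes: every row of `x` matches at most one row of `y`.
  ok <- left_join(x, y, by = join_by(key), relationship = "many-to-one")

  # Give `y` a duplicate key so that key 1 becomes many-to-many.
  y_dup <- bind_rows(y, tibble(key = 1, val_y = "A2"))

  # The same constraint now errors instead of silently adding rows.
  constrained <- tryCatch(
    left_join(x, y_dup, by = join_by(key), relationship = "many-to-one"),
    error = function(e) "error"
  )

  # With the default `relationship`, the many-to-many match only warns.
  warned <- FALSE
  withCallingHandlers(
    default <- left_join(x, y_dup, by = join_by(key)),
    warning = function(w) {
      warned <<- TRUE
      invokeRestart("muffleWarning")
    }
  )

  # Declaring the relationship explicitly silences that warning.
  explicit <- left_join(x, y_dup, by = join_by(key), relationship = "many-to-many")
  ```

  The warning does not change the result. Both `default` and `explicit` contain 5 rows, because key 1 appears twice on each side and so produces 2 * 2 = 4 matches, plus 1 match for key 2.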

- Fixed a major performance regression in `case_when()`. It is still a little
  slower than in dplyr 1.0.10, but we plan to improve this further in the future
  (#6674).

- Fixed a performance regression related to `nth()`, `first()`, and `last()`
  (#6682).

- Fixed an issue where expressions involving infix operators had an abnormally
  large amount of overhead (#6681).

- `group_data()` on ungrouped data frames is faster (#6736).

- `n()` is a little faster when there are many groups (#6727).

- `pick()` now returns a 1 row, 0 column tibble when `...` evaluates to an
  empty selection. This makes it more compatible with tidyverse recycling rules
  in some edge cases (#6685).
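
  The small sketch below assumes dplyr >= 1.1.1. `starts_with("zzz")` matches none of the made-up columns of `df`, so `pick()` receives an empty selection. Under tidyverse recycling rules, a size-1 result can be recycled to any group size. A 0-row result clashes with every non-empty group.

  ```r
  library(dplyr)

  df <- tibble(x = 1:3, y = 4:6)

  # No column starts with "zzz", so pick() gets an empty selection;
  # list() stores the picked tibble so its shape can be inspected.
  res <- summarise(df, picked = list(pick(starts_with("zzz"))))
  empty <- res$picked[[1]]

  dim(empty)
  ```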

- `if_else()` and `case_when()` again accept logical conditions that have
  attributes (#6678).
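
  The minimal sketch below assumes dplyr >= 1.1.1. The `label` attribute on `cond` is an arbitrary example of extra metadata attached to a logical vector:

  ```r
  library(dplyr)

  # A logical condition that carries an extra, arbitrary attribute.
  cond <- structure(c(TRUE, FALSE, NA), label = "is_adult")

  # if_else() uses `missing` for NA conditions.
  a <- if_else(cond, "yes", "no", missing = "unknown")

  # case_when() treats an NA condition like FALSE, so it falls to `.default`.
  b <- case_when(cond ~ "yes", .default = "no")
  ```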

- `arrange()` can once again sort the `numeric_version` type from base R
  (#6680).

- `slice_sample()` now works when the input has a column named `replace`.
  `slice_min()` and `slice_max()` now work when the input has columns named
  `na_rm` or `with_ties` (#6725).
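
  The quick check below assumes dplyr >= 1.1.1. The data frame `df` is made up so that its column names collide with the `replace`, `na_rm`, and `with_ties` arguments:

  ```r
  library(dplyr)

  # Column names deliberately match argument names of the slice helpers.
  df <- tibble(
    x = c(3, 1, 2),
    replace = c("a", "b", "c"),
    na_rm = c(TRUE, FALSE, TRUE),
    with_ties = 1:3
  )

  sampled <- slice_sample(df, n = 2)  # random rows, so only the size is fixed
  lowest <- slice_min(df, x, n = 1)
  highest <- slice_max(df, x, n = 1)
  ```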

- `nth()` now errors informatively if `n` is `NA` (#6682).
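
  The minimal sketch below assumes dplyr >= 1.1.1 and contrasts a valid index with a missing one:

  ```r
  library(dplyr)

  x <- c(10, 20, 30)

  second <- nth(x, 2)

  # A missing `n` is rejected with an error rather than returning a value.
  res <- tryCatch(nth(x, NA_integer_), error = function(e) "error")
  ```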

- Joins now throw a more informative error when `y` doesn't have the same
  source as `x` (#6798).

- All major dplyr verbs now throw an informative error message if the input
  data frame contains a column named `NA` or `""` (#6758).
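
  The minimal sketch below assumes dplyr >= 1.1.1. Base R lets you set a missing or empty column name directly through `names<-`. `mutate()` and `summarise()` are used here as representative verbs:

  ```r
  library(dplyr)

  df_na <- data.frame(a = 1:2, b = 3:4)
  names(df_na)[2] <- NA      # second column is now named NA

  df_empty <- data.frame(a = 1:2, b = 3:4)
  names(df_empty)[2] <- ""   # second column now has an empty name

  err_na <- tryCatch(mutate(df_na, c = a * 2), error = function(e) "error")
  err_empty <- tryCatch(summarise(df_empty, total = sum(a)), error = function(e) "error")
  ```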

- Deprecation warnings thrown by `filter()` now mention the correct package
  where the problem originated (#6679).

- Fixed an issue where using `<-` within a grouped `mutate()` or `summarise()`
  could cross-contaminate other groups (#6666).
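
  The small sketch below shows the behavior this fix restores. It assumes dplyr >= 1.1.1, and `df` and its grouping column `g` are made up. Each group's expression reassigns `x` locally, and that reassignment should not leak into the next group:

  ```r
  library(dplyr)

  df <- tibble(g = c(1, 1, 2, 2), x = 1:4)

  out <- df %>%
    group_by(g) %>%
    mutate(y = {
      x <- x * 10  # local to this group's evaluation, not the column
      sum(x)
    }) %>%
    ungroup()
  ```

  Group 1 gives 10 + 20 = 30 and group 2 gives 30 + 40 = 70. The column `x` itself is left unchanged.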

- The compatibility vignette has been replaced with a more general vignette on
  using dplyr in packages, `vignette("in-packages")` (#6702).

- The developer documentation in `?dplyr_extending` has been refreshed and
  brought up to date with all changes made in 1.1.0 (#6695).

- `rename_with()` now includes an example of using `paste0(recycle0 = TRUE)` to
  correctly handle empty selections (#6688).
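
  The minimal sketch below shows that pattern. `add_prefix()` is a hypothetical helper, and the `recycle0` argument requires R >= 4.0.1. Without `recycle0 = TRUE`, `paste0("new_", character(0))` returns `"new_"` (length 1), which has the wrong length for an empty selection:

  ```r
  library(dplyr)

  df <- tibble(a = 1, b = 2)

  # Hypothetical helper: recycle0 = TRUE makes paste0() return character(0)
  # when the selection is empty, instead of the single string "new_".
  add_prefix <- function(x) paste0("new_", x, recycle0 = TRUE)

  renamed <- rename_with(df, add_prefix, c(a))
  unchanged <- rename_with(df, add_prefix, starts_with("zzz"))
  ```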

- R >= 3.5.0 is now explicitly required. This is in line with the tidyverse
  policy of supporting the 5 most recent versions of R.