
dplyr 1.1.1

@hadley released this on 22 Mar 13:19
  • Mutating joins now warn about multiple matches much less often. Previously,
    a warning was thrown whenever a one-to-many or many-to-many relationship was
    detected between the keys of x and y. It is now thrown only for a
    many-to-many relationship, which is much rarer and much more dangerous than
    one-to-many because it can cause a Cartesian explosion in the number of rows
    returned by the join (#6731, #6717).

    We've accomplished this in two steps:

    • multiple now defaults to "all", and the options of "error" and
      "warning" are now deprecated in favor of using relationship (see below).
      We are using an accelerated deprecation process for these two options
      because they've only been available for a few weeks, and relationship is
      a clearly superior alternative.

    • The mutating joins gain a new relationship argument, allowing you to
      optionally enforce one of the following relationship constraints between the
      keys of x and y: "one-to-one", "one-to-many", "many-to-one", or
      "many-to-many".

      For example, "many-to-one" enforces that each row in x can match at
      most one row in y. If a row in x matches more than one row in y, an error
      is thrown. This option serves as the replacement for multiple = "error".

      The default behavior of relationship doesn't assume that there is any
      relationship between x and y. However, for equality joins it will check
      for the presence of a many-to-many relationship, and will warn if it detects
      one.

    This change unfortunately does mean that if you have set multiple = "all" to
    avoid a warning and you happened to be doing a many-to-many style join, then
    you will need to replace multiple = "all" with
    relationship = "many-to-many" to silence the new warning, but we believe
    this should be rare since many-to-many relationships are fairly uncommon.
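
As a rough illustration of the new relationship argument, here is a minimal sketch assuming dplyr >= 1.1.1; the tables x, y, x2, and y2 and their columns are made-up example data. It shows a one-to-many join passing without a warning, "many-to-one" acting as the stricter replacement for multiple = "error", and relationship = "many-to-many" used to declare an intentional many-to-many join.

```r
library(dplyr)

# Hypothetical example data: keys in `x` are unique, key 2 appears twice in `y`.
x <- tibble(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(1, 2, 2), y_val = c("p", "q", "r"))

# One-to-many: key 2 in `x` matches two rows of `y`.
# With the default relationship this no longer warns; the result has 4 rows.
one_to_many <- left_join(x, y, by = "key")

# "many-to-one" requires each row of `x` to match at most one row of `y`.
# Key 2 matches two rows, so this errors (the replacement for multiple = "error").
res <- tryCatch(
  left_join(x, y, by = "key", relationship = "many-to-one"),
  error = function(e) e
)

# Many-to-many: duplicated keys on both sides. Without `relationship`, the
# default check would warn; declaring it explicitly silences the warning.
x2 <- tibble(key = c(1, 1), x_val = c("a", "b"))
y2 <- tibble(key = c(1, 1), y_val = c("p", "q"))
many_to_many <- inner_join(x2, y2, by = "key", relationship = "many-to-many")
```

The many-to-many case returns 2 x 2 = 4 rows from only two rows on each side, which is the kind of row growth the remaining warning is meant to flag.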

  • Fixed a major performance regression in case_when(). It is still a little
    slower than in dplyr 1.0.10, but we plan to improve this further in the future
    (#6674).

  • Fixed a performance regression related to nth(), first(), and last()
    (#6682).

  • Fixed an issue where expressions involving infix operators had an abnormally
    large amount of overhead (#6681).

  • group_data() on ungrouped data frames is faster (#6736).

  • n() is a little faster when there are many groups (#6727).

  • pick() now returns a 1-row, 0-column tibble when ... evaluates to an
    empty selection. This makes it more compatible with tidyverse recycling
    rules in some edge cases (#6685).
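
A small sketch of the empty-selection behaviour, assuming dplyr >= 1.1.1. The data frame is hypothetical, and starts_with("zzz") is used only because it matches no column. Because the result has size 1, anything computed from it recycles cleanly to the group size.

```r
library(dplyr)

df <- tibble(x = 1:3, y = 4:6)

# starts_with("zzz") selects nothing, so pick() should return a 1-row,
# 0-column tibble rather than a 0-row one.
shape <- df %>%
  summarise(
    rows = nrow(pick(starts_with("zzz"))),
    cols = ncol(pick(starts_with("zzz")))
  )
```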

  • if_else() and case_when() again accept logical conditions that have
    attributes (#6678).
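
For example, a logical condition that carries an extra attribute is accepted by both functions again. This is a minimal sketch assuming dplyr >= 1.1.1; the "source" attribute is an arbitrary, hypothetical one.

```r
library(dplyr)

# A logical vector with an attribute attached (hypothetical "source" attribute).
cond <- structure(c(TRUE, FALSE, NA), source = "sensor")

# if_else(): `missing` supplies the value used where `cond` is NA.
labels <- if_else(cond, "yes", "no", missing = "unknown")

# case_when(): NA conditions do not match, so they fall through to `.default`.
case_result <- case_when(
  cond ~ "yes",
  !cond ~ "no",
  .default = "unknown"
)
```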

  • arrange() can once again sort the numeric_version type from base R
    (#6680).
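
Sorting by numeric_version compares versions component by component rather than as strings, so "1.10.0" sorts after "1.9.1". A minimal sketch, assuming dplyr >= 1.1.1 and using made-up version strings:

```r
library(dplyr)

# Hypothetical version column stored as base R's numeric_version class.
df <- data.frame(v = numeric_version(c("1.10.0", "1.2.0", "1.9.1")))

# Version-aware ordering: 1.2.0 < 1.9.1 < 1.10.0
sorted <- arrange(df, v)
```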

  • slice_sample() now works when the input has a column named replace.
    slice_min() and slice_max() now work when the input has columns named
    na_rm or with_ties (#6725).
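
The fix means column names that happen to match these functions' own argument names are no longer confused with those arguments. A minimal sketch, assuming dplyr >= 1.1.1, with hypothetical data:

```r
library(dplyr)

set.seed(1)  # make the random sample reproducible

# A column called `replace` no longer clashes with slice_sample()'s argument.
df <- tibble(replace = 1:5)
sampled <- slice_sample(df, n = 2)

# Columns called `na_rm` and `with_ties` no longer clash with
# slice_min()/slice_max() arguments.
df2 <- tibble(na_rm = c(3, 1, 2), with_ties = c("a", "b", "c"))
smallest <- slice_min(df2, na_rm, n = 1)  # row where na_rm == 1
largest <- slice_max(df2, na_rm, n = 1)   # row where na_rm == 3
```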

  • nth() now errors informatively if n is NA (#6682).
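
A short sketch of valid and invalid n values, assuming dplyr >= 1.1.1; the vector x is illustrative. Negative n counts from the end, and an NA n is expected to raise an error instead of returning a value silently.

```r
library(dplyr)

x <- c(10, 20, 30)

second <- nth(x, 2)   # 20
last_one <- nth(x, -1) # 30: negative n counts from the end

# An NA position should now produce an informative error.
res <- tryCatch(nth(x, NA_integer_), error = function(e) e)
```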

  • Joins now throw a more informative error when y doesn't have the same
    source as x (#6798).

  • All major dplyr verbs now throw an informative error message if the input
    data frame contains a column named NA or "" (#6758).
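
To illustrate, here is a sketch assuming dplyr >= 1.1.1, using mutate() as one example of a major verb. The data frame is hypothetical. One possible repair, shown here as an assumption rather than the only option, is base R's make.names() to turn the empty name into a syntactic one.

```r
library(dplyr)

# Build a data frame whose second column has an empty name.
df <- data.frame(x = 1:2, y = 3:4)
names(df)[2] <- ""

# mutate() should refuse this input with an informative error.
res <- tryCatch(mutate(df, z = x + 1), error = function(e) e)

# Repairing the names first (here "" becomes "X") lets the verb run.
fixed <- mutate(setNames(df, make.names(names(df), unique = TRUE)), z = x + 1)
```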

  • Deprecation warnings thrown by filter() now mention the correct package
    where the problem originated (#6679).

  • Fixed an issue where using <- within a grouped mutate() or summarise()
    could cross-contaminate other groups (#6666).
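
The pattern affected by this fix is a local helper variable assigned with <- inside a grouped expression. The sketch below, assuming dplyr >= 1.1.1 and hypothetical data, expects each group's m to be computed from that group's rows only.

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b", "b"), x = c(1, 3, 10, 20))

out <- df %>%
  group_by(g) %>%
  mutate(centered = {
    m <- mean(x)  # local helper, evaluated separately for each group
    x - m
  }) %>%
  ungroup()
# Group "a" has mean 2 and group "b" has mean 15, so centered is -1, 1, -5, 5.
```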

  • The compatibility vignette has been replaced with a more general vignette on
    using dplyr in packages, vignette("in-packages") (#6702).

  • The developer documentation in ?dplyr_extending has been refreshed and
    brought up to date with all changes made in 1.1.0 (#6695).

  • rename_with() now includes an example of using paste0(recycle0 = TRUE) to
    correctly handle empty selections (#6688).
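
The reason recycle0 = TRUE matters: paste0("new_", character(0)) returns "new_" (length 1) instead of a zero-length vector, so a renaming function built on it returns the wrong number of names when nothing is selected. A sketch assuming R >= 4.0.1 (where recycle0 was added to paste0()); add_prefix() is a hypothetical helper name.

```r
library(dplyr)

df <- tibble(a = 1, b = 2)

# Hypothetical helper: prefix names, returning character(0) for empty input.
add_prefix <- function(x) paste0("new_", x, recycle0 = TRUE)

renamed <- rename_with(df, add_prefix, starts_with("a"))      # a -> new_a
unchanged <- rename_with(df, add_prefix, starts_with("zzz"))  # empty selection

# Without recycle0, an empty input still yields one string.
without_recycle0 <- paste0("new_", character(0))  # "new_"
```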

  • R >= 3.5.0 is now explicitly required. This is in line with the tidyverse
    policy of supporting the 5 most recent versions of R.