Update `group_by()` algorithm to utilize `vec_locate_sorted_groups()` #6018

DavisVaughan · 2021-09-16T18:12:18Z

Probably merge after #5942 when we are working on dplyr 1.1.0

Closes #5808
Closes #4406

This is also used in `equal_data_frame()`

DavisVaughan · 2021-10-05T15:36:39Z

NEWS.md

+* `group_by()` uses a new algorithm for computing and ordering groups. This is
+  often faster than the previous approach, especially when there are many
+  groups. In most cases, there should be no user-visible changes. However,
+  character grouping columns are now ordered in the C locale rather than the
+  system locale, for performance. This change shows up in functions that use
+  the group data, such as `summarise()` or `group_split()`, where the order
+  of the results may have changed because a different locale is used. If
+  the ordering of the results of a call to `summarise()` is important (e.g.
+  for constructing a table to be used in a report), you should explicitly call
+  `arrange()` after `summarise()` to sort as needed. If needed, the global
+  option `dplyr.legacy_group_by_locale` can be set to `TRUE` to revert to the
+  old algorithm, but this should be used extremely sparingly and will be
+  removed in a future version of dplyr.


Probably need to compact this NEWS bullet and link to the tidyup, like we did in the `arrange()` PR.
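For readers unfamiliar with the locale difference the bullet describes, here is a minimal base R sketch. It is not dplyr's or vctrs's implementation: it only relies on the documented behaviour that `sort(method = "radix")` orders character vectors in the C locale, and the vector `x` is made up for illustration.

```r
# Hypothetical grouping keys with mixed case.
x <- c("a", "B", "b", "A")

# Radix sorting of character vectors always uses the C locale, where
# uppercase ASCII letters sort before lowercase ones.
c_locale <- sort(x, method = "radix")
print(c_locale)
#> [1] "A" "B" "a" "b"

# The default method for short vectors collates in the system locale, so
# the result depends on the machine (many locales interleave cases).
system_locale <- sort(x)
print(system_locale)
```

The second result varies by platform, which is why the bullet recommends an explicit `arrange()` whenever the row order of a summary matters.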

DavisVaughan · 2021-10-05T15:37:14Z

tests/testthat/_snaps/bind.md

-    Message <simpleMessage>
+    Message <rlib_message_name_repair>
      New names:
-      * a -> a...1
-      * b -> b...2
-      * a -> a...3
-      * b -> b...4
+      * `a` -> `a...1`
+      * `b` -> `b...2`
+      * `a` -> `a...3`
+      * `b` -> `b...4`

 # bind_cols() handles unnamed list with name repair (#3402)

    Code
      df <- bind_cols(list(1, 2))
-    Message <simpleMessage>
+    Message <rlib_message_name_repair>
      New names:
-      * NA -> ...1
-      * NA -> ...2
+      * `` -> `...1`
+      * `` -> `...2`


These snapshot changes come from using the development version of vctrs.

DavisVaughan added 6 commits September 15, 2021 16:53

Require r-lib/vctrs#1441

f45c411

Update snapshot tests for dev vctrs

5dd3ce9

Use new grouping algorithm in group_by()

4fe93d0

This is also used in `equal_data_frame()`

Test locale related details of group_by()

dd74749

NEWS bullet

561ed54

Document ordering behavior in group_by()

8e31edb

DavisVaughan commented Oct 5, 2021

DavisVaughan requested a review from hadley October 5, 2021 18:05

romainfrancois mentioned this pull request Dec 1, 2021

group_by() row order different when testing #6101

Closed

DavisVaughan mentioned this pull request May 9, 2022

use vctrs:::vec_order_locs() in group_by() and vctrs:::vec_order_radix() in arrange() #5808

Closed

DavisVaughan mentioned this pull request Jun 14, 2022

Update group_by() algorithm to utilize vec_locate_sorted_groups() #6297

Merged

DavisVaughan closed this Jun 14, 2022
