

Update group_by() algorithm to utilize vec_locate_sorted_groups() #6018

Closed


@DavisVaughan (Member) commented Sep 16, 2021

Probably merge after #5942 when we are working on dplyr 1.1.0

Closes #5808
Closes #4406

Comment on lines +3 to +15
* `group_by()` uses a new algorithm for computing and ordering groups. This is
often faster than the previous approach, especially when there are many
groups. In most cases, there should be no user visible changes. However,
character grouping columns are now ordered in the C locale rather than the
system locale, for performance. This change shows up in functions that use
the group data, such as `summarise()` or `group_split()`, where the order
of the results may have changed due to the usage of a different locale. If
the ordering of the results of a call to `summarise()` is important (e.g.,
for constructing a table to be used in a report), you should explicitly call
`arrange()` after `summarise()` to sort as needed. If necessary, the global
option `dplyr.legacy_group_by_locale` can be set to `TRUE` to revert to the
old algorithm, but this should be used extremely sparingly and will be
removed in a future version of dplyr.
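A minimal sketch of the ordering change and the recommended `arrange()` workaround, assuming a dplyr version that includes this change. The data frame `df`, its columns `g` and `x`, and the result names are hypothetical. Base R's `sort(method = "radix")`, which always collates strings in the C locale, is used only to show what C-locale order looks like (uppercase ASCII before lowercase); it is not what dplyr calls internally.

```r
library(dplyr)

# Hypothetical data: grouping keys that differ only in case
df <- tibble(g = c("b", "B", "a", "A"), x = 1:4)

# C-locale (byte-wise) collation: every uppercase ASCII letter sorts
# before every lowercase one. Base R's radix sort always uses the C locale.
c_order <- sort(df$g, method = "radix")
c_order  # "A" "B" "a" "b"

# Under the new algorithm, the groups come back in C-locale order rather
# than system-locale order (many English locales typically give "a" "A" "b" "B").
by_group <- df %>%
  group_by(g) %>%
  summarise(total = sum(x))

# If row order matters (e.g. for a report table), sort explicitly afterwards.
# Here: case-insensitive order first, then the original key to break ties.
report <- by_group %>%
  arrange(tolower(g), g)

# Escape hatch described in the NEWS bullet: revert to the old algorithm.
# Use sparingly; the option is slated for removal.
# options(dplyr.legacy_group_by_locale = TRUE)
```

Sorting on `tolower(g)` first keeps the primary order the same in any locale, because the lowercase letters `a` and `b` collate identically in the C locale and in common system locales.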
@DavisVaughan (Member, Author) commented:

We probably need to compact this NEWS bullet and link to the tidyup, as we did in the `arrange()` PR.

Comment on lines -5 to +19
-Message <simpleMessage>
+Message <rlib_message_name_repair>
 New names:
-* a -> a...1
-* b -> b...2
-* a -> a...3
-* b -> b...4
+* `a` -> `a...1`
+* `b` -> `b...2`
+* `a` -> `a...3`
+* `b` -> `b...4`

 # bind_cols() handles unnamed list with name repair (#3402)

 Code
 df <- bind_cols(list(1, 2))
-Message <simpleMessage>
+Message <rlib_message_name_repair>
 New names:
-* NA -> ...1
-* NA -> ...2
+* `` -> `...1`
+* `` -> `...2`
@DavisVaughan (Member, Author) commented:

These snapshot changes come from using the development version of vctrs.

Development

Successfully merging this pull request may close these issues.

group_by performance: potential for easy and substantial improvement