Add a `.locale` argument to `arrange()` and use radix ordering #5868
Closed
DavisVaughan wants to merge 19 commits into tidyverse:master from DavisVaughan:feature/arrange-radix
hadley reviewed Apr 28, 2021
LGTM — we'll just need to leave this sit until we figure out a unified strategy.
lionel- reviewed Apr 29, 2021
Co-authored-by: Lionel Henry <[email protected]>
So now we see the actual code, along with the error output and class
lionel- reviewed Apr 30, 2021
This constrains the UI to start with, instead encouraging users to apply transformations in the `arrange()` call itself
Falls back to C with a warning otherwise. Explicitly specifying `"C"` is now also an option.
Updated so that:
It seems well covered by the argument documentation now
Even though these are superseded, this should help ease the transition a little, since without this argument it would be difficult to choose a different locale, and if stringi wasn't installed then you'd unconditionally get a warning you couldn't silence
Updated so that:
Closing in favor of the cleaner #5942
Closes #4962
Closes #5090
Part of #5808
Requires this vctrs PR r-lib/vctrs#1375
The above vctrs PR switches `vec_order()` to always use radix ordering. This propagates through to dplyr in 3 places:
- `arrange()`
- `group_by()`
- `with_order()`
This PR tackles `arrange()` by providing a new `.locale` argument with 3 possibilities:
- `NULL`, the default, for C locale
- `"en_US"`, to use a specific locale, backed by and requiring stringi
- [...] `stringi::stri_sort_key()`, or `tolower()`)

This PR is currently focused on the implementation; I've left TODOs for the news bullet and for documentation of locale handling.
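One way to read the references to `stringi::stri_sort_key()` and `tolower()` above is as transformations applied to a character column before it is ordered in the C locale. The following is a minimal sketch of that idea in base R, not the PR's implementation. The helper name `order_by_proxy()` is hypothetical, and the stringi branch assumes a stringi version that provides `stri_sort_key()`.

```r
# Hypothetical helper (not part of dplyr or this PR): transform the input
# with `proxy`, then order the transformed values in the C locale.
order_by_proxy <- function(x, proxy = identity) {
  # method = "radix" orders character vectors in the C locale and is stable.
  order(proxy(x), method = "radix")
}

x <- c("b", "A", "a", "B")

# tolower() as the proxy gives a case-insensitive ordering.
# Ties ("A" vs "a") keep their input order because the sort is stable.
lower_sorted <- x[order_by_proxy(x, tolower)]

# stri_sort_key() as the proxy approximates locale-aware ordering.
# Guarded so the sketch still runs when stringi is not installed.
if (requireNamespace("stringi", quietly = TRUE)) {
  key_sorted <- x[order_by_proxy(
    x,
    function(s) stringi::stri_sort_key(s, locale = "en_US")
  )]
}
```

Here `lower_sorted` is `c("A", "a", "b", "B")`. The locale-aware result depends on the collation rules that stringi applies, so it is left unasserted.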
The snapshot tests are a little noisy. Dev vctrs requires dev rlang, which tweaked how deparsing with `~` works, changing a few snapshot tests here.
Build errors seem to be from the fact that rlang now exports ellipsis functions, causing some warnings.
This would be a meaningful breaking change, as the system locale is no longer being respected. However, there are multiple benefits from this change:
For English, the biggest change is that uppercase letters now sort before any lowercase letters. Previously it used a natural ordering of `c("a", "A", "b", "B")`.
For other languages, sorting directly in the C locale is often not very meaningful, so they would have to supply a locale identifier.
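The uppercase-before-lowercase behavior can be reproduced in base R without dplyr or vctrs, because `sort()` with `method = "radix"` orders character vectors in the C locale regardless of the session's collation settings. This is only an illustration of C-locale ordering, not of the `arrange()` code path in this PR.

```r
x <- c("a", "A", "b", "B")

# C-locale ordering: strings are compared by code point.
c_order <- sort(x, method = "radix")

# Uppercase ASCII letters (65-90) all have smaller code points than
# lowercase ASCII letters (97-122), which is why they sort first.
code_points <- vapply(c_order, utf8ToInt, integer(1))
```

With this input, `c_order` is `c("A", "B", "a", "b")` and the code points are 65, 66, 97, 98.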