Support hex digits in character classes and escaped characters in character class ranges #5532

Merged

Conversation

anthony-chang
Contributor

@anthony-chang anthony-chang commented May 19, 2022

Fixes #4865, fixes #4505, closes #4518

This PR allows hexadecimal escapes (e.g. \x5e) to be used in character classes and in character class ranges. To support this, I have also generalized range handling so that any valid escaped character can be used in a character class range.

A few notes:

  • I define a list of supported metacharacters in a character class: \, ^, -, ], +. These are the characters that may be escaped in a character class.
  • In addition to the above list, I define \n, \r, \t, \f, \a, \b, \e as the literals that are written with a backslash (see "Characters" at https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html).
  • When a pattern uses the hexadecimal representation of a metacharacter in a character class (e.g. \x5e instead of ^), Java treats it as an escaped metacharacter. Because of this, we cannot directly transpile hex digits to their Unicode representation; we first need to check whether the resulting character is a metacharacter.
    Example: [\x5ea] should transpile to [\^a]
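
For readers who want to see the check from the last note in isolation, here is a minimal Python sketch. It is not the code in this PR: the names `CLASS_METACHARS`, `HEX_ESCAPE`, `rewrite_hex_escape`, and `transpile_class` are hypothetical, only two-digit `\xHH` escapes are handled, and it assumes the input is a single character class with no escaped backslash directly before an `\x`.

```python
import re

# Metacharacters the PR description lists as escapable inside a character class.
CLASS_METACHARS = {"\\", "^", "-", "]", "+"}

# Simplified, hypothetical matcher for a two-digit hex escape such as \x5e.
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def rewrite_hex_escape(match: re.Match) -> str:
    """Return the replacement text for one \\xHH escape inside a class."""
    ch = chr(int(match.group(1), 16))
    if ch in CLASS_METACHARS:
        # Keep the character escaped so it stays a literal, e.g. \x5e -> \^
        return "\\" + ch
    # Any other character can be emitted directly.
    return ch


def transpile_class(char_class: str) -> str:
    """Rewrite every \\xHH escape in a character class such as [\\x5ea]."""
    return HEX_ESCAPE.sub(rewrite_hex_escape, char_class)
```

With this sketch, `transpile_class(r"[\x5ea]")` gives `[\^a]`, matching the example above, while a range such as `[\x41-\x46]` becomes `[A-F]` because neither endpoint is a metacharacter.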

Signed-off-by: Anthony Chang [email protected]

@anthony-chang
Contributor Author

build

@anthony-chang anthony-chang self-assigned this May 19, 2022
@anthony-chang anthony-chang marked this pull request as ready for review May 19, 2022 15:39
@anthony-chang anthony-chang requested a review from andygrove May 19, 2022 15:39
@sameerz sameerz added the feature request New feature or request label May 20, 2022
andygrove
andygrove previously approved these changes May 25, 2022
Contributor

@andygrove andygrove left a comment

LGTM but would also be good to add tests for "replace" operations

Signed-off-by: Anthony Chang <[email protected]>
@anthony-chang
Contributor Author

build

andygrove
andygrove previously approved these changes May 25, 2022
…into support-hex-character-class

Signed-off-by: Anthony Chang <[email protected]>
@anthony-chang
Contributor Author

build

@andygrove andygrove merged commit 9b227b1 into NVIDIA:branch-22.06 May 25, 2022