Enable regular expression support based on whether UTF-8 is in the current locale #5776

Merged
merged 30 commits into from
Jul 18, 2022
a39b36b
Regular expression support handling via UTF-8 in the locale
NVnavkumar Jun 7, 2022
142bbca
Merge branch 'branch-22.08' into regexp_unicode_fix
NVnavkumar Jun 14, 2022
4b88f8c
Fixup some tests, including a typo in transpiler unicode fuzz test
NVnavkumar Jun 14, 2022
80062c0
Update fuzz tests to not include \b or \B in fuzz testing because of
NVnavkumar Jun 21, 2022
17612f5
Fix issue in fuzz tests with \Z followed by $
NVnavkumar Jun 21, 2022
e141562
Fix issue with word boundaries and negative character classes \D,\W,\S
NVnavkumar Jun 21, 2022
598634b
Add reference to issue regarding \b and \B unicode issue
NVnavkumar Jun 21, 2022
2919fac
Fall back to CPU when negated character class is next to word boundary
NVnavkumar Jun 22, 2022
91c5407
Merge branch 'branch-22.08' of github.com:NVIDIA/spark-rapids into re…
NVnavkumar Jun 22, 2022
e1f4fbe
Add \H and \V to fallback scenario with word boundaries
NVnavkumar Jun 23, 2022
f217eed
Merge branch 'branch-22.08' of github.com:NVIDIA/spark-rapids into re…
NVnavkumar Jun 30, 2022
6cd302b
Merge branch 'branch-22.08' of github.com:NVIDIA/spark-rapids into re…
NVnavkumar Jul 6, 2022
963f245
remove this test since it was removed in the upstream branch
NVnavkumar Jul 6, 2022
dc9d1be
move word boundary fuzz testing logic to a separate flag skipUnicodeI…
NVnavkumar Jul 6, 2022
6ea8e99
Merge branch 'branch-22.08' of github.com:NVIDIA/spark-rapids into re…
NVnavkumar Jul 7, 2022
2f4536e
Update the jenkins scripts here to set the locale
NVnavkumar Jul 7, 2022
a3d2d9f
need to export LC_ALL in mvn_verify stage here
NVnavkumar Jul 8, 2022
1453387
add comment for LC_ALL
NVnavkumar Jul 8, 2022
4d33f85
Merge branch 'branch-22.08' of github.com:NVIDIA/spark-rapids into re…
NVnavkumar Jul 11, 2022
da12d28
Regexp compatibility doc update
NVnavkumar Jul 11, 2022
2724802
Merge branch 'branch-22.08' of github.com:NVIDIA/spark-rapids into re…
NVnavkumar Jul 11, 2022
84139c2
Update scalatests and premerge build script
NVnavkumar Jul 12, 2022
889ba7a
update build scripts to test regexp separately from other tests becau…
NVnavkumar Jul 12, 2022
c1e184c
Merge branch 'branch-22.08' of github.com:NVIDIA/spark-rapids into re…
NVnavkumar Jul 12, 2022
6b21fcb
Feedback: code cleanup
NVnavkumar Jul 14, 2022
e2d0d8d
Fix syntax errors in RegularExpressionSuite that prevent it from load…
NVnavkumar Jul 14, 2022
7f8f7aa
Merge branch 'branch-22.08' of github.com:NVIDIA/spark-rapids into re…
NVnavkumar Jul 14, 2022
652cf94
register custom regexp mark
NVnavkumar Jul 14, 2022
158a70e
updates to build script and test script
NVnavkumar Jul 15, 2022
16fb328
revert the nightly build script updates
NVnavkumar Jul 15, 2022
Add reference to issue regarding \b and \B unicode issue
Signed-off-by: Navin Kumar <[email protected]>
NVnavkumar committed Jun 21, 2022
commit 598634b9154aec548bfacb90e000f16690fe1981
@@ -1005,6 +1005,7 @@ class FuzzRegExp(suggestedChars: String, skipKnownIssues: Boolean = true) {
    var ch = '\u0000'
    do {
      ch = chars(rr.nextInt(chars.length))
      // see https://github.com/NVIDIA/spark-rapids/issues/5882 for \B and \b issue
    } while (skipKnownIssues && "bB".contains(ch))
    RegexEscaped(ch)
  }
@@ -1034,6 +1035,7 @@ class FuzzRegExp(suggestedChars: String, skipKnownIssues: Boolean = true) {
      baseGenerators
    } else {
      baseGenerators ++ Seq[() => RegexAST](
        // see https://github.com/NVIDIA/spark-rapids/issues/5882 for \B and \b issue
        () => RegexEscaped('b'),
        () => RegexEscaped('B'))
    }
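
For readers following the first hunk, here is a minimal standalone sketch of the resampling idea: keep drawing a random character until it is not `b` or `B`, so the fuzzer never emits `\b` or `\B` while `skipKnownIssues` is set. This is not the plugin's actual code. `EscapedCharSketch`, `sample`, and `knownIssueChars` are hypothetical names, the `RegexAST` and `RegexEscaped` types are simplified stand-ins for the ones referenced in the diff, and the loop is written with `Iterator.continually` rather than the `do`/`while` used in the hunk.

```scala
import scala.util.Random

// Simplified stand-ins for the AST types referenced in the diff (assumption).
sealed trait RegexAST
final case class RegexEscaped(ch: Char) extends RegexAST

// Hypothetical helper, not part of spark-rapids.
object EscapedCharSketch {
  // Characters whose escaped forms (\b and \B) are excluded,
  // per the issue linked in the diff comments.
  private val knownIssueChars = Set('b', 'B')

  def sample(chars: String, rr: Random, skipKnownIssues: Boolean = true): RegexEscaped = {
    require(chars.nonEmpty, "chars must not be empty")
    // Without this guard the resampling below could loop forever.
    require(!skipKnownIssues || chars.exists(c => !knownIssueChars(c)),
      "chars must contain a character other than 'b' or 'B' when skipping known issues")

    // Draw characters until one is acceptable (rejection sampling).
    val ch = Iterator
      .continually(chars(rr.nextInt(chars.length)))
      .find(c => !(skipKnownIssues && knownIssueChars(c)))
      .get
    RegexEscaped(ch)
  }
}
```

Passing `skipKnownIssues = false` disables the filter, which mirrors how the second hunk only adds the `RegexEscaped('b')` and `RegexEscaped('B')` generators when known issues are not being skipped.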