Avoid regexp_cost with stringSplit on the GPU using transpilation #4854
…ranspiling escaped meta characters and plain characters into a simpler string Signed-off-by: Navin Kumar <[email protected]>
tests/src/test/scala/com/nvidia/spark/rapids/RegularExpressionTranspilerSuite.scala
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/stringFunctions.scala
Signed-off-by: Navin Kumar <[email protected]>
…leToSplittableString checks are similar Signed-off-by: Navin Kumar <[email protected]>
@@ -277,7 +277,7 @@ class RegexParser(pattern: String) {
 // word boundaries
 consumeExpected(ch)
 RegexEscaped(ch)
-case '[' | '\\' | '^' | '$' | '.' | '⎮' | '?' | '*' | '+' | '(' | ')' | '{' | '}' =>
+case '[' | '\\' | '^' | '$' | '.' | '|' | '?' | '*' | '+' | '(' | ')' | '{' | '}' =>
The pipe char changed? What was it before and was that a bug?
It was an accidental Unicode look-alike, U+23AE (https://www.compart.com/en/unicode/U+23AE), rather than the ASCII pipe.
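To make the difference concrete, here is a minimal sketch of why the two characters do not match each other in a `case` list. The object and method names (`PipeLookalike`, `codePoint`, `matchesOldCase`, `matchesNewCase`) are hypothetical and are not part of the plugin; the two `match` expressions only mirror the shape of the changed line, reduced to the one character at issue.

```scala
object PipeLookalike {
  // Format a character as a Unicode code point, e.g. U+007C.
  def codePoint(ch: Char): String = "U+%04X".format(ch.toInt)

  // Shape of the case before the fix: it lists U+23AE, not the ASCII pipe.
  def matchesOldCase(ch: Char): Boolean = ch match {
    case '\u23AE' => true
    case _ => false
  }

  // Shape of the case after the fix: it lists the ASCII pipe, U+007C.
  def matchesNewCase(ch: Char): Boolean = ch match {
    case '|' => true
    case _ => false
  }

  def main(args: Array[String]): Unit = {
    println(codePoint('|'))       // U+007C
    println(codePoint('\u23AE'))  // U+23AE
    println(matchesOldCase('|'))  // false
    println(matchesNewCase('|'))  // true
  }
}
```

With the look-alike in the list, an ASCII `|` falls through that particular case, which is what the one-character change in the diff addresses.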
build
It looks like there was a transient build failure
build
The vulnerability scan is failing with
build
Fixes #4685.
This avoids the regular-expression cost on the GPU for simple split patterns. A pattern that contains only non-special characters and escaped metacharacters is transpiled into a plain string, which can then be passed to string split with regular-expression matching disabled.
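The idea can be sketched as follows. This is a simplified illustration, not the code in this PR: the object and method names (`SplitPatternSketch`, `simplifyForSplit`) are hypothetical, the metacharacter set is copied from the `RegexParser` case in the diff, and treating an empty result as "not simplifiable" is an assumption made here to stay conservative.

```scala
object SplitPatternSketch {
  // Metacharacters copied from the RegexParser case shown in the diff.
  private val metaChars: Set[Char] = "[\\^$.|?*+(){}".toSet

  /** Hypothetical helper: returns the literal delimiter when `pattern`
    * consists only of plain characters and escaped metacharacters,
    * and None when the pattern still needs a real regex engine. */
  def simplifyForSplit(pattern: String): Option[String] = {
    val out = new StringBuilder
    var i = 0
    while (i < pattern.length) {
      pattern.charAt(i) match {
        // Escaped metacharacter such as \. or \| becomes the bare character.
        case '\\' if i + 1 < pattern.length && metaChars(pattern.charAt(i + 1)) =>
          out.append(pattern.charAt(i + 1))
          i += 2
        // Any other backslash (e.g. \d, or a trailing one) or an unescaped
        // metacharacter means this is not a plain-string pattern.
        case c if c == '\\' || metaChars(c) =>
          return None
        // Plain character: copy it through unchanged.
        case c =>
          out.append(c)
          i += 1
      }
    }
    // Assumption: an empty delimiter is left to the regex path.
    if (out.isEmpty) None else Some(out.toString)
  }

  def main(args: Array[String]): Unit = {
    println(simplifyForSplit("\\|"))   // Some(|)
    println(simplifyForSplit("a\\.b")) // Some(a.b)
    println(simplifyForSplit("a|b"))   // None
    println(simplifyForSplit("\\d+"))  // None
  }
}
```

When the helper returns `Some(delimiter)`, the caller could split on that literal string with regex matching turned off; when it returns `None`, it would fall back to the existing regex path.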
Signed-off-by: Navin Kumar [email protected]