Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid regexp_cost with stringSplit on the GPU using transpilation #4854

Merged
merged 3 commits into from
Mar 4, 2022

Conversation

NVnavkumar
Copy link
Collaborator

Fixes #4685.

This avoids the RegExp cost on the GPU by transpiling simple patterns that only contain a combination of non-special characters and escaped meta characters to simplified strings that can be passed to string split with RegExp disabled.

Signed-off-by: Navin Kumar [email protected]

@NVnavkumar NVnavkumar requested a review from andygrove February 23, 2022 23:29
…ranspiling escaped meta characters and plain characters into a simpler string

Signed-off-by: Navin Kumar <[email protected]>
@@ -277,7 +277,7 @@ class RegexParser(pattern: String) {
// word boundaries
consumeExpected(ch)
RegexEscaped(ch)
case '[' | '\\' | '^' | '$' | '.' | '' | '?' | '*' | '+' | '(' | ')' | '{' | '}' =>
case '[' | '\\' | '^' | '$' | '.' | '|' | '?' | '*' | '+' | '(' | ')' | '{' | '}' =>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pipe char changed? What was it before and was that a bug?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was some accidental unicode version (https://www.compart.com/en/unicode/U+23AE)

@NVnavkumar NVnavkumar marked this pull request as ready for review February 24, 2022 16:24
@NVnavkumar
Copy link
Collaborator Author

build

@andygrove
Copy link
Contributor

It looks like there was a transient build failure

@andygrove
Copy link
Contributor

build

1 similar comment
@NVnavkumar
Copy link
Collaborator Author

build

@andygrove
Copy link
Contributor

The vulnerability scan is failing with Could not communicate with Black Duck: The connection was not successful for an unknown reason. If an api token is being used, it could be incorrect..

@sameerz sameerz added this to the Feb 28 - Mar 11 milestone Feb 26, 2022
@sameerz sameerz added the performance A performance related task/issue label Feb 26, 2022
@NVnavkumar NVnavkumar self-assigned this Mar 1, 2022
@sameerz
Copy link
Collaborator

sameerz commented Mar 3, 2022

build

2 similar comments
@sameerz
Copy link
Collaborator

sameerz commented Mar 4, 2022

build

@sameerz
Copy link
Collaborator

sameerz commented Mar 4, 2022

build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Avoid regexp cost in string_split for escaped characters
4 participants