Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle ?, *, {0,} and {0,n} based repetitions in regexp_replace on the GPU #5450

Merged
merged 12 commits into from
Jun 6, 2022

Conversation

NVnavkumar
Copy link
Collaborator

Fixes #4468.

This enables "zero-or-more"-based repetitions in regexp_replace operations that would run the GPU. This was possible due to rapidsai/cudf#10760 being merged and fixing a crash scenario that originally was occurring with these types of repetitions and input that fully matches the regexp.

@NVnavkumar NVnavkumar requested a review from andygrove May 10, 2022 18:01
@NVnavkumar NVnavkumar self-assigned this May 10, 2022
@NVnavkumar
Copy link
Collaborator Author

build

@NVnavkumar NVnavkumar marked this pull request as draft May 10, 2022 19:04
@sameerz sameerz added the feature request New feature or request label May 10, 2022
@sameerz sameerz added this to the May 2 - May 20 milestone May 10, 2022
@NVnavkumar NVnavkumar marked this pull request as ready for review May 10, 2022 23:36
@NVnavkumar
Copy link
Collaborator Author

build

@andygrove
Copy link
Contributor

I am working on debugging the test that fails in CI but not locally

@andygrove
Copy link
Contributor

build

@andygrove
Copy link
Contributor

build

andygrove
andygrove previously approved these changes May 17, 2022
Copy link
Contributor

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@andygrove
Copy link
Contributor

build

@andygrove
Copy link
Contributor

There are new failures since upmerging.

- compare CPU and GPU: regexp replace fuzz test with limited chars *** FAILED ***
  javaPattern="$S?a?", cudfPattern="([\n\r\u0085\u2028\u2029]|\r\n)?$S?a?", input='""', cpu="_RE\\PLACE_", gpu="" (RegularExpressionTranspilerSuite.scala:712)
- AST fuzz test - regexp_replace *** FAILED ***
  javaPattern="\u0085?$", cudfPattern="\u0085?([\n\r\u0085\u2028\u2029]|\r\n)?$", input='""', cpu="_RE\\PLACE_", gpu="" (RegularExpressionTranspilerSuite.scala:712)

@andygrove
Copy link
Contributor

@NVnavkumar could you retarget this for 22.08

@NVnavkumar NVnavkumar changed the base branch from branch-22.06 to branch-22.08 June 2, 2022 16:55
@NVnavkumar NVnavkumar dismissed andygrove’s stale review June 2, 2022 16:55

The base branch was changed.

andygrove
andygrove previously approved these changes Jun 2, 2022
Copy link
Contributor

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Navin Kumar <[email protected]>
@sameerz
Copy link
Collaborator

sameerz commented Jun 2, 2022

build

@NVnavkumar
Copy link
Collaborator Author

build

1 similar comment
@NVnavkumar
Copy link
Collaborator Author

build

@sameerz
Copy link
Collaborator

sameerz commented Jun 5, 2022

build

@NVnavkumar
Copy link
Collaborator Author

build

@NVnavkumar NVnavkumar merged commit 8f20914 into NVIDIA:branch-22.08 Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support repetition quantifiers ? and * with regexp_replace
4 participants