[BUG] regexp_replace hangs with specific inputs and patterns #8323
andygrove added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on May 18, 2023
Here is a Spark repro:
Note that we are transpiling the Java pattern
I have not been able to reproduce this in cuDF using the latest nightly.
>>> import cudf
>>> cudf.__version__
'23.06.00'
>>> s = cudf.Series(["one\ntwo", "three\n\n"])
>>> s.str.replace('[^\n\r\u0085\u2028\u2029]*(\r|\u0085|\u2028|\u2029|\r\n)?$', 'scala', 1, regex=True)
0 one\nscala
1 three\nscala\n
dtype: object
Here is a Java repro, using the published cuDF 23.04 jar:
import ai.rapids.cudf.*;

public class Main {
    public static void main(String[] args) {
        try (ColumnVector v = ColumnVector.fromStrings("one\ntwo", "three\n\n")) {
            String pattern = "[^\n\r\u0085\u2028\u2029]*(\r|\u0085|\u2028|\u2029|\r\n)?$";
            String repl = "scala${1}";
            RegexProgram prog = new RegexProgram(pattern);
            v.stringReplaceWithBackrefs(prog, repl);
        }
    }
}
It is also reproducible with a simpler pattern that does not include the unicode characters:
try (ColumnVector v = ColumnVector.fromStrings("one\ntwo", "three\n\n")) {
    String pattern = "[^\n\r]*(\r|\r\n)?$";
    String repl = "scala${1}";
    RegexProgram prog = new RegexProgram(pattern);
    v.stringReplaceWithBackrefs(prog, repl);
}
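For comparison, here is a minimal sketch of what the simpler pattern does on the CPU with plain java.util.regex and the same two inputs. The class CpuRegexReference and its method names are hypothetical and exist only for illustration; they are not part of cuDF or spark-rapids. It also assumes the replacement is written as scala$1, since java.util.regex spells the group reference $1 where the cuDF repro uses ${1}.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of spark-rapids or cuDF: runs the simpler
// pattern from the repro through java.util.regex on the CPU.
class CpuRegexReference {
    // Same pattern as the simpler cuDF repro; the Java compiler turns \n and \r
    // into literal newline and carriage-return characters.
    static final Pattern PATTERN = Pattern.compile("[^\n\r]*(\r|\r\n)?$");

    // java.util.regex spells the group reference $1 (the cuDF repro uses ${1}).
    static final String REPLACEMENT = "scala$1";

    // Replaces every match (Spark's regexp_replace also replaces every match).
    static String replaceAll(String input) {
        return PATTERN.matcher(input).replaceAll(REPLACEMENT);
    }

    // Replaces only the first match, like the count of 1 in the cuDF Python call.
    static String replaceFirst(String input) {
        return PATTERN.matcher(input).replaceFirst(REPLACEMENT);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"one\ntwo", "three\n\n"}) {
            // Escape newlines so each result prints on a single line.
            System.out.println(replaceFirst(s).replace("\n", "\\n")
                + " | " + replaceAll(s).replace("\n", "\\n"));
        }
    }
}
```

Both calls return immediately on the CPU. replaceFirst gives one\nscala and three\nscala\n, which agrees with the cuDF Python output above. replaceAll appends a second scala to each string, because the pattern also matches the empty string at the very end of the input.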
Describe the bug
regexp_replace hangs on the GPU with specific inputs and patterns.
Steps/Code to reproduce bug
Add this test to regexp_test.py
Expected behavior
Should either fall back to CPU or complete successfully on GPU
Environment details (please complete the following information)
Has been seen with CUDA 11.7 and CUDA 12, with Spark 3.3.x.
Additional context