[BUG] Regex: $ does not match \n at end of string #9620
Comments
The default behavior of
The flags are defined here: `cudf/cpp/include/cudf/strings/regex/flags.hpp`, lines 35 to 39 (at commit eda31b6).
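I think I could have done a better job explaining this issue. What I am looking for is to have the default cuDF behavior (MULTILINE disabled) match Python's default behavior (also MULTILINE disabled), as described in https://docs.python.org/3/library/re.html#re.MULTILINE. The key point is how `$` is handled: by default, `$` matches only at the end of the string and immediately before the newline (if any) at the end of the string. Does that make sense?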
Ok, got it, thanks. This is some wacky, yet documented edge case.
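To make the "default vs. MULTILINE" distinction concrete, here is a small sketch using only Python's standard `re` module (not cuDF). The input `"2\n3"` is an extra illustrative case not taken from the report; it shows where the two modes diverge.

```python
import re

inputs = ["2", "2\n", "2\n3"]

# Default flags (MULTILINE off): '$' matches only at the very end of the
# string, or just before a single newline that ends the string.
default_hits = [bool(re.search(r"2$", s)) for s in inputs]

# With re.MULTILINE, '$' also matches before every newline in the string.
multiline_hits = [bool(re.search(r"2$", s, re.MULTILINE)) for s in inputs]

print(default_hits)    # [True, True, False]
print(multiline_hits)  # [True, True, True]
```

Note that `"2\n"` matches in both modes; that trailing-newline case is the one this issue is about.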
…#9715)
Closes #9620
Fixes an edge case described in https://docs.python.org/3/library/re.html#re.MULTILINE where the '$' EOL regex pattern character (without `MULTILINE` set) should match at the very end of a string and also just before the end of the string if the end of that string contains a new-line.
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Christopher Harris (https://github.com/cwharris)
- Vukasin Milovanovic (https://github.com/vuule)
- Sheilah Kirui (https://github.com/skirui-source)
URL: #9715
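The rule described in #9715 can be written as a position predicate. The sketch below is a plain-Python model of the non-MULTILINE `$` semantics, not cuDF's actual implementation; `dollar_matches_at` and `ends_with_two` are hypothetical helpers, and the model is cross-checked against Python's `re`.

```python
import re


def dollar_matches_at(s: str, pos: int) -> bool:
    """Hypothetical helper: True if a non-MULTILINE '$' can match at index pos."""
    # '$' matches at the very end of the string...
    if pos == len(s):
        return True
    # ...or immediately before a newline that is the string's final character.
    return pos == len(s) - 1 and s[pos] == "\n"


def ends_with_two(s: str) -> bool:
    """Hypothetical emulation of re.search(r"2$", s) using the rule above."""
    return any(
        s[i] == "2" and dollar_matches_at(s, i + 1)
        for i in range(len(s))
    )


# Cross-check the model against Python's re module on a few edge cases.
for text in ["2", "2\n", "2\n\n", "2\n3", "12", "", "\n"]:
    assert ends_with_two(text) == bool(re.search(r"2$", text)), repr(text)
```

Only one trailing newline is special: `"2\n\n"` does not match, because the position after `2` is followed by a newline that is not the last character.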
Closes #9764
This reverts the change made in #9620 for the reasons given in #9764 (comment)
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- Mike Wilson (https://github.com/hyperbolic2346)
- Andy Grove (https://github.com/andygrove)
URL: #9774
Describe the bug
There is an edge case with multiline / EOL matching that seems to be incorrect in cuDF.
Given the pattern `2$`, I would expect it to match the inputs `2` and `2\n`, as seen in Python. However, cuDF does not match in the `2\n` case.
Steps/Code to reproduce bug
I don't know how to use cuDF from the Python REPL with the latest code, so I have not actually tested this, but this should be the repro case.
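Since the cuDF side was not run, a useful reference point is the Python behavior the report compares against. This sketch uses only the standard `re` module with the report's pattern and inputs; the `\Z` contrast is an addition for illustration and shows that "absolute end of string" is a different anchor from `$`.

```python
import re

# Inputs from the report, matched against "2$" with default flags.
inputs = ["2", "2\n"]
dollar = [re.search(r"2$", s) is not None for s in inputs]

# For contrast, "\Z" anchors only at the absolute end of the string,
# so it does not match before the trailing newline.
abs_end = [re.search(r"2\Z", s) is not None for s in inputs]

print(dollar)   # [True, True]
print(abs_end)  # [True, False]
```

The report's point is that cuDF's `2$` behaves like the `abs_end` row here rather than the `dollar` row.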
Expected behavior
`$` should match EOL even if the input ends with a line terminator.
Environment overview (please complete the following information)
N/A
Environment details
N/A
Additional context
None