-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix replace error when regex has only zero match quantifiers #10760
Fix replace error when regex has only zero match quantifiers #10760
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10760 +/- ##
================================================
+ Coverage 86.40% 86.43% +0.02%
================================================
Files 143 143
Lines 22444 22444
================================================
+ Hits 19393 19399 +6
+ Misses 3051 3045 -6
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good. Can we add a test that covers the case that previously failed in #10753?
After reading #10753, I see that the behavior may not be well defined across all regex implementations. This page documents some of those differences: https://www.regular-expressions.info/zerolength.html However, it seems there is some consensus on how to proceed after a zero-length match:
I would suggest that we write a test matching this behavior (which is the same as the current behavior, as far as I can tell), and claim that is our "defined" behavior for now. From my understanding, it seems that we have three pieces of evidence that this behavior is reasonable: it matches the user's expectation in #10753, it matches the Python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test looks great. Thanks @davidwendt! For the record, I checked and the behavior of this test matches that of Python's re.sub
. Might be worth adding a Python test for zero-match quantifiers if we already have some tests there.
@gpucibot merge |
Closes #10753
Fixes
cudf::strings::replace_re
logic that was reading past the end of a string when given a regex that contained net zero match quantifier pattern (e.g. 'D*' or 'D?s?' both can match to nothing).