[BUG] contains_re hangs with some regexp patterns #10006
This pattern resolves to the following instructions in the libcudf regex code:
This causes an infinite loop at instruction 4 when the character '3' is not found in the target string. So it appears the implementation unfortunately does not support this pattern. Since this is not an …, I'm not sure whether the code can be fixed right now. At the very least, I hope we can detect the infinite loop well enough to throw an error.
Closes #10006

Fixes a use case where the regex pattern creates a set of instructions that can cause the regex evaluation process to go into an infinite loop. For example, the pattern `(x?)+` creates the following instructions:

```
Instructions:
0: CHAR c='x', next=2
1: OR right=0, left=2, next=2
2: RBRA id=1, next=4
3: LBRA id=1, next=1
4: OR right=3, left=5, next=5
5: END
startinst_id=3
startinst_ids: [ 3 -1]
```

This causes an infinite loop at instruction 4, where the path may go like: 4->3->1->2->4 ... forever.

Supporting this pattern does not look possible. The `+` quantifier is applied to the capture group (closed by the `)` symbol), inside of which `x?` means zero or one occurrence of the character `x`. This means the group could match `x` or nothing, and applying `+` to nothing would be invalid. That said, the pattern `x?+` currently already throws an error because of the invalid usage of the `+` quantifier.

Therefore, the fix here adds a checking step after the instruction set is created to check for a possible infinite-loop case. If one is detected, an exception is thrown indicating the pattern is not supported.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Devavret Makkar (https://github.com/devavret)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #10095
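One way to picture such a check is as cycle detection over the instructions that advance without consuming a character (`OR`, `LBRA`, `RBRA`): if those transitions alone form a loop, evaluation can spin forever. The Python below is a minimal sketch under that assumption, not the libcudf implementation. The `Inst` model, `find_epsilon_cycle`, and `check_for_infinite_loop` are hypothetical names, and the second program (`ok_prog`) is a made-up example whose loop passes through a `CHAR` instruction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical, simplified model of one compiled regex instruction.
@dataclass
class Inst:
    kind: str        # "CHAR", "OR", "LBRA", "RBRA", or "END"
    next: int = -1   # successor for non-branching instructions
    left: int = -1   # OR: first alternative
    right: int = -1  # OR: second alternative

def epsilon_successors(inst: Inst) -> List[int]:
    """Instructions reachable from `inst` without consuming a character."""
    if inst.kind == "OR":
        return [inst.left, inst.right]
    if inst.kind in ("LBRA", "RBRA"):
        return [inst.next]
    return []  # CHAR consumes input; END stops evaluation

def find_epsilon_cycle(prog: Dict[int, Inst]) -> Optional[List[int]]:
    """Return one loop of non-consuming instructions, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {i: WHITE for i in prog}
    path: List[int] = []

    def dfs(i: int) -> Optional[List[int]]:
        color[i] = GRAY
        path.append(i)
        for j in epsilon_successors(prog[i]):
            if j not in prog:
                continue
            if color[j] == GRAY:  # j is already on the current path: loop found
                return path[path.index(j):] + [j]
            if color[j] == WHITE:
                cycle = dfs(j)
                if cycle:
                    return cycle
        path.pop()
        color[i] = BLACK
        return None

    for start in sorted(prog):
        if color[start] == WHITE:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None

def check_for_infinite_loop(prog: Dict[int, Inst]) -> None:
    """Raise if the program could loop forever without consuming input."""
    cycle = find_epsilon_cycle(prog)
    if cycle is not None:
        path = "->".join(map(str, cycle))
        raise ValueError(f"regex pattern not supported: possible infinite loop {path}")

# The instruction listing for `(x?)+` shown above.
prog = {
    0: Inst("CHAR", next=2),
    1: Inst("OR", next=2, left=2, right=0),
    2: Inst("RBRA", next=4),
    3: Inst("LBRA", next=1),
    4: Inst("OR", next=5, left=5, right=3),
    5: Inst("END"),
}
print(find_epsilon_cycle(prog))  # [1, 2, 4, 3, 1]

# Hypothetical program whose only loop goes through a CHAR instruction.
ok_prog = {0: Inst("CHAR", next=1), 1: Inst("OR", left=2, right=0), 2: Inst("END")}
print(find_epsilon_cycle(ok_prog))  # None
```

The reported loop `1->2->4->3->1` is the same cycle as the `4->3->1->2->4` path described above, just entered at a different instruction. Loops that pass through `CHAR` are not flagged, because each trip around them consumes a character and so must eventually end.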
Describe the bug
The `contains_re` function hangs with some regular expression patterns.
Steps/Code to reproduce bug
I ran the following in the rapids-21.10 conda environment and it hangs. I am seeing the same behavior with 22.02 via the Spark accelerator.
Expected behavior
This should not hang.
Environment overview (please complete the following information)
conda rapids-21.10
Environment details
Additional context
None