Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Ability to regex replace a capture group followed by a digit #8816

Closed
jlowe opened this issue Jul 21, 2021 · 1 comment · Fixed by #8841
Closed

[FEA] Ability to regex replace a capture group followed by a digit #8816

jlowe opened this issue Jul 21, 2021 · 1 comment · Fixed by #8841
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS strings strings issues (C++ and Python)

Comments

@jlowe
Copy link
Contributor

jlowe commented Jul 21, 2021

Is your feature request related to a problem? Please describe.
I would like to capture a group and then specify in the replace string that the capture group is emitted followed by a zero. The problem is that \10 looks like I'm asking for capture group 10 instead of capture group 1 followed by a zero.

Describe the solution you'd like
If there was an alternate syntax for specifying a capture group, like ${1} instead of \1 then ${1}0 wouldn't be ambiguous what is being requested.

Describe alternatives you've considered
One potential issue with the proposed solution is that it adds semantic interpretation of the replace string that didn't exist before. That could break existing use-cases that happen to use a ${1} syntax or something like it. Another approach could be to allow hexadecimal escapes or some other notation, e.g.: \1\x30 where hex 30 is ASCII 0, but that too will add a new interpretation of potentially pre-existing replace strings.

@jlowe jlowe added feature request New feature or request Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS strings strings issues (C++ and Python) labels Jul 21, 2021
@davidwendt
Copy link
Contributor

Looking at the reference documentation here I'm not seeing \x## syntax mentioned for the replacement string. And we certainly do not support all the features listed there. Regardless, supporting escaped characters in the replacement string in the current libcudf logic would be difficult. I think we could more easily support the ${#} pattern but like Jason mentioned, this could fail if there are existing replacement strings that happen to have strings with this pattern and are relying on the \# syntax. I think we could support both by first checking if the \# pattern exists and if not, switch to look for ${#} and thereby disallow a mixture of the two syntaxes. This would still have some holes but could minimize any breakages.

@davidwendt davidwendt self-assigned this Jul 21, 2021
@beckernick beckernick removed the Needs Triage Need team to review and classify label Jul 23, 2021
@rapids-bot rapids-bot bot closed this as completed in #8841 Aug 6, 2021
rapids-bot bot pushed a commit that referenced this issue Aug 6, 2021
… index values (#8841)

Closes #8816 

The current `\d` syntax for the replacement template parameter will fail if a number immediately follows the index pattern as described in #8816. This PR adds support for the `${d}` pattern but only if the `\d` pattern is not found in the replacement string. This should minimize breaking any current templates already being used with this API.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Christopher Harris (https://github.com/cwharris)

URL: #8841
shwina pushed a commit to shwina/cudf that referenced this issue Aug 9, 2021
… index values (rapidsai#8841)

Closes rapidsai#8816 

The current `\d` syntax for the replacement template parameter will fail if a number immediately follows the index pattern as described in rapidsai#8816. This PR adds support for the `${d}` pattern but only if the `\d` pattern is not found in the replacement string. This should minimize breaking any current templates already being used with this API.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Christopher Harris (https://github.com/cwharris)

URL: rapidsai#8841
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants