Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For the overload of replace in libcudf where input/target/repl are columns, there isn't a maxrepl arg. #15855

Closed
lithomas1 opened this issue May 24, 2024 · 6 comments · Fixed by #15898
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python)

Comments

@lithomas1
Copy link
Contributor

lithomas1 commented May 24, 2024

For the overload of replace in libcudf where input/target/repl are columns, there isn't a maxrepl arg.

We should probably support this in libcudf replace (eventually), otherwise we'll have some weirdness in pylibcudf where we'll have to raise for maxrepl despite accepting it as an argument.

Originally posted by @lithomas1 in #15839 (comment)

@lithomas1
Copy link
Contributor Author

To provide more context, in Cython when you have fused types, the number of arguments stays constant, so you can't have the situation like in C++, where you have an overload with one less argument or something like that.

@lithomas1 lithomas1 added feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. labels May 24, 2024
@davidwendt
Copy link
Contributor

davidwendt commented May 24, 2024

Supporting this in libcudf never really made sense since it would be confusing to know where the max was applied.
Is it per target or per repl or both? Should there be a column of max values?
Also, I don't know when it would be used and no one has asked for the feature so I would likely mark this a "won't fix".
We could certainly change the name of the API if that helps.

@lithomas1
Copy link
Contributor Author

That makes sense, I think I misunderstood what replace did.

Renaming this might help.
Personally, I always confuse this method as doing a vectorized replace, where there's a different target and replacement value for every string value in the column.

Maybe that's just me though.

@davidwendt
Copy link
Contributor

The documentation provides an example on how it works
https://docs.rapids.ai/api/libcudf/stable/group__strings__replace#ga59aaccc6c1adc602454becd5c8755d84
Do you have a suggestion for the name? replace_multiple ?

@davidwendt davidwendt added strings strings issues (C++ and Python) labels May 24, 2024
@lithomas1
Copy link
Contributor Author

Yep that sounds good to me.

@davidwendt davidwendt self-assigned this May 24, 2024
rapids-bot bot pushed a commit that referenced this issue Jun 3, 2024
Renames the multi-target overload of `cudf::strings::replace()` to `cudf::strings::replace_multiple()`.
This helps with some Cython issues involving fused types and overloaded functions with the same number of arguments.
Reference: #15855 (comment)

This change deprecates the old name to be removed in a future release.

Also added some additional error unit tests.

Closes #15855

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Shruti Shivakumar (https://github.com/shrshi)
  - Nghia Truong (https://github.com/ttnghia)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #15898
@lithomas1
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants