[FEA] Enable .str.find_multiple public API #10126
Comments
Additional context (@galipremsagar beat me to filing an issue! 😉): The gpu-bdb query 18 code uses this private API, Describe the solution you'd like |
Technically you can build the lists column in the Python layer by just creating an offsets column and using it along with column returned by the
Build a ListColumn by using |
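The comment above suggests pairing an offsets column with the flat integer column that the old API returned. A minimal pure-Python sketch of that idea follows. It does not use cuDF; `make_offsets` and `rows_from_offsets` are hypothetical helper names, and the flat result is assumed to be laid out row by row with one entry per target.

```python
from typing import List


def make_offsets(num_strings: int, num_targets: int) -> List[int]:
    """Offsets for a list column in which every row has num_targets entries.

    Hypothetical helper: row i spans flat[offsets[i]:offsets[i + 1]].
    """
    return [i * num_targets for i in range(num_strings + 1)]


def rows_from_offsets(flat: List[int], offsets: List[int]) -> List[List[int]]:
    """Slice the flat child values into per-row lists using the offsets."""
    return [flat[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]


# Assumed flat layout: 3 input strings x 2 targets, row-major, -1 = not found.
flat_positions = [0, -1, 3, 2, -1, -1]
offsets = make_offsets(num_strings=3, num_targets=2)
rows = rows_from_offsets(flat_positions, offsets)
```

Because every row has the same length, the offsets are just multiples of the target count, which is why the conversion needs no extra pass over the data.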
Reference #10126

This changes the current `cudf::strings::find_multiple` API to return a lists column instead of a flattened matrix as a column of integers. The size of the returned lists column equals the size of the input strings column, and each row's size equals the size of the input targets column. This is marked as a breaking change because it changes the result of a public libcudf API.

Authors:
- David Wendt (https://github.com/davidwendt)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Robert Maynard (https://github.com/robertmaynard)
- Nghia Truong (https://github.com/ttnghia)

URL: #10134
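To illustrate the result shape described above, here is a small pure-Python emulation. It is a sketch only, not the libcudf or cuDF implementation: `find_multiple_emulated` is a hypothetical name, it relies on Python's built-in `str.find`, and it assumes that a missing target is reported as -1 and that null handling is out of scope.

```python
from typing import List


def find_multiple_emulated(strings: List[str], targets: List[str]) -> List[List[int]]:
    """Return one list per input string with one position per target.

    Each entry is the character position of the first occurrence of the
    target in the string, or -1 when the target does not occur
    (the behaviour of str.find).
    """
    return [[s.find(t) for t in targets] for s in strings]


strings = ["abcdef", "hello world", ""]
targets = ["c", "o", "xyz"]
result = find_multiple_emulated(strings, targets)

# The outer length matches the number of strings and every row has
# exactly len(targets) entries, mirroring the lists-column shape.
row_lengths = [len(row) for row in result]
```

The old flattened form corresponds to concatenating these rows into a single integer column, which loses the per-string grouping that the lists column keeps.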
This issue has been labeled |
This API is also a candidate for a separate NLP-focused submodule (which has been discussed among cudf developers a few times but I am not aware of an issue tracking this idea). The pandas API does not have an equivalent of |
Closed by #10134
@GregoryKimball Re-opening this one because it’s only partially complete. The proposed Python API has not yet been added; only the C++ changes have been made. See: #10134 (review)
Resolves: #10126

This PR adds `.str.find_multiple` API.

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)
- Bradley Dice (https://github.com/bdice)

URL: #11928
Is your feature request related to a problem? Please describe.
We currently have `cudf._lib.strings.find_multiple`, which is an internal API. We would like to expose it via a public API, similar to what we wanted to do in #4575. That effort was previously dropped due to lack of ListColumn support (see #4569 (comment)), which shouldn't be a blocker now.