[FEA] libcudf string split returning list of strings #5667
Comments
Pandas similarly supports this if …
Take a look at `cudf::strings::contiguous_split_record`, which is intended to support the …
Perhaps cuDF Python may want to also add list support to the output of a few other string methods, too (…)
@galipremsagar I remember we did something in porting the nvstrings …
We have Cython plumbing only for … So we did the merging of the API for …
The libcudf part of this was completed in PR #5687.
This was handled in Python / Cython as well. Closing. |
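As a plain-Python illustration of the per-row list semantics discussed in the comments above (this is only a sketch; `split_record` here is a hypothetical stand-in, not the cudf API):

```python
def split_record(column, delimiter=" "):
    """Split each string into a list of tokens, one list per input row.

    Null (None) rows stay null rather than becoming empty lists -- the
    behavior the feature request asks for from a list-of-strings column.
    This is an illustrative model only, not cudf itself.
    """
    return [row.split(delimiter) if row is not None else None
            for row in column]

rows = ["a b c", "d e", None]
print(split_record(rows))  # [['a', 'b', 'c'], ['d', 'e'], None]
```

Each output row has exactly as many tokens as its input produced, which is what makes this shape map directly onto Spark's `ArrayType`.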
Is your feature request related to a problem? Please describe.
Spark supports a string split function that takes a string and a delimiter and returns a list of strings. We would like to support this operation in the RAPIDS Accelerator for Apache Spark.

Describe the solution you'd like
A form of libcudf's `cudf::strings::split` that, instead of returning separate columns for the fields, returns a single list-of-strings column where each row contains the list of all fields produced by the split. Ideally the split function would support a regular expression to identify the delimiter, but we can still support many common queries with a function that only allows an exact-match, scalar delimiter string.

Describe alternatives you've considered
Theoretically we could work with the existing `cudf::strings::split`, but trying to map multiple columns to Spark's `ArrayType` is messy in practice and would be specific to this operation. It is not in line with the straightforward mapping of `ArrayType` to the new list type currently being added to libcudf.
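To see why reassembling the existing column-wise output into `ArrayType` rows is messy, here is a plain-Python sketch (illustrative only; `split_to_columns` is a hypothetical model of the column-per-token-position shape, not the libcudf API):

```python
def split_to_columns(column, delimiter=" "):
    """Model of a column-wise split: one output column per token position,
    with short rows padded by None. Illustrative only, not libcudf."""
    tokens = [row.split(delimiter) if row is not None else []
              for row in column]
    width = max(len(t) for t in tokens)
    return [[t[i] if i < len(t) else None for t in tokens]
            for i in range(width)]

rows = ["a b c", "d e"]
cols = split_to_columns(rows)
# cols == [['a', 'd'], ['b', 'e'], ['c', None]]

# Rebuilding ArrayType rows means transposing and stripping the padding,
# and padding Nones are indistinguishable from genuine null tokens:
lists = [[c[r] for c in cols if c[r] is not None]
         for r in range(len(rows))]
# lists == [['a', 'b', 'c'], ['d', 'e']]
```

A list-of-strings column avoids this transpose-and-strip step entirely, which is the point of the request.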