[FEA] substring_index #5158
Labels: feature request, libcudf, Spark, strings
Is your feature request related to a problem? Please describe.
I would love to have an API that acts like the substring_index SQL function.
The following is from the official Spark docs
Describe the solution you'd like
I would like a function that takes three parameters: the original string, a substring (delimiter) to look for, and a count of how many matches to make. Ideally we would provide versions that accept scalars as well as columns for each parameter, but I am willing to take one that only accepts columns, since I can create a column from a scalar if I need to.
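To make the requested behavior concrete, here is a minimal CPU-side sketch in Python of the per-row semantics as I understand them from Spark: a positive count returns everything to the left of the count-th delimiter counting from the left, a negative count returns everything to the right of the count-th delimiter counting from the right, and the whole string is returned when there are fewer matches than requested. This is only an illustration and not a proposed libcudf API. The function names, the null handling, the empty-delimiter result, and the choice to let matches overlap are all assumptions of this sketch.

```python
from typing import List, Optional


def substring_index(s: Optional[str], delim: Optional[str],
                    count: Optional[int]) -> Optional[str]:
    """Reference semantics for a single row (hypothetical helper)."""
    # Assumption: any null input produces a null output.
    if s is None or delim is None or count is None:
        return None
    # Assumption: a zero count or an empty delimiter yields an empty string.
    if count == 0 or delim == "":
        return ""
    if count > 0:
        # Walk left to right until the count-th delimiter is found.
        pos = -1
        for _ in range(count):
            pos = s.find(delim, pos + 1)
            if pos < 0:
                return s  # fewer matches than requested: keep the whole string
        return s[:pos]
    # Negative count: walk right to left instead.
    end = len(s)
    pos = -1
    for _ in range(-count):
        pos = s.rfind(delim, 0, end)
        if pos < 0:
            return s
        # The next match must start strictly before this one.
        end = pos + len(delim) - 1
    return s[pos + len(delim):]


def substring_index_column(strings: List[Optional[str]],
                           delims: List[Optional[str]],
                           counts: List[Optional[int]]) -> List[Optional[str]]:
    """Column-wise form: all three inputs are columns of equal length."""
    return [substring_index(s, d, c) for s, d, c in zip(strings, delims, counts)]


# A scalar parameter can be expanded into a column before the call.
strings = ["www.apache.org", "a.b", None]
print(substring_index_column(strings, ["."] * len(strings), [2] * len(strings)))
```

The column-only form is enough to cover the scalar cases, at the cost of materializing a repeated column for each scalar argument.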
Describe alternatives you've considered
I tried to do this with extract and got most of the way there for some very special cases, but it is nowhere near complete and is likely to be a lot slower than a purpose-built solution.
Additional context
substring_index is a standard SQL operator, so I suspect that others will be interested in it too.