[FEA] extract item from a list by index #5742
Comments
cuDF Python could use this as well. I think the expectation from our side would be to throw if the element is out of bounds, but we'll take whatever makes sense from the C++ perspective 😄
There is a precedent of a […] The closest analogy I can think of is the cudf.str.get() function, which calls into cudf::strings::slice_strings(). The […] IMO, returning null for out-of-bounds within a list makes sense here.
Just a clarification. From your last example in the description:
Should negative index values be considered out-of-bounds?
cc @shwina for this discussion
According to #5505 that […] We'd love to have support for negative index values to avoid additional pre-processing, though. So maybe negative index behaviour should be an input parameter?
Yes, negative values are considered out of bounds for Spark. We can work around a lot of issues with bounds checking, etc., so long as we have a list length API too. So the bounds checking is somewhat optional.
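To make the two positions on negative indices concrete, here is a minimal pure-Python sketch of how an input parameter could switch between them. It models a list column as a Python list of lists with `None` standing in for null. The function name `extract_list_element` and the `negative_wraps` flag are hypothetical names chosen for illustration; this is not the cuDF or libcudf API.

```python
from typing import Any, List, Optional


def extract_list_element(
    column: List[Optional[List[Any]]],
    index: int,
    negative_wraps: bool = False,  # hypothetical flag, not a real cuDF parameter
) -> List[Any]:
    """Pick element `index` from every list in `column`.

    With negative_wraps=False (the Spark-style behaviour described above),
    a negative index is simply out of bounds and yields None.
    With negative_wraps=True, a negative index counts from the end of each list.
    """
    result = []
    for row in column:
        if row is None:
            # A null list always produces a null result.
            result.append(None)
            continue
        i = index
        if negative_wraps and i < 0:
            i += len(row)  # -1 becomes the last element of this row
        # Anything still outside [0, len(row)) is out of bounds -> null.
        result.append(row[i] if 0 <= i < len(row) else None)
    return result


column = [[1, 2, 3], [4], None, []]
print(extract_list_element(column, -1))                       # [None, None, None, None]
print(extract_list_element(column, -1, negative_wraps=True))  # [3, 4, None, None]
```

Note that the per-row length is all the wrapping logic needs, which is why a list length API would let a caller do this pre-processing itself if the flag were not provided.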
Is your feature request related to a problem? Please describe.
I would like to be able to pull out the nth element from a list column and return a new column.
Describe the solution you'd like
I would love an API that lets me take a list column, and pull out a single entry from each list in the column.
Something like the following
Describe alternatives you've considered
There really are not any except writing it ourselves.
Additional context
This is for Spark. At a minimum we need an API that would look something like.
But Spark supports the full gamut of options
For Spark a null is returned if the index is out of bounds for the list, or if the list itself is null, or if the value in the list is null. We really want this to work for a list of strings.
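The null rules described for Spark can be written down as a small reference model. The sketch below is plain Python operating on lists of lists (with `None` as null) and uses a list of strings, since that is the case called out above. The function name `spark_style_extract` is hypothetical, and the code only illustrates the expected results, not how a GPU implementation would be written.

```python
from typing import List, Optional


def spark_style_extract(
    column: List[Optional[List[Optional[str]]]], index: int
) -> List[Optional[str]]:
    """Reference model of the requested semantics (illustrative only).

    The result is None (null) when:
      * the list itself is null,
      * the index is out of bounds for that list (negative counts as out of bounds),
      * the selected value inside the list is null.
    """
    result: List[Optional[str]] = []
    for row in column:
        if row is None or index < 0 or index >= len(row):
            result.append(None)
        else:
            result.append(row[index])  # may itself be None
    return result


column = [["a", "b", "c"], None, ["d"], [None, "e"], []]
print(spark_style_extract(column, 0))  # ['a', None, 'd', None, None]
print(spark_style_extract(column, 1))  # ['b', None, None, 'e', None]
```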