-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] index_in_list()
to return the index of a search-key in a list row
#9164
Comments
One might also use this to look up a
|
This issue has been labeled |
Fixes #9164. ### Prelude `lists::contains()` (introduced in #7039) returns a `BOOL8` column, indicating whether the specified search_key(s) exist at all in each corresponding list row of an input LIST column. It does not return the actual position. ### `index_of()` This commit introduces `lists::index_of()`, to return the INT32 positions of the specified search_key(s) in a LIST column. The search keys may be searched for using either `FIND_FIRST` (which finds the position of the first occurrence), or `FIND_LAST` (which finds the last occurrence). Both column_view and scalar search keys are supported. As with `lists::contains()`, nested types are not supported as search keys in `lists::index_of()`. If the search_key cannot be found, that output row is set to `-1`. Additionally, the row `output[i]` is set to null if: 1. The `search_key`(scalar) or `search_keys[i]`(column_view) is null. 2. The list row `lists[i]` is null In all other cases, `output[i]` should contain a non-negative value. ### Semantic changes for `lists::contains()` This commit also modifies the semantics of `lists::contains()`: it will now return nulls only for the following cases: 1. The `search_key`(scalar) or `search_keys[i]`(column_view) is null. 2. The list row `lists[i]` is null In all other cases, a non-null bool is returned. Specifically `lists::contains()` no longer conforms to SQL semantics of returning `NULL` for list rows that don't contain the search key, while simultaneously containing nulls. In this case, `false` is returned. ### `lists::contains_null_elements()` A new function has been introduced to check if each list row contains null elements. The semantics are similar to `lists::contains()`, in that the column returned is BOOL8 typed: 1. If even 1 element in a list row is null, the returned row is `true`. 2. If no element is null, the returned row is `false`. 3. If the list row is null, the returned row is `null`. 4. If the list row is empty, the returned row is `false`. The current implementation is an inefficient placeholder, to be replaced once (#9588) is available. It is included here to reconstruct the SQL semantics dropped from `lists::contains()`. Authors: - MithunR (https://github.com/mythrocks) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Jason Lowe (https://github.com/jlowe) - Mark Harris (https://github.com/harrism) - Conor Hoekstra (https://github.com/codereport) URL: #9510
#7039 added a
lists::contains()
function to test whether each list row contains a specified search-key. This works with scalar search-keys and with columns, and returns a column ofBOOL8
.It would be good to have a
lists::index_in_list()
that functions similarly, to return the index of the specified search-key in each list row. E.g.Such a function would be useful in implementing a GPU accelerated
array_position()
in SparkSQL.The text was updated successfully, but these errors were encountered: