Implement lists::index_of() to find positions in list rows #9510

Merged
merged 20 commits into from
Dec 20, 2021
Changes from 10 commits
98 changes: 98 additions & 0 deletions cpp/include/cudf/lists/contains.hpp
@@ -74,6 +74,104 @@ std::unique_ptr<column> contains(
cudf::column_view const& search_keys,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Create a column of bool values indicating whether each row in the `lists` column
* contains at least one null element.
*
* The output column has as many elements as the input `lists` column.
* Output `column[i]` is set to null if the list row `lists[i]` is null.
* Otherwise, `column[i]` is set to a non-null boolean value, depending on whether that list
* contains a null element.
* (Empty list rows are considered *NOT* to contain a null element.)
*
* @param lists Lists column whose `n` rows are to be searched
* @param mr Device memory resource used to allocate the returned column's device memory.
* @return std::unique_ptr<column> BOOL8 column of `n` rows with the result of the lookup
*/
std::unique_ptr<column> contains_null_elements(
cudf::lists_column_view const& lists,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Option to choose whether `index_of()` returns the first or last match
* of a search key in a list row
*/
enum class duplicate_find_option : int32_t {
FIND_FIRST = 0, ///< Finds first instance of a search key in a list row.
FIND_LAST ///< Finds last instance of a search key in a list row.
};

/**
* @brief Create a column of `size_type` values indicating the position of a search key
* within each list row in the `lists` column
*
* The output column has as many elements as there are rows in the input `lists` column.
* Output `column[i]` contains a 0-based index indicating the position of the search key
* in each list, counting from the beginning of the list.
* Note:
* 1. If the `search_key` is null, all output rows are set to null.
* 2. If the row `lists[i]` is null, `output[i]` is also null.
* 3. If the row `lists[i]` does not contain the `search_key`, `output[i]` is set to `-1`.
* 4. In all other cases, `output[i]` is set to a non-negative `size_type` index.
*
* If the `find_option` is set to `FIND_FIRST`, the position of the first match for
* `search_key` is returned.
* If `find_option == FIND_LAST`, the position of the last match in the list row is
* returned.
*
* @param lists Lists column whose `n` rows are to be searched
* @param search_key The scalar key to be looked up in each list row
* @param find_option Whether to return the position of the first match (`FIND_FIRST`) or
* last (`FIND_LAST`)
* @param mr Device memory resource used to allocate the returned column's device memory.
* @return std::unique_ptr<column> INT32 column of `n` rows with the location of the `search_key`
*
* @throw cudf::logic_error If `search_key` type does not match the element type in `lists`
* @throw cudf::logic_error If `search_key` is of a nested type, or `lists` contains nested
* elements (LIST, STRUCT)
*/
std::unique_ptr<column> index_of(
cudf::lists_column_view const& lists,
cudf::scalar const& search_key,
duplicate_find_option find_option = duplicate_find_option::FIND_FIRST,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Create a column of `size_type` values indicating the position of a search key
* row within the corresponding list row in the `lists` column
*
* The output column has as many elements as there are rows in the input `lists` column.
* Output `column[i]` contains a 0-based index indicating the position of each search key
* row in its corresponding list row, counting from the beginning of the list.
* Note:
* 1. If `search_keys[i]` is null, `output[i]` is also null.
* 2. If the row `lists[i]` is null, `output[i]` is also null.
* 3. If the row `lists[i]` does not contain `search_keys[i]`, `output[i]` is set to `-1`.
* 4. In all other cases, `output[i]` is set to a non-negative `size_type` index.
*
* If the `find_option` is set to `FIND_FIRST`, the position of the first match for
* `search_keys[i]` in `lists[i]` is returned.
* If `find_option == FIND_LAST`, the position of the last match in the list row is
* returned.
*
* @param lists Lists column whose `n` rows are to be searched
* @param search_keys A column of search keys to be looked up in each corresponding row of
* `lists`
* @param find_option Whether to return the position of the first match (`FIND_FIRST`) or
* last (`FIND_LAST`)
* @param mr Device memory resource used to allocate the returned column's device memory.
* @return std::unique_ptr<column> INT32 column of `n` rows with the location of each `search_keys[i]`
*
* @throw cudf::logic_error If `search_keys` does not match `lists` in its number of rows
* @throw cudf::logic_error If `search_keys` type does not match the element type in `lists`
* @throw cudf::logic_error If `lists` or `search_keys` contains nested elements (LIST, STRUCT)
*/
std::unique_ptr<column> index_of(
cudf::lists_column_view const& lists,
cudf::column_view const& search_keys,
duplicate_find_option find_option = duplicate_find_option::FIND_FIRST,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/** @} */ // end of group
} // namespace lists
} // namespace cudf
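
The null-handling and `-1` rules in the two `index_of()` doc comments can be easier to check against a small host-side model. The sketch below is not libcudf code and does not call any libcudf API: `find_option_model`, `element`, `list_row`, `result`, `index_of_row` and `index_of_model` are all hypothetical names, `std::optional` stands in for nullable rows and elements, and `std::invalid_argument` stands in for `cudf::logic_error`. It also assumes that a null element inside a list never matches a non-null key, which the doc comments above do not state explicitly.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical host-side stand-ins; none of these names exist in libcudf.
enum class find_option_model { FIND_FIRST, FIND_LAST };

using element  = std::optional<int>;                   // nullopt models a null element or key
using list_row = std::optional<std::vector<element>>;  // nullopt models a null list row
using result   = std::optional<std::int32_t>;          // nullopt models a null output row

// Position of `key` in a single list row, following notes 1-4 of the doc comments.
result index_of_row(list_row const& row, element const& key, find_option_model opt)
{
  // Notes 1 and 2: a null key or a null list row yields a null output.
  if (!key.has_value() || !row.has_value()) { return std::nullopt; }

  std::int32_t found = -1;  // Note 3: -1 when the key is absent (including empty lists).
  for (std::size_t i = 0; i < row->size(); ++i) {
    element const& e = (*row)[i];
    // Assumption: null elements are skipped and never compare equal to the key.
    if (e.has_value() && *e == *key) {
      found = static_cast<std::int32_t>(i);
      if (opt == find_option_model::FIND_FIRST) { break; }  // first match wins
      // FIND_LAST keeps scanning so that the final match overwrites `found`.
    }
  }
  return found;
}

// Model of the scalar overload: one key is searched in every list row.
std::vector<result> index_of_model(std::vector<list_row> const& lists,
                                   element const& key,
                                   find_option_model opt = find_option_model::FIND_FIRST)
{
  std::vector<result> out;
  out.reserve(lists.size());
  for (auto const& row : lists) { out.push_back(index_of_row(row, key, opt)); }
  return out;
}

// Model of the column overload: keys[i] is searched in lists[i].
std::vector<result> index_of_model(std::vector<list_row> const& lists,
                                   std::vector<element> const& keys,
                                   find_option_model opt = find_option_model::FIND_FIRST)
{
  // Stand-in for the documented row-count check.
  if (keys.size() != lists.size()) {
    throw std::invalid_argument("keys and lists must have the same number of rows");
  }
  std::vector<result> out;
  out.reserve(lists.size());
  for (std::size_t i = 0; i < lists.size(); ++i) {
    out.push_back(index_of_row(lists[i], keys[i], opt));
  }
  return out;
}
```

Under this model, searching for `2` in the rows `[1, 2, 3, 2]`, a null row, and `[]` gives `1`, null, and `-1` with `FIND_FIRST`, and `3`, null, and `-1` with `FIND_LAST`. The real functions operate on device columns and take an `rmm` memory resource, which the model leaves out entirely.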