[FEA] Generate element labels from offsets #10905
Labels
feature request
New feature or request
libcudf
Affects libcudf (C++/CUDA) code.
non-breaking
Non-breaking change
Spark
Functionality that helps Spark RAPIDS
In some cases, for a list column, we want to generate labels for each element in the child column.
For example, given a list column [ [1, 2, 3], [4, 5], [6, 7, 8] ], we want to generate a label column like [0, 0, 0, 1, 1, 2, 2, 2]. With such a label column, we can combine the child column (i.e., [1, 2, 3, 4, 5, 6, 7, 8]) and the label column for further processing. A use case for such a label column already exists in drop_list_duplicates (link).
The next use case would be for set-like operations (#10409), where we want to process all elements in the child column in parallel (i.e., one element per thread) instead of one list per thread.
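To make the request concrete, here is a minimal host-side sketch of one way labels could be derived from a list column's offsets. It is not libcudf code: the function name `labels_from_offsets` is hypothetical, and the offsets are assumed to be a non-decreasing array with one more entry than the number of lists. Each child element finds its list with a binary search over the offsets, so every loop iteration is independent and could map to one thread in a GPU version.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper, not an existing libcudf API.
// Assumes `offsets` is non-decreasing and holds num_lists + 1 entries,
// so list i owns child elements [offsets[i], offsets[i + 1]).
std::vector<std::int32_t> labels_from_offsets(std::vector<std::int32_t> const& offsets)
{
  if (offsets.size() < 2) { return {}; }  // no lists, so no labels

  auto const first        = offsets.front();  // may be non-zero for a sliced column
  auto const num_elements = offsets.back() - first;
  std::vector<std::int32_t> labels(num_elements);

  // Each iteration is independent: one child element per "thread".
  for (std::int32_t i = 0; i < num_elements; ++i) {
    auto const child_idx = first + i;
    // The first offset strictly greater than child_idx marks the end of the
    // owning list. Empty lists (repeated offsets) are skipped automatically.
    auto const it = std::upper_bound(offsets.begin(), offsets.end(), child_idx);
    labels[i] = static_cast<std::int32_t>(std::distance(offsets.begin(), it)) - 1;
  }
  return labels;
}
```

For the example above, the offsets are [0, 3, 5, 8], and the sketch yields the labels [0, 0, 0, 1, 1, 2, 2, 2]. Binary search is only one option; a scatter followed by an inclusive scan would be another way to get the same result.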