[FEA] COLLECT window aggregation should support null_policy::EXCLUDE #7258
Labels: feature request (new feature or request), libcudf (affects libcudf C++/CUDA code), Spark (functionality that helps Spark RAPIDS)
#7189 implements `COLLECT` aggregations for window functions. Its handling of null input rows is consistent with CUDF semantics, e.g.:
Note that the null element (∅) is replicated in the first 3 rows of the output. SparkSQL (and Hive, and other big data SQL systems) has different semantics: all null elements are purged. The output for the same operation should instead be:
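To make the two behaviors concrete, here is a minimal host-side reference model in C++. It is not libcudf code. It assumes a fixed-size window where `preceding` counts the current row and `following` counts the rows after it. The names `null_policy`, `rolling_collect`, and the input values are illustrative stand-ins chosen here; the enum only mirrors the enumerator names of libcudf's `cudf::null_policy`.

```cpp
#include <algorithm>
#include <cassert>  // used by the usage checks
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for cudf::null_policy (same enumerator names).
enum class null_policy { EXCLUDE, INCLUDE };

using nullable_int = std::optional<int>;      // std::nullopt plays the role of a null row
using list_row     = std::vector<nullable_int>;

// Hypothetical reference model of a fixed-size rolling COLLECT.
// Assumption: `preceding` includes the current row; `following` counts rows after it.
std::vector<list_row> rolling_collect(std::vector<nullable_int> const& input,
                                      std::size_t preceding,
                                      std::size_t following,
                                      null_policy policy)
{
  std::vector<list_row> output;
  output.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    // Window is [i - preceding + 1, i + following], clamped to the column bounds.
    std::size_t begin = (i + 1 >= preceding) ? i + 1 - preceding : 0;
    std::size_t end   = std::min(input.size(), i + following + 1);
    list_row row;
    for (std::size_t j = begin; j < end; ++j) {
      // EXCLUDE (Spark/Hive semantics) drops null inputs; INCLUDE keeps them.
      if (policy == null_policy::EXCLUDE && !input[j].has_value()) { continue; }
      row.push_back(input[j]);
    }
    output.push_back(std::move(row));
  }
  return output;
}
```

With the input `{10, ∅, 20, 30, 40}`, `preceding = 2` and `following = 1`, `INCLUDE` yields `[10, ∅]`, `[10, ∅, 20]`, `[∅, 20, 30]`, `[20, 30, 40]`, `[30, 40]` (the null appears in the first three rows), while `EXCLUDE` yields `[10]`, `[10, 20]`, `[20, 30]`, `[20, 30, 40]`, `[30, 40]`.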
CUDF should allow the `COLLECT` aggregation to be constructed with an optional `null_policy` argument (defaulting to `INCLUDE`). The `COLLECT` window function should check the policy and, for `EXCLUDE`, filter out null list elements a posteriori.