[FEA] Allow initial value for cudf::reduce and cudf::segmented_reduce. #11002
Labels: feature request, good first issue, libcudf
Problem statement
The algorithms `cudf::reduce` and `cudf::segmented_reduce` do not accept an initial value. Similar algorithms like `std::reduce`, `thrust::reduce`, `cub::DeviceReduce`, and `cub::DeviceSegmentedReduce` all support providing an initial value.
Providing an initial value for reductions is a common need. This is related to #10455, and is a more general statement of the problem described in that issue.
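For reference, here is a minimal sketch of what an initial value looks like with `std::reduce`, one of the algorithms named above. The helper name `sum_from` is only illustrative and is not part of any library.

```cpp
#include <numeric>
#include <vector>

// Sum `values`, seeding the accumulator with `init` instead of the identity 0.
// For an empty range, std::reduce returns `init` unchanged.
int sum_from(std::vector<int> const& values, int init)
{
  return std::reduce(values.begin(), values.end(), init);
}
```

Calling `sum_from({1, 2, 4}, 10)` adds the three elements to the seed, while `sum_from({}, 10)` simply returns the seed.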
Describe the solution you'd like
I propose adding method overloads like the following:
Design and expected results
The design proposed below was discussed with @gerashegalov @nvdbaranec and @SrikarVanavasam.
For each piece of input data below, I show the current behavior of a reduction and the two proposed behaviors for a null and a non-null initial value. The input data could be a column of values or a single segment of a segmented reduction -- the results are identical.
Note that although 0 is the identity for a sum reduction, the current behavior differs from the results of explicitly providing an initial value that is the identity in the case of empty input data.
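The note about the identity can be made concrete with a small host-side model. This is only a sketch under stated assumptions: `std::optional<int>` stands in for a nullable scalar, the input contains no null elements, and the function names are hypothetical rather than libcudf APIs. The "current" model assumes an empty input yields a null result, which is what the `0 + null = null` remark later in this issue implies.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for a nullable scalar: std::nullopt models null.
using nullable_int = std::optional<int>;

// Model of the current behavior: no initial value is accepted, and an
// empty input produces a null result rather than the sum identity 0.
nullable_int model_current_sum(std::vector<int> const& values)
{
  if (values.empty()) { return std::nullopt; }
  int total = 0;
  for (int v : values) { total += v; }
  return total;
}

// Model of the proposed behavior for a non-null initial value: the
// accumulator starts at `init`, so an empty input returns `init`.
nullable_int model_sum_with_init(std::vector<int> const& values, int init)
{
  int total = init;
  for (int v : values) { total += v; }
  return total;
}
```

With this model, a non-empty input gives the same answer either way when `init` is 0, but an empty input gives null for the current behavior and 0 when the identity is passed explicitly.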
Proposed Implementation 1
If the input scalar is valid, provide the value to `cub::DeviceSegmentedReduce` instead of the binary operator identity. If the input scalar is null, supply the binary operator identity as in the current code.
For normal reductions (non-segmented), the validity logic is straightforward to update: the output is valid if the initial value and reduction result are valid. There may be an early exit case where the output scalar must be null if the initial value is null, allowing the reduction kernel to be skipped (unless the binary operation is special in its null handling, like `NULL_MAX`?).
For segmented reductions, alter the validity predicate in `segmented_null_mask_reduction` or perform postprocessing of that bitmask to incorporate the input scalar's nullity and match the expected results in the proposal above.
Proposed Implementation 2
Implement the initial value behavior by performing the current segmented reduction logic, and then performing the corresponding binary operation between the initial value scalar and the result column. Then, for any empty inputs (empty columns or empty segments), the initial value should replace the result value instead. (Otherwise an initial value of 0 and an empty input [] would result in 0 + null = null.)
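The three steps of Implementation 2 can be sketched on the host as follows. This is a rough model, not libcudf code: `std::optional<int>` stands in for nullable values, the operation is fixed to a sum, input elements are assumed non-null, and both function names are hypothetical. Segment `i` is assumed to cover `values[offsets[i], offsets[i + 1])`.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

using nullable_int = std::optional<int>;

// Step 1: model of the existing segmented sum. An empty segment yields null.
std::vector<nullable_int> segmented_sum_current(std::vector<int> const& values,
                                                std::vector<std::size_t> const& offsets)
{
  std::vector<nullable_int> result;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    if (offsets[i] == offsets[i + 1]) {
      result.push_back(std::nullopt);
      continue;
    }
    int total = 0;
    for (std::size_t j = offsets[i]; j < offsets[i + 1]; ++j) { total += values[j]; }
    result.push_back(total);
  }
  return result;
}

// Steps 2 and 3: apply the binary operation between `init` and each result,
// then replace the result for empty segments with `init` itself.
std::vector<nullable_int> segmented_sum_with_init(std::vector<int> const& values,
                                                  std::vector<std::size_t> const& offsets,
                                                  nullable_int init)
{
  auto result = segmented_sum_current(values, offsets);
  for (std::size_t i = 0; i < result.size(); ++i) {
    bool const is_empty = offsets[i] == offsets[i + 1];
    if (is_empty) {
      result[i] = init;  // without this, init + null would stay null
    } else if (init.has_value() && result[i].has_value()) {
      result[i] = *init + *result[i];
    } else {
      result[i] = std::nullopt;  // null-propagating binary operation
    }
  }
  return result;
}
```

In this model, a null initial value makes every output null, including empty segments, because the replacement step copies the null initial value. How all-null segments should behave is not modeled here.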