Skip to content

Commit

Permalink
Sanitize percentile_approx() output for empty input (rapidsai#11498)
Browse files Browse the repository at this point in the history
Closes rapidsai#10856 

If all of the tdigest inputs to percentile_approx are empty, it will return a column containing null rows, as expected, but they will have unsanitary offsets. This PR checks if all the inputs are empty and returns an empty column as expected.

Authors:
  - Srikar Vanavasam (https://github.com/SrikarVanavasam)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - MithunR (https://github.com/mythrocks)
  - https://github.com/nvdbaranec

URL: rapidsai#11498
  • Loading branch information
SrikarVanavasam authored Aug 11, 2022
1 parent e66ed15 commit a67b718
Show file tree
Hide file tree
Showing 2 changed files with 8 additions and 4 deletions.
9 changes: 7 additions & 2 deletions cpp/src/quantiles/tdigest/tdigest.cu
Original file line number Diff line number Diff line change
Expand Up @@ -352,13 +352,18 @@ std::unique_ptr<column> percentile_approx(tdigest_column_view const& input,
// output is a list column with each row containing percentiles.size() percentile values
auto offsets = cudf::make_fixed_width_column(
data_type{type_id::INT32}, input.size() + 1, mask_state::UNALLOCATED, stream, mr);
auto row_size_iter = thrust::make_constant_iterator(percentiles.size());
auto const all_empty_rows =
thrust::count_if(rmm::exec_policy(stream),
input.size_begin(),
input.size_begin() + input.size(),
[] __device__(auto const x) { return x == 0; }) == input.size();
auto row_size_iter = thrust::make_constant_iterator(all_empty_rows ? 0 : percentiles.size());
thrust::exclusive_scan(rmm::exec_policy(stream),
row_size_iter,
row_size_iter + input.size() + 1,
offsets->mutable_view().begin<offset_type>());

if (percentiles.size() == 0) {
if (percentiles.size() == 0 || all_empty_rows) {
return cudf::make_lists_column(
input.size(),
std::move(offsets),
Expand Down
3 changes: 1 addition & 2 deletions cpp/tests/quantiles/percentile_approx_test.cu
Original file line number Diff line number Diff line change
Expand Up @@ -405,8 +405,7 @@ TEST_F(PercentileApproxTest, EmptyInput)
3,
cudf::test::detail::make_null_mask(nulls.begin(), nulls.end()));

// TODO: change percentile_approx to produce sanitary list outputs for this case.
CUDF_TEST_EXPECT_COLUMNS_EQUIVALENT(*result, *expected);
CUDF_TEST_EXPECT_COLUMNS_EQUAL(*result, *expected);
}

TEST_F(PercentileApproxTest, EmptyPercentiles)
Expand Down

0 comments on commit a67b718

Please sign in to comment.