cudf::row_bit_count() support. #7534

Merged
merged 17 commits into from
Mar 30, 2021
1 change: 1 addition & 0 deletions cpp/CMakeLists.txt
@@ -389,6 +389,7 @@ add_library(cudf
src/transform/jit/code/kernel.cpp
src/transform/mask_to_bools.cu
src/transform/nans_to_nulls.cu
src/transform/row_bit_count.cu
src/transform/transform.cpp
src/transpose/transpose.cu
src/unary/cast_ops.cu
12 changes: 11 additions & 1 deletion cpp/include/cudf/detail/transform.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2020, NVIDIA CORPORATION.
* Copyright (c) 2019-2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -77,5 +77,15 @@ std::unique_ptr<column> mask_to_bools(
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @copydoc cudf::row_bit_count
*
* @param stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<column> row_bit_count(
table_view const& t,
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

} // namespace detail
} // namespace cudf
1 change: 0 additions & 1 deletion cpp/include/cudf/lists/lists_column_view.hpp
@@ -56,7 +56,6 @@ class lists_column_view : private column_view {
using column_view::null_mask;
using column_view::offset;
using column_view::size;
using offset_type = int32_t;
static_assert(std::is_same<offset_type, size_type>::value,
"offset_type is expected to be the same as size_type.");
using offset_iterator = offset_type const*;
31 changes: 30 additions & 1 deletion cpp/include/cudf/transform.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2020, NVIDIA CORPORATION.
* Copyright (c) 2019-2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -142,5 +142,34 @@ std::unique_ptr<column> mask_to_bools(
size_type end_bit,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Returns an approximate cumulative size in bits of all columns in the `table_view` for
* each row.
*
* This function counts bits instead of bytes to account for the null mask which only has one
* bit per row.
*
* Each row in the returned column is the sum of the per-row size for each column in
* the table.
*
* In some cases, this is an inexact approximation. Specifically, columns of lists and strings
* require N+1 offsets to represent N rows. It is up to the caller to calculate the small
* additional overhead of the terminating offset for any group of rows being considered.
*
* This function returns the per-row sizes as the columns are currently formed. This can
* end up being larger than the number you would get by gathering the rows. Specifically,
* the push-down of struct column validity masks can nullify rows that contain data for
* string or list columns. In these cases, the size returned is conservative:
*
* row_bit_count(column(x)) >= row_bit_count(gather(column(x)))
*
* @param t The table view to perform the computation on.
* @param mr Device memory resource used to allocate the returned column's device memory.
* @return A 32-bit integer column containing the per-row bit counts.
*/
std::unique_ptr<column> row_bit_count(
table_view const& t,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/** @} */ // end of group
} // namespace cudf
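
A rough host-side sketch of the accounting described in the `row_bit_count` doc comment above may help when reviewing expected values. It is not the libcudf implementation and does not call cudf. The function names `row_bit_count_model` and `group_bit_count` are hypothetical, and the sketch assumes a two-column table (one INT32 column, one string column), 32-bit offsets, and one null-mask bit per row for nullable columns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side model of the per-row accounting described in the
// row_bit_count() doc comment. This is NOT the cudf implementation.

constexpr std::int32_t bits_per_byte = 8;
constexpr std::int32_t offset_bits   = 32;  // assumption: offset_type is int32_t

// Modeled per-row bit counts for a table with one INT32 column and one
// string column. `nullable` says whether both columns carry a null mask.
std::vector<std::int32_t> row_bit_count_model(std::vector<std::int32_t> const& ints,
                                              std::vector<std::string> const& strs,
                                              bool nullable)
{
  assert(ints.size() == strs.size());
  std::int32_t const validity_bit = nullable ? 1 : 0;

  std::vector<std::int32_t> out;
  out.reserve(ints.size());
  for (std::size_t i = 0; i < ints.size(); ++i) {
    // Fixed-width column: the value bits plus one null-mask bit if nullable.
    std::int32_t const int_bits =
      static_cast<std::int32_t>(sizeof(std::int32_t)) * bits_per_byte + validity_bit;
    // String column: one offset, this row's character bytes, and the null-mask bit.
    std::int32_t const str_bits =
      offset_bits + static_cast<std::int32_t>(strs[i].size()) * bits_per_byte + validity_bit;
    out.push_back(int_bits + str_bits);
  }
  return out;
}

// Total bits for rows [begin, end). Adds one terminating offset per
// offsets-based column, which the per-row counts leave to the caller.
std::int64_t group_bit_count(std::vector<std::int32_t> const& row_bits,
                             std::size_t begin,
                             std::size_t end,
                             std::int32_t num_offset_columns)
{
  assert(begin <= end && end <= row_bits.size());
  std::int64_t total = 0;
  for (std::size_t i = begin; i < end; ++i) { total += row_bits[i]; }
  return total + static_cast<std::int64_t>(num_offset_columns) * offset_bits;
}
```

Under these assumptions, a nullable row holding an INT32 value and the string "abc" models as 33 + 57 = 90 bits. A caller sizing a batch of rows would add 32 bits per string or list column for the terminating offset, as the doc comment notes.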
1 change: 1 addition & 0 deletions cpp/include/cudf/types.hpp
@@ -89,6 +89,7 @@ class mutable_table_view;
using size_type = int32_t;
using bitmask_type = uint32_t;
using valid_type = uint8_t;
using offset_type = int32_t;

/**
* @brief Similar to `std::distance` but returns `cudf::size_type` and performs `static_cast`
1 change: 1 addition & 0 deletions cpp/src/jit/type.cpp
@@ -71,6 +71,7 @@ std::string get_type_name(data_type type)
// TODO: Remove in JIT type utils PR
switch (type.id()) {
case type_id::LIST: return CUDF_STRINGIFY(List);
case type_id::STRUCT: return CUDF_STRINGIFY(Struct);
case type_id::DECIMAL32: return CUDF_STRINGIFY(int32_t);
case type_id::DECIMAL64: return CUDF_STRINGIFY(int64_t);

2 changes: 1 addition & 1 deletion cpp/src/lists/drop_list_duplicates.cu
@@ -34,7 +34,7 @@ namespace cudf {
namespace lists {
namespace detail {
namespace {
using offset_type = lists_column_view::offset_type;

/**
* @brief Copy list entries and entry list offsets ignoring duplicates
*