cudf::row_bit_count() support. #7534

Merged
merged 17 commits into from
Mar 30, 2021
Changes from 4 commits
3 changes: 2 additions & 1 deletion cpp/CMakeLists.txt
@@ -377,7 +377,8 @@ add_library(cudf
src/transform/mask_to_bools.cu
src/transform/nans_to_nulls.cu
src/transform/transform.cpp
src/transpose/transpose.cu
src/transform/row_bit_count.cu
src/transpose/transpose.cu
src/unary/cast_ops.cu
src/unary/math_ops.cu
src/unary/nan_ops.cu
12 changes: 11 additions & 1 deletion cpp/include/cudf/detail/transform.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2020, NVIDIA CORPORATION.
* Copyright (c) 2019-2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -77,5 +77,15 @@ std::unique_ptr<column> mask_to_bools(
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @copydoc cudf::row_bit_count
*
* @param stream CUDA stream used for device memory operations and kernel launches.
*/
std::unique_ptr<column> row_bit_count(
table_view const& t,
rmm::cuda_stream_view stream = rmm::cuda_stream_default,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

} // namespace detail
} // namespace cudf
34 changes: 33 additions & 1 deletion cpp/include/cudf/transform.hpp
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2019-2020, NVIDIA CORPORATION.
* Copyright (c) 2019-2021, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -142,5 +142,37 @@ std::unique_ptr<column> mask_to_bools(
size_type end_bit,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/**
* @brief Returns the cumulative size in bits of all columns in the `table_view` for
* each row.
*
* Each row in the returned column is the sum of the per-row size for each column in
* the table.
*
* In some cases, this is an inexact approximation. Specifically, with
* lists or strings, the cost of a row includes 32 bits for a single offset. However, two
* offsets are required to represent an entire row. This presents a problem, because
* representing 2 rows requires 3 offsets, 3 rows require 4 offsets, and so on. Therefore it
* would not be accurate to say each row of a string column costs 2 offsets, because summing
* multiple row sizes would give a number that is too large. It is up to the caller to
* understand the schema of the input column in order to calculate the small additional
* overhead of the terminating offset for any group of rows being considered.
*
* This function returns the per-row sizes as the columns are currently formed. Under
* certain circumstances, this can differ from the number you would get by gathering the
* rows. Specifically, the pushdown of validity masks by struct columns can nullify rows
* that actually contain underlying data for string or list columns. In these cases, the
* size returned will satisfy:
*
* row_bit_count(column(x)) >= row_bit_count(gather(column(x)))
*
* @param t The table view to perform the computation on.
* @param mr Device memory resource used to allocate the returned column's device memory
* @return A 32-bit integer column containing the per-row bit counts.
*/
std::unique_ptr<column> row_bit_count(
table_view const& t,
rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

/** @} */ // end of group
} // namespace cudf
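
The two caveats in the `row_bit_count` documentation above (the single offset charged per row, and the `>=` relation against a gathered column) can be illustrated on the host without cuDF. The following is only a rough sketch under stated assumptions, not the implementation in `row_bit_count.cu`: `StringColumnModel`, `per_row_bits`, `group_bits`, and `gather_all` are hypothetical names, characters are assumed to be 8 bits each, validity bits are ignored, and the gather model assumes that null rows carry no character data in the gathered result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical host-side model of a strings column (not a cuDF type):
// num_rows + 1 offsets into a buffer of 8-bit characters, plus one
// validity flag per row.
struct StringColumnModel {
  std::vector<int32_t> offsets;  // size == num_rows + 1
  std::vector<bool> valid;       // size == num_rows
};

constexpr int64_t kOffsetBits = 32;  // one 32-bit offset per row, as documented
constexpr int64_t kCharBits   = 8;   // assumption: 8 bits per character

// Per-row cost in the spirit of the doc comment: one offset plus the
// characters spanned by the row's offsets. Validity bits are left out
// (assumption made to keep the model small).
std::vector<int64_t> per_row_bits(StringColumnModel const& col)
{
  std::vector<int64_t> bits;
  for (std::size_t i = 0; i + 1 < col.offsets.size(); ++i) {
    int64_t const chars = col.offsets[i + 1] - col.offsets[i];
    bits.push_back(kOffsetBits + chars * kCharBits);
  }
  return bits;
}

// Size of a group of rows: the sum of the per-row sizes plus the one
// terminating offset that the per-row figures do not account for.
int64_t group_bits(std::vector<int64_t> const& row_bits)
{
  if (row_bits.empty()) { return 0; }
  return std::accumulate(row_bits.begin(), row_bits.end(), int64_t{0}) + kOffsetBits;
}

// Assumed model of gathering every row: valid rows keep their characters,
// null rows become empty, so data hidden under a null row is dropped.
StringColumnModel gather_all(StringColumnModel const& col)
{
  StringColumnModel out;
  out.offsets.push_back(0);
  for (std::size_t i = 0; i < col.valid.size(); ++i) {
    int32_t const chars = col.valid[i] ? col.offsets[i + 1] - col.offsets[i] : 0;
    out.offsets.push_back(out.offsets.back() + chars);
    out.valid.push_back(col.valid[i]);
  }
  return out;
}
```

For offsets `{0, 2, 5, 6}` with validity `{true, false, true}`, this model gives per-row sizes of 48, 56, and 40 bits. Their sum is 144, and adding the terminating offset gives 176 bits, which matches 4 offsets of 32 bits plus 6 characters of 8 bits. After `gather_all`, the null middle row shrinks from 56 to 32 bits, so each original row size is greater than or equal to its gathered counterpart.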
1 change: 1 addition & 0 deletions cpp/src/jit/type.cpp
@@ -71,6 +71,7 @@ std::string get_type_name(data_type type)
// TODO: Remove in JIT type utils PR
switch (type.id()) {
case type_id::LIST: return CUDF_STRINGIFY(List);
case type_id::STRUCT: return CUDF_STRINGIFY(Struct);
case type_id::DECIMAL32: return CUDF_STRINGIFY(int32_t);
case type_id::DECIMAL64: return CUDF_STRINGIFY(int64_t);
