Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cudf::row_bit_count() support. #7534

Merged
merged 17 commits into from
Mar 30, 2021

Conversation

nvdbaranec
Copy link
Contributor

Closes #7408

Some notes:

  • There are some limitations on what this computes, specifically regarding lists or strings embedded inside structs that have null masks. I've added some documentation for this. @jlowe @revans2 This could be made to handle that case properly but it would incur a fairly significant performance cost, and likely would require a large amount of temporary memory.

  • I made some modifications to the test::print() code for lists and structs to be a little more clear when displaying null masks.

  • The structure of flatten_functor and flatten_hierarchy will probably raise some eyebrows. These functions return 3 separate pieces of data and rather than trying to cram them awkwardly through as actual return values, they are passed by reference.

@nvdbaranec nvdbaranec added feature request New feature or request 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Mar 8, 2021
@nvdbaranec nvdbaranec requested review from a team as code owners March 8, 2021 23:19
@nvdbaranec nvdbaranec requested review from trxcllnt and jrhemstad March 8, 2021 23:19
@github-actions github-actions bot added the CMake CMake build issue label Mar 8, 2021
Copy link
Contributor

@davidwendt davidwendt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I really appreciate the detailed, well formatted comments.

cpp/tests/transform/row_bit_count_test.cu Outdated Show resolved Hide resolved
cpp/tests/transform/row_bit_count_test.cu Outdated Show resolved Hide resolved
cpp/tests/transform/row_bit_count_test.cu Show resolved Hide resolved
cpp/src/transform/row_bit_count.cu Outdated Show resolved Hide resolved
cpp/src/transform/row_bit_count.cu Show resolved Hide resolved
cpp/src/transform/row_bit_count.cu Outdated Show resolved Hide resolved
cpp/src/transform/row_bit_count.cu Show resolved Hide resolved
cpp/src/transform/row_bit_count.cu Outdated Show resolved Hide resolved
shmem_per_thread != 0
? std::min(max_block_size, shmem_limit_per_block / static_cast<int>(shmem_per_thread))
: max_block_size;
auto const shared_mem_size = shmem_per_thread * block_size;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a cool utility. Maybe consider adding it to cudf/detail/utilities/cuda.cuh.

cpp/tests/transform/row_bit_count_test.cu Outdated Show resolved Hide resolved
cpp/tests/transform/row_bit_count_test.cu Outdated Show resolved Hide resolved
cpp/src/transform/row_bit_count.cu Outdated Show resolved Hide resolved
@nvdbaranec nvdbaranec requested a review from davidwendt March 19, 2021 15:37
@nvdbaranec nvdbaranec requested a review from davidwendt March 23, 2021 14:56
@nvdbaranec nvdbaranec requested a review from davidwendt March 23, 2021 19:18
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some top-level doc suggestions

cpp/include/cudf/transform.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/transform.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/transform.hpp Outdated Show resolved Hide resolved
@harrism harrism removed the 3 - Ready for Review Ready for review by team label Mar 30, 2021
@harrism
Copy link
Member

harrism commented Mar 30, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 56976fa into rapidsai:branch-0.19 Mar 30, 2021
rapids-bot bot pushed a commit that referenced this pull request Mar 30, 2021
Adds Java bindings for `cudf::row_bit_count`.  This depends on #7534.

Authors:
  - Jason Lowe (@jlowe)

Approvers:
  - Robert (Bobby) Evans (@revans2)

URL: #7749
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] an operator that will give you a fairly accurate size per row for a table.
6 participants