Use grid_1d utilities in copy_range.cuh #17409

Merged
merged 1 commit into from
Dec 4, 2024

Conversation

davidwendt
Contributor

Description

Use the grid_1d utilities to manage thread and stride calculations in the copy_range.cuh kernels.
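
For readers unfamiliar with why a helper is preferable to the hand-written expression, here is a minimal host-side sketch of the arithmetic involved. It assumes the helper widens to a 64-bit index before multiplying. `launch_coords`, `global_thread_id_32`, `global_thread_id_64`, and `grid_stride_64` are hypothetical stand-ins written for illustration; they are not the libcudf functions and only model the CUDA built-ins with plain integers.

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's threadIdx.x, blockIdx.x, blockDim.x and
// gridDim.x, which are 32-bit unsigned values on the device.
struct launch_coords {
  std::uint32_t thread_idx;
  std::uint32_t block_idx;
  std::uint32_t block_dim;
  std::uint32_t grid_dim;
};

// The hand-written form: every operand is 32-bit unsigned, so the product
// wraps modulo 2^32 once block_idx * block_dim exceeds that range.
constexpr std::uint32_t global_thread_id_32(launch_coords c)
{
  return c.thread_idx + c.block_idx * c.block_dim;
}

// Assumed shape of a safer helper: widen one operand to 64 bits first so the
// multiplication itself is carried out in 64 bits.
constexpr std::int64_t global_thread_id_64(launch_coords c)
{
  return static_cast<std::int64_t>(c.block_idx) * c.block_dim + c.thread_idx;
}

// Grid stride for a grid-stride loop, computed the same widened way.
constexpr std::int64_t grid_stride_64(launch_coords c)
{
  return static_cast<std::int64_t>(c.grid_dim) * c.block_dim;
}
```

With a block index of 2^23 and a block size of 1024, the 32-bit form wraps while the widened form does not, which is the class of problem overflow checkers flag.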

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt added the labels 2 - In Progress, libcudf, improvement, and non-breaking on Nov 21, 2024
@davidwendt self-assigned this on Nov 21, 2024
@davidwendt added the label 3 - Ready for Review and removed the label 2 - In Progress on Nov 22, 2024
@davidwendt marked this pull request as ready for review on November 22, 2024 21:21
@davidwendt requested a review from a team as a code owner on November 22, 2024 21:21
@@ -67,15 +67,15 @@ CUDF_KERNEL void offset_bitmask_binop(Binop op,
size_type source_size_bits,
size_type* count_ptr)
{
-auto const tid = threadIdx.x + blockIdx.x * blockDim.x;
+auto const tid = cudf::detail::grid_1d::global_thread_id();

auto const last_bit_index = source_size_bits - 1;
auto const last_word_index = cudf::word_index(last_bit_index);

size_type thread_count = 0;

for (size_type destination_word_index = tid; destination_word_index < destination.size();
Contributor

Does tid need to be a thread_index_type? Or do we assume that it's sufficient to let this be size_type because it's a nullmask and thus we only have to worry about a max of size_type bits, leading to (2^31 / 32 = 2^26) as the max possible word index?

Contributor

(I have thought about this before because I tried to refactor this kernel to use safe thread types, and gave up due to this possibility being a distraction.)

Contributor

I am okay with leaving this as-is and not worrying about that possibility, as long as we agree the status quo is sufficiently safe.

Contributor Author

Yes, I agree. I was partly future-proofing against size_type but mostly trying to keep the overflow-checking robots at bay.
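
The bound discussed in this thread can be checked with a few lines of arithmetic. This sketch assumes `size_type` is a 32-bit signed integer and that each bitmask word holds 32 bits, as the 2^31 / 32 figure in the question implies. `word_index_of` is a hypothetical stand-in for `cudf::word_index`, not its actual definition.

```cpp
#include <cstdint>
#include <limits>

using size_type = std::int32_t;          // assumption: 32-bit signed row count
constexpr size_type bits_per_word = 32;  // assumption: 32-bit bitmask words

// Hypothetical stand-in: index of the word that holds a given bit.
constexpr size_type word_index_of(size_type bit_index)
{
  return bit_index / bits_per_word;
}

// The largest mask has size_type::max() bits, so its last bit index is one less.
constexpr size_type max_bits       = std::numeric_limits<size_type>::max();
constexpr size_type max_word_index = word_index_of(max_bits - 1);

// The largest word index is 2^26 - 1, far below the size_type limit, so a
// size_type loop variable cannot overflow while indexing words.
static_assert(max_word_index == (size_type{1} << 26) - 1);
```

Under these assumptions the word index has about five bits of headroom inside `size_type`, which matches the conclusion that the status quo is safe.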

@davidwendt
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 1b01df3 into rapidsai:branch-25.02 Dec 4, 2024
104 checks passed
@davidwendt davidwendt deleted the blockdim-copyrange branch December 4, 2024 13:26
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
4 participants