Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] The libcudf device atomic operations cause memcheck errors on types smaller than 4 bytes #13575

Closed
davidwendt opened this issue Jun 14, 2023 · 3 comments
Assignees
Labels
1 - On Deck To be worked on next bug Something isn't working libcudf Affects libcudf (C++/CUDA) code.

Comments

@davidwendt
Copy link
Contributor

The device_atomics implementation in libcudf are failing memchecks when used on types smaller than 4 bytes.

auto* address_uint32 = reinterpret_cast<T_int*>(addr - (reinterpret_cast<size_t>(addr) & 3));
T_int shift = ((reinterpret_cast<size_t>(addr) & 3) * 8);
T_int old = *address_uint32;
T_int assumed;

This was found while investigating some memcheck errors that occurred on cudf::reduce where the aggregation return type is bool (1 byte) and an atomicOr was used to produce the output. Reference: #13574

The solution is to use the libcudaxx cuda::atomic and cuda::atomic_ref in place of these implementations and potentially remove the device_atomics.cuh altogether.

@davidwendt davidwendt added bug Something isn't working Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. labels Jun 14, 2023
@PointKernel
Copy link
Member

Depends on NVIDIA/cccl#1024

rapids-bot bot pushed a commit that referenced this issue Jul 10, 2023
Contributes to #13575

Depends on #13574, #13578

This PR cleans up custom atomic implementations in libcudf by using `cuda::atomic_ref` when possible. It removes atomic bitwise operations like `and`, `or` and `xor` since libcudac++ already provides proper replacements.

Authors:
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - David Wendt (https://github.com/davidwendt)

URL: #13583
@GregoryKimball GregoryKimball added 1 - On Deck To be worked on next and removed Needs Triage Need team to review and classify labels Jul 22, 2023
@PointKernel
Copy link
Member

Update: atomic_ref is limited to hardware-supported types, so it doesn't work properly with types like booleans, timestamps, and durations. As a result, we can't fully eliminate device_atomics.cuh. This issue should be addressed alternatively.

@davidwendt
Copy link
Contributor Author

Closed by #13583 and NVIDIA/cccl#1024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 - On Deck To be worked on next bug Something isn't working libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

No branches or pull requests

3 participants