[BUG] size_type overflow in cudf::groupby::detail::hash::extract_populated_keys
#12058
@jrhemstad FYI
Can you provide a C++ example? I'm not following the syntax of the example in the description too well.
Elements are int32 and they are monotonically increasing from 0 to 1.2B, such that when we do the groupby there is no reduction. I can try a C++ repro, but let me know if the above info is enough.
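For reference, a minimal sketch of what such a repro could look like, assuming the public libcudf API (`cudf::sequence`, `cudf::groupby::groupby`); this is an illustration of the described setup, not the repro that was later added to the issue:

```cpp
// Sketch only: 1.2B distinct int32 keys so the hash groupby performs no
// reduction. Assumes the public libcudf API; the actual repro may differ.
#include <cudf/aggregation.hpp>
#include <cudf/filling.hpp>
#include <cudf/groupby.hpp>
#include <cudf/scalar/scalar.hpp>
#include <cudf/table/table_view.hpp>
#include <vector>

int main()
{
  cudf::size_type const num_rows = 1'200'000'000;  // 1.2B rows

  // sequence() yields 0, 1, 2, ... so every key is unique.
  auto keys = cudf::sequence(num_rows, cudf::numeric_scalar<int32_t>(0));
  auto vals = cudf::sequence(num_rows, cudf::numeric_scalar<int64_t>(0));

  cudf::groupby::groupby gb(cudf::table_view({keys->view()}));
  std::vector<cudf::groupby::aggregation_request> requests(1);
  requests[0].values = vals->view();
  requests[0].aggregations.push_back(
    cudf::make_sum_aggregation<cudf::groupby_aggregation>());

  auto result = gb.aggregate(requests);  // hits extract_populated_keys
  return 0;
}
```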
I am working on a C++ repro with @nvdbaranec's help as well.
Added a C++ repro to the description, @davidwendt. I also added some logging in rmm.
It looks like the error is actually in cudf/cpp/src/groupby/hash/groupby.cu, line 519 (commit a3e9c1c). In the example from the description, this is where the overflow occurs. I can work around this, but it looks like it will still run out of memory later in the gather step (on my 48GB GPU).
I'm not very familiar with the code in that file, so I'm unclear why the map is sized the way it is. I assume that map capacity is something around 1.5 to 2.0 times the map size (which is the number of actual elements). Update: found that this is indeed the case.
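If that capacity assumption holds, the arithmetic explains the overflow: `copy_if` scans the full map capacity, and 2.0 × 1.2B slots already exceeds `INT_MAX`. A quick check (the 2.0 factor is taken from the assumption above, not from the actual hash-map sizing code):

```cpp
// Back-of-the-envelope check, assuming capacity ~= 2x the number of keys.
#include <climits>
#include <cstdint>

constexpr std::int64_t num_keys     = 1'200'000'000;  // keys in the repro
constexpr std::int64_t map_capacity = 2 * num_keys;   // ~2.4B slots to scan
static_assert(map_capacity > INT_MAX,
              "iterator distance no longer fits in a signed 32-bit int");
```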
Adding up the numbers in #12058 (comment), we are at 38.4GB. So just pointing out that this would've run out of memory on a 48GB GPU anyway, even without the overflow error.
Regardless, I can create a PR to work around the overflow error in the `thrust::copy_if` call.
Thanks for taking a look @davidwendt, @ttnghia. It doesn't make a lot of sense to run out of memory with 48GB for a ~5GB input on a simple sum. Looking at this with @jlowe and @nvdbaranec, the second concern is a possible overflow in the gather step. The main thing is not having the overflow; if there is another overflow via the gather path, we'd want that addressed as well.
There is no overflow in gather. This is doing a gather on 1.2B elements of INT64 => 9.6GB. We just don't have that much memory left.
@abellina I found that there is a check for overflow in the gather code. This check passes, so there is no overflow in gather.
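For illustration, a guard of the kind being referenced might look like the following (hypothetical sketch, not the actual cudf source):

```cpp
// Hypothetical sketch of an overflow guard before a gather: reject any
// element count that does not fit in cudf's signed 32-bit size_type.
#include <cudf/types.hpp>
#include <cstdint>
#include <limits>
#include <stdexcept>

void check_gather_size(std::int64_t num_elements)
{
  if (num_elements > std::numeric_limits<cudf::size_type>::max()) {
    throw std::overflow_error("gather map size exceeds size_type limit");
  }
}
```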
And I can run this with an 80GB GPU once there's a patch somewhere, to see if that succeeds.
Workaround for a limitation in `thrust::copy_if`, which fails if the input iterator spans more than int-max elements. `thrust::copy_if` hardcodes the iterator distance type to be an int: https://github.com/NVIDIA/thrust/blob/dbd144ed543b60c4ff9d456edd19869e82fe8873/thrust/system/cuda/detail/copy_if.h#L699-L708

Found existing thrust issue: https://github.com/NVIDIA/thrust/issues/1302

This change calls `copy_if` in chunks if the iterator can span more than int-max.

Closes #12058

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Alessandro Bellina (https://github.com/abellina)
- Robert Maynard (https://github.com/robertmaynard)
- Nghia Truong (https://github.com/ttnghia)

URL: #12079
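The chunking idea, roughly (a sketch under the same constraints, not the code merged in #12079; the helper name is invented here):

```cpp
// Sketch: run thrust::copy_if in chunks of at most INT_MAX elements so the
// int distance type inside thrust never overflows. Helper name is made up.
#include <thrust/copy.h>
#include <thrust/distance.h>
#include <thrust/execution_policy.h>
#include <algorithm>
#include <cstdint>
#include <limits>

template <typename InputIt, typename OutputIt, typename Predicate>
OutputIt copy_if_chunked(InputIt first, InputIt last, OutputIt result, Predicate pred)
{
  auto const total = static_cast<std::int64_t>(thrust::distance(first, last));
  std::int64_t constexpr max_chunk = std::numeric_limits<int>::max();

  for (std::int64_t offset = 0; offset < total; offset += max_chunk) {
    auto const chunk = std::min(max_chunk, total - offset);
    result = thrust::copy_if(
      thrust::device, first + offset, first + offset + chunk, result, pred);
  }
  return result;
}
```

Each chunk's iterator distance fits in an int, and `copy_if` returns the advanced output iterator, so successive chunks append contiguously to the output.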
We look to be overflowing `size_type` while in `extract_populated_keys`. The iterator on `rmm::device_uvector` is signed, so I think we need to go to unsigned in order to be able to group by up to the 2B row limit. I believe the issue is actually when calling `populated_keys.resize`, where the `std::distance` may be negative, but the stack trace I got (on a non-debug build) points at `copy_if`. Either way, it definitely looks to be an issue triggered in this function.

C++ repro:
Stack trace: