thrust::sort fails for > 2.1B keys #1453
Comments
That is suspicious indeed, especially since that is right where we know of indexing bugs due to the code truncating the sizes. My suspicion is that |
@griwes A sanity check, as well as a run with |
This is related to NVIDIA/cub#212. I'm hoping to take a look at this in the next release or two.
Thanks for your replies! We could apply a local hotfix along the lines of NVIDIA/cub#129 by changing the following files:
However, since our multi-GPU sorting approach also utilizes |
Closing in favor of NVIDIA/cccl#744.
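The 2.1B threshold in the title sits right at 2^31 - 1 = 2,147,483,647, the largest value a 32-bit signed integer can hold, which fits the size truncation suspected in the comments above. Below is a minimal sketch of that arithmetic, assuming the element count is narrowed to 32 bits somewhere internally (the thread does not confirm the exact code path). The helpers fits_in_int32 and low_32_bits are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if an element count can be represented in a 32-bit signed integer.
constexpr bool fits_in_int32(std::uint64_t count) {
  return count <=
         static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// The low 32 bits that survive when a 64-bit count is narrowed to 32 bits.
// Conversion to an unsigned type is well defined (modulo 2^32).
constexpr std::uint32_t low_32_bits(std::uint64_t count) {
  return static_cast<std::uint32_t>(count);
}
```

A count of 2,147,483,647 still fits, while 2,147,483,648 does not. A count of exactly 2^32 keeps only the low bits and looks like zero elements after narrowing.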
Context
We are benchmarking the performance of thrust::sort with a pre-allocated temporary buffer. In a nutshell, we generate the data on the host, copy it onto the device, initialize a stream, pre-allocate a temporary buffer for thrust::sort, and measure the sort duration.
Example
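A minimal sketch of a benchmark along the lines described above. It is not the original listing. The key type (std::uint32_t), the PreallocatedBuffer class, and the buffer sizing (one extra key buffer plus 64 MiB of slack) are all assumptions made for illustration. The allocator follows the pattern Thrust accepts through thrust::cuda::par(alloc): a value_type of char plus allocate and deallocate members.

```cuda
// Hypothetical reconstruction of thrust_sort.cu; the key type, buffer sizing,
// and allocator are assumptions, not the original listing.
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/system/cuda/execution_policy.h>

#include <cuda_runtime.h>

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <new>
#include <random>

// Serves Thrust's temporary allocations from one device buffer that is
// allocated up front, so no cudaMalloc runs inside the timed region.
class PreallocatedBuffer {
 public:
  using value_type = char;

  explicit PreallocatedBuffer(std::size_t capacity) : capacity_(capacity) {
    if (cudaMalloc(&base_, capacity_) != cudaSuccess) throw std::bad_alloc();
  }
  ~PreallocatedBuffer() { cudaFree(base_); }
  PreallocatedBuffer(const PreallocatedBuffer&) = delete;
  PreallocatedBuffer& operator=(const PreallocatedBuffer&) = delete;

  char* allocate(std::ptrdiff_t num_bytes) {
    // Round up to 256 bytes to keep sub-allocations aligned.
    const std::size_t size =
        (static_cast<std::size_t>(num_bytes) + 255) / 256 * 256;
    if (offset_ + size > capacity_) throw std::bad_alloc();
    char* ptr = base_ + offset_;
    offset_ += size;
    ++live_;
    return ptr;
  }

  void deallocate(char*, std::size_t) {
    // Bump allocator: reclaim everything once the last allocation is freed.
    if (--live_ == 0) offset_ = 0;
  }

 private:
  char* base_ = nullptr;
  std::size_t capacity_ = 0;
  std::size_t offset_ = 0;
  std::size_t live_ = 0;
};

int main(int argc, char** argv) {
  if (argc != 2) {
    std::cerr << "usage: " << argv[0] << " <num_elements>\n";
    return 1;
  }
  const std::size_t num_elements = std::strtoull(argv[1], nullptr, 10);

  // Generate the keys on the host (uint32_t keys are an assumption).
  thrust::host_vector<std::uint32_t> h_keys(num_elements);
  std::mt19937 rng(42);
  for (std::size_t i = 0; i < num_elements; ++i) h_keys[i] = rng();

  // Copy the keys onto the device.
  thrust::device_vector<std::uint32_t> d_keys = h_keys;

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Assumed sizing: one extra key buffer plus 64 MiB of slack.
  PreallocatedBuffer buffer(num_elements * sizeof(std::uint32_t) + (64u << 20));

  // Time only the sort itself, including stream synchronization.
  const auto start = std::chrono::steady_clock::now();
  thrust::sort(thrust::cuda::par(buffer).on(stream), d_keys.begin(),
               d_keys.end());
  cudaStreamSynchronize(stream);
  const auto stop = std::chrono::steady_clock::now();

  std::cout << num_elements << " elements sorted in "
            << std::chrono::duration<double>(stop - start).count() << " s\n";

  cudaStreamDestroy(stream);
  return 0;
}
```

The bump allocator resets only when every outstanding allocation has been returned, which keeps it correct even if Thrust requests more than one temporary block per call.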
We compile the example with nvcc -O3 -std=c++17 -o thrust_sort thrust_sort.cu and run it with ./thrust_sort <num_elements> on two different platforms.
Observation
When varying the number of elements (through num_elements), the sort duration grows (almost) linearly with the number of elements up to <2.1B elements. Up until this point, all output elements are valid and in sorted order. Then, however, the sort duration drops sharply for ~2.1B elements. From there on, the output elements are all 0s. Nevertheless, the sort duration grows linearly again.
We observe this behavior consistently on both systems, regardless of the thrust version (i.e., 1.11 or 1.12).
Moreover, cub::DeviceRadixSort::SortKeys fails at precisely the same point. Unlike thrust::sort, however, it fails instantly (i.e., in less than 0.0001s) and does not touch the input elements at all, making it immediately apparent that something went wrong.
Conclusion
thrust::sort and cub::DeviceRadixSort::SortKeys fail for > 2.1B elements.
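Because the failing thrust::sort runs return all 0s, an order-only check such as std::is_sorted would accept the broken output: a run of zeros is trivially sorted. A minimal host-side sketch of a stricter check follows, assuming std::uint32_t keys and enough host memory for a reference copy. The name is_valid_sort is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true only if `output` is a sorted permutation of `input`.
// `input` is taken by value so it can be sorted as the host reference.
bool is_valid_sort(std::vector<std::uint32_t> input,
                   const std::vector<std::uint32_t>& output) {
  if (input.size() != output.size()) return false;
  std::sort(input.begin(), input.end());  // reference result on the host
  return input == output;                 // catches all-zero or corrupted output
}
```

At multi-billion element counts the reference sort is slow and memory-hungry, so a cheaper comparison (for example a sum or checksum of the keys before and after sorting, combined with std::is_sorted) may be preferable in a benchmark loop.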