fix > intmax num inputs for scan_by_key #1424

cwharris · 2021-05-06T19:15:07Z

Fixes NVIDIA/cccl#766. With these updates, scan_by_key supports more than intmax inputs. The number of inputs is now capped by tile_idx, which is of type int, so the actual number of supported inputs is intmax * ITEMS_PER_TILE, where ITEMS_PER_TILE is determined via the cub/thrust PtxPolicy.

alliepiper

LGTM aside from one minor change to avoid introducing a build warning.

alliepiper · 2021-05-10T18:41:04Z

thrust/system/cuda/detail/scan_by_key.h

@@ -512,7 +512,7 @@ namespace __scan_by_key {
            inequality_op(equality_op_),
            scan_op(scan_op_)
      {
-        int  tile_idx      = blockIdx.x;
+        Size tile_idx      = blockIdx.x;


Can you switch this back to using a static_cast<Size> in the tile_base calculation and leave tile_idx as an int?

tile_idx is passed to consume_tile, which expects it to be an int, and this change will introduce truncation warnings for 8-byte Size types.
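For illustration, here is a minimal sketch of the suggestion above: keep tile_idx as an int and widen it only inside the multiplication. The free function `tile_base` and the constant value are assumptions made for this example; they are not the actual Thrust code.

```cpp
#include <cstdint>

// Assumed tuning constant for this sketch; the real value comes from the
// PtxPolicy and varies by architecture.
constexpr int ITEMS_PER_TILE = 9 * 256;

// Hypothetical helper: tile_idx stays an int (as consume_tile expects),
// but it is converted to Size before the multiplication, so the product
// is computed in the wider type and cannot wrap at INT_MAX.
template <typename Size>
constexpr Size tile_base(int tile_idx)
{
  return static_cast<Size>(tile_idx) * ITEMS_PER_TILE;
}

// 1,000,000 tiles * 2304 items = 2,304,000,000, which is larger than
// INT_MAX and would overflow if the product were computed as int * int.
static_assert(tile_base<std::int64_t>(1000000) == 2304000000LL,
              "product must be computed in the 64-bit type");
```

With this shape, every callee that takes an int tile index is unaffected, and only the offset arithmetic uses the wider type.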

Ah, this is my fault. We should actually use Size for tile_idx all the way down, I think. It's conceivable that tile_idx can rise higher than intmax. In my experiments ITEMS_PER_TILE was 9 * 256, meaning 1 << 43 inputs would overflow.

Do you think having tile_idx as Size would be problematic?
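The arithmetic behind that estimate can be checked with a short sketch. It assumes a 32-bit int and the ITEMS_PER_TILE value of 9 * 256 from the experiment above; `max_items` and `fits` are names invented for this example.

```cpp
#include <climits>
#include <cstdint>

// Value observed in the experiment above; other policies may differ.
constexpr std::int64_t ITEMS_PER_TILE = 9 * 256; // 2304

// Largest input count reachable while tile_idx is limited to INT_MAX.
constexpr std::int64_t max_items =
    static_cast<std::int64_t>(INT_MAX) * ITEMS_PER_TILE;

// Hypothetical helper: true if num_items is within the int tile_idx limit.
constexpr bool fits(std::int64_t num_items)
{
  return num_items <= max_items;
}

static_assert(fits(std::int64_t{1} << 42), "2^42 items are within the limit");
static_assert(!fits(std::int64_t{1} << 43), "2^43 items exceed the limit");
```

Under these assumptions the cap sits between 2^42 and 2^43 items.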

I just realized tile_idx goes all the way down to TilePrefixCallbackOp, which accepts int for tile_idx in the constructor, so making a change to tile_idx's type would require changes to all single-pass scan algorithms.

A block's x index (blockIdx.x) must always fit in an int, so it's best to leave tile_idx as-is. If we needed more tiles, it'd need to be handled at a higher level of the implementation.


alliepiper · 2021-05-10T22:09:47Z

This LGTM, I'll run it through our tests. I should be able to land it before the next release.

Thanks for the patch!

alliepiper · 2021-05-10T22:13:29Z

DVS CL: 29947024

run tests

cwharris · 2021-05-11T15:13:08Z

I made the mistake of thinking Size was available when determining num_items, but that is the point at which the Size type is determined, so I changed it back to size_t.

alliepiper

Ah, that's unfortunate. This patch just got a bit more complicated -- see my inline comment.

alliepiper · 2021-05-11T16:00:40Z

thrust/system/cuda/detail/scan_by_key.h

@@ -734,7 +734,7 @@ namespace __scan_by_key {
                             ScanOp                     scan_op,
                             AddInitToScan              add_init_to_scan)
  {
-    int          num_items    = static_cast<int>(thrust::distance(keys_first, keys_last));
+    size_t       num_items    = static_cast<size_t>(thrust::distance(keys_first, keys_last));


This will likely introduce performance regressions -- using size_t unconditionally here will instantiate the scan_by_key implementation with Size=size_t, increasing register pressure and generating less efficient code for inputs that can be indexed by int.

Take a look at the macros in thrust/system/cuda/detail/dispatch.h -- these will conditionally switch between using int or size_t depending on the actual runtime value.
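As a rough sketch of the idea (not the actual macros in dispatch.h, whose names and signatures are not reproduced here), a runtime dispatch on the item count might look like the following. `scan_impl` and `dispatch_scan` are hypothetical names, and the sketch assumes a 64-bit size_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for an implementation templated on the index type.
// It only reports the width of the index type it was instantiated with.
template <typename Size>
std::size_t scan_impl(Size num_items)
{
  (void)num_items; // unused in this sketch
  return sizeof(Size);
}

// Hypothetical dispatcher: both instantiations are compiled, and the
// narrow one is selected at runtime whenever the count fits in 32 bits.
inline std::size_t dispatch_scan(std::size_t num_items)
{
  const std::size_t int32_max =
      static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());

  if (num_items <= int32_max)
  {
    return scan_impl(static_cast<std::int32_t>(num_items));
  }
  return scan_impl(static_cast<std::int64_t>(num_items));
}
```

The trade-off is that two instantiations are compiled, but inputs that fit in 32 bits keep the cheaper index type.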

) same fix seen here, but via patch: NVIDIA/thrust#1424
Also fixes rapidsai/cuspatial#393
Alternatively, we could wait and update our thrust version, rather than patching the existing one.

Authors:
- Christopher Harris (https://github.com/cwharris)

Approvers:
- Mark Harris (https://github.com/harrism)
- Paul Taylor (https://github.com/trxcllnt)

URL: #8199

cwharris · 2021-05-22T21:17:25Z

Closing because this isn't a viable solution without a major overhaul of the single-pass scan utilities and/or adding conditional dispatch based on the input size.

@allisonvacanti

…_key" (#8263)
Reverts #8199
According to @allisonvacanti (NVIDIA/thrust#1424 (comment)) this patch will likely have adverse effect on performance. We should revert it until a better solution can be found.

Authors:
- Christopher Harris (https://github.com/cwharris)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Keith Kraus (https://github.com/kkraus14)
- Elias Stehle (https://github.com/elstehle)

URL: #8263

alliepiper · 2021-05-25T16:00:58Z

We should address this in the long term through NVIDIA/cub#212 and moving Thrust kernels into CUB.

fix > intmax num inputs for scan_by_key

f537e42

alliepiper suggested changes May 10, 2021

alliepiper assigned cwharris May 10, 2021

alliepiper added this to the 1.13.0 milestone May 10, 2021

cwharris force-pushed the intmax-num-items-fix branch from 09c2dcc to 62e1891 Compare May 10, 2021 19:26

revert scan_by_key tile_idx type to int, static_cast on multiplication instead.

33a33d3

cwharris force-pushed the intmax-num-items-fix branch from 62e1891 to 33a33d3 Compare May 10, 2021 19:27

cwharris requested a review from alliepiper May 10, 2021 19:27

cwharris mentioned this pull request May 10, 2021

patch thrust to fix intmax num elements limitation in scan_by_key rapidsai/cudf#8199

Merged

alliepiper added the "testing: internal ci in progress" (currently testing on internal NVIDIA CI (DVS)) and "testing: gpuCI in progress" (started gpuCI testing) labels May 10, 2021

use size_t where Size was unavailable

949fc33

alliepiper suggested changes May 11, 2021

cwharris mentioned this pull request May 12, 2021

Revert "patch thrust to fix intmax num elements limitation in scan_by_key" rapidsai/cudf#8226

Closed

alliepiper removed the "testing: internal ci in progress" (currently testing on internal NVIDIA CI (DVS)) and "testing: gpuCI in progress" (started gpuCI testing) labels May 12, 2021

cwharris mentioned this pull request May 18, 2021

Revert "patch thrust to fix intmax num elements limitation in scan_by_key" rapidsai/cudf#8263

Merged

cwharris closed this May 22, 2021

alliepiper removed this from the 1.13.0 milestone Jun 1, 2021

alliepiper unassigned cwharris Jun 1, 2021
