Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

fix > intmax num inputs for scan_by_key #1424

Closed
wants to merge 3 commits into from

Conversation

cwharris
Copy link

@cwharris cwharris commented May 6, 2021

Fixes NVIDIA/cccl#766. With these updates scan_by_key supports a higher number of inputs. The number of inputs is now capped by tile_idx, which is type int. The actual number of supported inputs is intmax * ITEMS_PER_TILE, where ITEMS_PER_TILE is determined via cub/thrust PtxPolicy.

Copy link
Collaborator

@alliepiper alliepiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside from one minor change to avoid introducing a build warning.

@@ -512,7 +512,7 @@ namespace __scan_by_key {
inequality_op(equality_op_),
scan_op(scan_op_)
{
int tile_idx = blockIdx.x;
Size tile_idx = blockIdx.x;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you switch this back to using a static_cast<Size> in the tile_base calculation and leave tile_idx as an int?

tile_idx is passed to consume_tile, which expects it to be an int, and this change will introduce truncation warnings for 8-byte Size types.

Copy link
Author

@cwharris cwharris May 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this is my fault. We should actually we use Size for tile_idx all the way down, I think. It's conceivable tile_idx can rise higher than intmax. In my experiments ITEMS_PER_TILE was 9 * 256, meaning 1 >> 43 number of inputs would overflow.

Do you think having tile_idx as Size would be problematic?

Copy link
Author

@cwharris cwharris May 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized tile_idx goes all the way down to TilePrefixCallbackOp, which accepts int for tile_idx in the constructor, so making a change to tile_idx's type would require changes to all single-pass scan algorithms.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A block's x dimension must always fit in an int, so it's best to leave tile_idx as-is. If we needed more tiles, it'd need to be handled at a higher level of the implementation.

@alliepiper alliepiper added this to the 1.13.0 milestone May 10, 2021
@cwharris cwharris force-pushed the intmax-num-items-fix branch from 09c2dcc to 62e1891 Compare May 10, 2021 19:26
@alliepiper
Copy link
Collaborator

This LGTM, I'll run it through our tests. I should be able to land it before the next release.

Thanks for the patch!

@alliepiper
Copy link
Collaborator

DVS CL: 29947024

run tests

@alliepiper alliepiper added testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI in progress Started gpuCI testing. labels May 10, 2021
@cwharris
Copy link
Author

I make the mistake of thinking Size was available when determining num_items, but that is the point at which Size type is determined, so I changed it back to size_t.

Copy link
Collaborator

@alliepiper alliepiper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's unfortunate. This patch just got a bit more complicated -- see my inline comment.

@@ -734,7 +734,7 @@ namespace __scan_by_key {
ScanOp scan_op,
AddInitToScan add_init_to_scan)
{
int num_items = static_cast<int>(thrust::distance(keys_first, keys_last));
size_t num_items = static_cast<size_t>(thrust::distance(keys_first, keys_last));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will likely introduce performance regressions -- using size_t unconditionally here will instantiate the scan_by_key implementation with Size=size_t, increasing register pressure and generating less efficient code for inputs that can be indexed by int.

Take a look at the macros in thrust/system/cuda/detail/dispatch.h -- these will conditionally switch between using int or size_t depending on the actual runtime value.

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request May 11, 2021
)

same fix seen here, but via patch: NVIDIA/thrust#1424

Also fixes rapidsai/cuspatial#393

Alternatively, we could wait and update our thrust version, rather than patching the existing one.

Authors:
  - Christopher Harris (https://github.com/cwharris)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Paul Taylor (https://github.com/trxcllnt)

URL: #8199
@alliepiper alliepiper removed testing: internal ci in progress Currently testing on internal NVIDIA CI (DVS). testing: gpuCI in progress Started gpuCI testing. labels May 12, 2021
@cwharris
Copy link
Author

Closing because this isn't a viable solution without a major overhaul of the single-pass scan utilities, and/or adding conditional dispatched based on the size of input.

@cwharris cwharris closed this May 22, 2021
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request May 24, 2021
…_key" (#8263)

Reverts #8199

According to @allisonvacanti (NVIDIA/thrust#1424 (comment)) this patch will likely have adverse effect on performance. We should revert it until a better solution can be found.

Authors:
  - Christopher Harris (https://github.com/cwharris)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Keith Kraus (https://github.com/kkraus14)
  - Elias Stehle (https://github.com/elstehle)

URL: #8263
@alliepiper
Copy link
Collaborator

We should address this in the long term through NVIDIA/cub#212 and moving Thrust kernels into CUB.

@alliepiper alliepiper removed this from the 1.13.0 milestone Jun 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] inclusive_scan_by_key OOM on >= INT_MAX elements
2 participants