This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Fix overflow in reduce #592

Merged
merged 2 commits into from
Nov 25, 2022

Conversation

gevtushenko
Collaborator

No description provided.

gevtushenko added a commit to gevtushenko/thrust that referenced this pull request Nov 15, 2022
@gevtushenko gevtushenko added the type: bug: functional Does not work as intended. label Nov 15, 2022
@gevtushenko gevtushenko added this to the 2.1.0 milestone Nov 15, 2022
@gevtushenko gevtushenko added the P0: must have Absolutely necessary. Critical issue, major blocker, etc. label Nov 15, 2022
@@ -355,7 +355,7 @@ struct AgentReduce
{
AccumT thread_aggregate{};

-  if (even_share.block_offset + TILE_ITEMS > even_share.block_end)
+  if (even_share.block_end - even_share.block_offset < TILE_ITEMS)
Contributor

To clarify my assumptions: is it always true that even_share.block_offset <= even_share.block_end? Can they be equal?

Collaborator

I am not happy that we transform the condition differently here and below. I like @canonizer's suggestion below.

Collaborator Author

@miscco, @canonizer's suggestion doesn't change the fact that either this line or the line below has to be changed.

Collaborator Author

@canonizer, for segmented reduce, block_offset is always less than block_end. For reduce, the number of blocks is about RoundUp(num_items, tile_size), while block_offset is just block_id * TILE_ITEMS and block_end is num_items. The case of num_items == 0 is handled separately, so I don't think block_end can be equal to block_offset. Could you elaborate on why it's relevant here?
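
To make the overflow concrete, here is a minimal host-side C++ sketch of the two conditions from the diff. It is an illustration under stated assumptions, not the CUB code: OffsetT is assumed to be a 32-bit unsigned type so that the wraparound is well defined (with a signed offset type the overflow would be undefined behavior instead), TILE_ITEMS = 512 is an illustrative value, and is_partial_tile_sum and is_partial_tile_diff are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the names used in the discussion.
using OffsetT = std::uint32_t;      // assumption: unsigned, so wraparound is well defined
constexpr OffsetT TILE_ITEMS = 512; // assumption: illustrative tile size

// Original form: block_offset + TILE_ITEMS can wrap past the maximum of OffsetT.
constexpr bool is_partial_tile_sum(OffsetT block_offset, OffsetT block_end)
{
  return block_offset + TILE_ITEMS > block_end;
}

// Rewritten form: cannot overflow as long as block_offset <= block_end.
constexpr bool is_partial_tile_diff(OffsetT block_offset, OffsetT block_end)
{
  return block_end - block_offset < TILE_ITEMS;
}
```

With block_end at the 32-bit maximum (4294967295) and block_offset at the start of the last tile (4294966784), 511 items remain. The sum form wraps to 0 and reports a full tile, while the difference form correctly reports a partial one.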

Collaborator

I am wondering whether the whole algorithm would be simpler if we used int valid_items = even_share.block_end - even_share.block_offset; as the main variable instead of repeatedly computing the remaining number of items.

That said, the change is definitely correct, and a large-scale refactor is a bit too much right now.

cub/agent/agent_reduce.cuh (outdated, resolved)

test/test_device_reduce.cu (resolved)
@gevtushenko gevtushenko force-pushed the fix-main/github/reduce_overflow branch from f313fa0 to f920c37 Compare November 19, 2022 23:23
@gevtushenko gevtushenko force-pushed the fix-main/github/reduce_overflow branch from f920c37 to 207b66b Compare November 20, 2022 23:27
gevtushenko added a commit to gevtushenko/thrust that referenced this pull request Nov 20, 2022

@gevtushenko
Collaborator Author

@miscco, the algorithm would definitely be simpler. Nonetheless, the idea is that the valid_items overload is slower, so we go to these lengths to avoid it for the bulk case.
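
The trade-off described here can be sketched on the host as follows. This is a sequential C++ analogy under stated assumptions, not the CUB kernel: consume_full_tile, consume_partial_tile, and consume_range are hypothetical names, and TILE_ITEMS = 4 is chosen only for readability. Full tiles take an unguarded path with a fixed trip count, and only the final partial tile pays for the per-item valid_items check.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t TILE_ITEMS = 4; // assumption: illustrative tile size

// Full tile: fixed trip count, no per-item bounds check.
inline long consume_full_tile(const int* in)
{
  long sum = 0;
  for (std::size_t i = 0; i < TILE_ITEMS; ++i)
  {
    sum += in[i];
  }
  return sum;
}

// Partial tile: every item is guarded by valid_items.
inline long consume_partial_tile(const int* in, std::size_t valid_items)
{
  long sum = 0;
  for (std::size_t i = 0; i < TILE_ITEMS; ++i)
  {
    if (i < valid_items)
    {
      sum += in[i];
    }
  }
  return sum;
}

inline long consume_range(const std::vector<int>& data)
{
  std::size_t block_offset    = 0;
  const std::size_t block_end = data.size();
  long sum                    = 0;

  // Subtraction form: cannot overflow while block_offset <= block_end.
  while (block_end - block_offset >= TILE_ITEMS)
  {
    sum += consume_full_tile(data.data() + block_offset);
    block_offset += TILE_ITEMS;
  }

  // At most one guarded tile at the end of the range.
  if (block_offset < block_end)
  {
    sum += consume_partial_tile(data.data() + block_offset, block_end - block_offset);
  }
  return sum;
}
```

For a 10-item input, the loop consumes two full tiles through the unguarded path and the remaining two items through the guarded one.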

Contributor

@canonizer canonizer left a comment

Approved, provided that the comments are addressed.

gevtushenko added a commit to gevtushenko/thrust that referenced this pull request Nov 23, 2022
gevtushenko added a commit to gevtushenko/thrust that referenced this pull request Nov 24, 2022
Labels
P0: must have Absolutely necessary. Critical issue, major blocker, etc. testing: gpuCI passed Passed gpuCI testing. type: bug: functional Does not work as intended.
Projects
Archived in project
4 participants