-
Notifications
You must be signed in to change notification settings - Fork 449
Fix overflow in reduce #592
Fix overflow in reduce #592
Conversation
@@ -355,7 +355,7 @@ struct AgentReduce | |||
{ | |||
AccumT thread_aggregate{}; | |||
|
|||
if (even_share.block_offset + TILE_ITEMS > even_share.block_end) | |||
if (even_share.block_end - even_share.block_offset < TILE_ITEMS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify my assumptions: is it always true that even_share.block_end <= even_share.block_offset
? Can they be equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not happy, that we transform the condition differently here and below. I like @canonizer suggestion below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@miscco, @canonizer suggestion doesn't change the fact that this line, or the line below has to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@canonizer, for segmented reduce block_offset
is always less than block_end
. For reduce the number of blocks is about RoundUp(num_items, tile_size)
, while block_offset
is just block_id * TILE_ITEMS
and block_end
is num_items
. The case of num_items == 0
is treated differently, so I don't think block_end
can be equal to block_offset
. Could you elaborate on why it's relevant here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering whether the whole algorithm would be simpler if we used int valid_items = even_share.block_end - even_share.block_offset;
as the main variable instead of repeatedly computing the remaining number of items.
That said, the change is definitely correct and a large scale refactor is a bit too much right now
@@ -355,7 +355,7 @@ struct AgentReduce | |||
{ | |||
AccumT thread_aggregate{}; | |||
|
|||
if (even_share.block_offset + TILE_ITEMS > even_share.block_end) | |||
if (even_share.block_end - even_share.block_offset < TILE_ITEMS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not happy, that we transform the condition differently here and below. I like @canonizer suggestion below
f313fa0
to
f920c37
Compare
f920c37
to
207b66b
Compare
@@ -355,7 +355,7 @@ struct AgentReduce | |||
{ | |||
AccumT thread_aggregate{}; | |||
|
|||
if (even_share.block_offset + TILE_ITEMS > even_share.block_end) | |||
if (even_share.block_end - even_share.block_offset < TILE_ITEMS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering whether the whole algorithm would be simpler if we used int valid_items = even_share.block_end - even_share.block_offset;
as the main variable instead of repeatedly computing the remaining number of items.
That said, the change is definitely correct and a large scale refactor is a bit too much right now
@miscco the algorithm would definitely be simpler. Nonetheless, the idea is that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, provided that the comments are addressed.
6e96473
to
a56aed9
Compare
No description provided.