
[QST] Question on customize epilogue reduction #1301

Closed
zejia-lin opened this issue Jan 11, 2024 · 7 comments
Labels
question Question

Comments

@zejia-lin

What is your question?

Hello, I found that many epilogues are element-wise. I was wondering whether the epilogue could be customized to sum over 2*2 tiles instead of performing an element-wise operation. That is, for D = AB + C, where A is an (m*2, k) matrix, B is a (k, n*2) matrix, and C and D are (m, n) matrices. Since AB produces an (m*2, n*2) matrix, is it possible to sum up every 2*2 tile of that output and produce an (m, n) matrix?
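For concreteness, a plain CPU-only C++ sketch of the intended semantics might look like the following (this is a hypothetical reference, not CUTLASS code; the function name and int types are my own choices):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference semantics: A is (2m x k) int8, B is (k x 2n) int8,
// accumulation in int32. The full product P = A*B is (2m x 2n);
// D[i][j] is the sum of the 2x2 tile of P at (2i, 2j), plus C[i][j].
std::vector<int32_t> tile_sum_gemm(const std::vector<int8_t>& A,
                                   const std::vector<int8_t>& B,
                                   const std::vector<int32_t>& C,
                                   int m, int n, int k) {
  // Compute the full (2m x 2n) product first, purely for reference.
  std::vector<int32_t> P(2 * m * 2 * n, 0);
  for (int i = 0; i < 2 * m; ++i)
    for (int j = 0; j < 2 * n; ++j)
      for (int p = 0; p < k; ++p)
        P[i * 2 * n + j] += int32_t(A[i * k + p]) * int32_t(B[p * 2 * n + j]);
  // Reduce every 2x2 tile of P and add C.
  std::vector<int32_t> D(m * n);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
      D[i * n + j] = P[(2 * i) * 2 * n + 2 * j]
                   + P[(2 * i) * 2 * n + 2 * j + 1]
                   + P[(2 * i + 1) * 2 * n + 2 * j]
                   + P[(2 * i + 1) * 2 * n + 2 * j + 1]
                   + C[i * n + j];
  return D;
}
```

The question is whether this reduction can be fused into the GEMM epilogue rather than materializing the (2m x 2n) intermediate.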

Many thanks for any advice.

@hwu36
Collaborator

hwu36 commented Jan 11, 2024

which hardware do you use? what is the data type? do you want to use tensor cores?

@zejia-lin
Author

zejia-lin commented Jan 12, 2024

Thanks!

I am using an A100; both CUTLASS 2.x and 3.x are suitable for me. The data types of A and B are int8 with int32 accumulation, and C and D are int32. I do want to use tensor cores.

My custom kernel is basically similar to pooling: it takes 2*2 elements and returns 1 element, but performs a more complex operation internally. I found that issue #188 said CUTLASS had no pooling as of March 2, 2021. I was wondering whether such functionality exists now.

Specifically, I could easily implement it if there were an interface to:

  1. Operate on the accumulator fragment after the GEMM is performed and before it is written to global memory — the epilogue stage, I guess.
  2. Change the load and store pattern of C and D. A and B define a 2m-by-2n-by-k problem and produce a 2m-by-2n matrix; my kernel works on that 2m-by-2n matrix and produces an m-by-n matrix, which is the dimension of C and D.

I am concerned about memory consumption, so I don't want to store the 2m-by-2n matrix in global memory and launch another kernel to perform this operation.

@hwu36
Collaborator

hwu36 commented Jan 13, 2024

In 2.x, you can get the row coordinate from row_offset + thread_start_row_ in https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/threadblock/predicated_tile_iterator.h#L398

Every thread owns several fragments, and every fragment owns kElementsPerAccess consecutive elements in the same row. You can first do a 1x2 reduction here, then do a further reduction with the threads that own the next row.

You can first dump the row coordinates and check the mapping between thread id and row coordinate. All the mapping information you need is actually in ThreadMap (https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/threadblock/predicated_tile_iterator.h#L71).

Don't forget to update the memory pointer at the end (memory_pointer = reinterpret_cast<AccessType *>(byte_pointer + byte_offset)) for the new coordinates.

@zejia-lin
Author

Thank you for the detailed reply. I'll try it later.


This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@mnicely
Collaborator

mnicely commented Feb 22, 2024

@zejia-lin have you resolved your issue?

@zejia-lin
Author

I am sorry for the late response. I was not able to resolve it with reasonable effort, so I am closing this issue.
