[FEA] SM80_CP_ASYNC_ support L2 cache prefetch hints #1174
Comments
@ccecka what do you think?
@reed-lau, could you make a PR that just changes the PTX to use the 128B prefetch all the time? That is the same behavior as 2.x.
@hwu36 In some cases enabling L2 prefetching may kill performance. What do you think about this issue? How about leaving the option to the end user?
What cases?
I remember that when I was optimizing sparse convolutions for a lidar network, enabling L2 prefetch could hurt performance. But I'm not so sure now.
When cp.async is used, 128B prefetch is always enabled. #1177
Is your feature request related to a problem? Please describe.
For the Ampere architecture (SM80+), the `cp.async` instruction supports `.level::prefetch_size` as a fetch hint. I found it worked for my case.
I wish the SM80_CP_ASYNC_* structures and their Traits in CuTe could support this feature.
Describe the solution you'd like
My solution is to add an integer template parameter named `L2PrefetchSize` to specify the prefetch_size. For the implementation, we use `if constexpr` to dispatch to different assembly code at compile time (values 0/64/128/256). The template parameter `L2PrefetchSize` defaults to 0, which indicates that no prefetch is made.
If you approve this solution, I can open a PR for it.
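
The proposal above could look roughly like the following. This is only a sketch under stated assumptions, not the CuTe implementation: the names `l2_prefetch_qualifier` and `CpAsyncCacheAlwaysSketch` are hypothetical, C++17 is assumed for `if constexpr`, and the device path is compiled only under nvcc. The host-side helper mirrors the same compile-time dispatch so the mapping from `L2PrefetchSize` to the PTX qualifier can be checked without a GPU.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical helper: maps L2PrefetchSize to the PTX .level::prefetch_size
// qualifier at compile time. 0 means "no prefetch hint".
template <int L2PrefetchSize>
constexpr std::string_view l2_prefetch_qualifier() {
  static_assert(L2PrefetchSize == 0 || L2PrefetchSize == 64 ||
                L2PrefetchSize == 128 || L2PrefetchSize == 256,
                "L2PrefetchSize must be 0, 64, 128, or 256");
  if constexpr (L2PrefetchSize == 64) {
    return ".L2::64B";
  } else if constexpr (L2PrefetchSize == 128) {
    return ".L2::128B";
  } else if constexpr (L2PrefetchSize == 256) {
    return ".L2::256B";
  } else {
    return "";  // default: plain cp.async without a prefetch hint
  }
}

#if defined(__CUDACC__)
// Hypothetical copy op (assumed name) showing the same dispatch with inline PTX.
// cp.async.ca accepts copy sizes of 4, 8, or 16 bytes.
template <class TS, class TD = TS, int L2PrefetchSize = 0>
struct CpAsyncCacheAlwaysSketch {
  static_assert(sizeof(TS) == sizeof(TD), "source and destination sizes must match");
  static_assert(sizeof(TS) == 4 || sizeof(TS) == 8 || sizeof(TS) == 16,
                "cp.async.ca copy size must be 4, 8, or 16 bytes");

  __device__ static void copy(TS const& gmem_src, TD& smem_dst) {
    TS const* gmem_ptr = &gmem_src;
    // Convert the generic shared-memory pointer to a 32-bit shared address.
    uint32_t smem_addr = static_cast<uint32_t>(__cvta_generic_to_shared(&smem_dst));
    if constexpr (L2PrefetchSize == 64) {
      asm volatile("cp.async.ca.shared.global.L2::64B [%0], [%1], %2;\n"
                   :: "r"(smem_addr), "l"(gmem_ptr), "n"(sizeof(TS)));
    } else if constexpr (L2PrefetchSize == 128) {
      asm volatile("cp.async.ca.shared.global.L2::128B [%0], [%1], %2;\n"
                   :: "r"(smem_addr), "l"(gmem_ptr), "n"(sizeof(TS)));
    } else if constexpr (L2PrefetchSize == 256) {
      asm volatile("cp.async.ca.shared.global.L2::256B [%0], [%1], %2;\n"
                   :: "r"(smem_addr), "l"(gmem_ptr), "n"(sizeof(TS)));
    } else {
      static_assert(L2PrefetchSize == 0, "L2PrefetchSize must be 0, 64, 128, or 256");
      asm volatile("cp.async.ca.shared.global [%0], [%1], %2;\n"
                   :: "r"(smem_addr), "l"(gmem_ptr), "n"(sizeof(TS)));
    }
  }
};
#endif  // __CUDACC__
```

With a default of 0, existing users of the copy op would see no change in the emitted instruction, and the hint becomes opt-in per instantiation, which matches the concern raised in the comments that prefetching may not help every workload.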