
[FEA] SM80_CP_ASYNC_ support L2 cache prefetch hints #1174

Closed
reed-lau opened this issue Nov 3, 2023 · 6 comments
Labels
feature request New feature or request

Comments

@reed-lau
Contributor

reed-lau commented Nov 3, 2023

Is your feature request related to a problem? Please describe.
On the Ampere architecture (SM80+), the cp.async instruction supports a .level::prefetch_size qualifier as an L2 prefetch hint.
I found it improved performance in my case.
I would like the SM80_CP_ASYNC_* structures and their Traits in cute to support this feature.
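
For context, the qualifier mentioned above appends an L2 prefetch size to the cp.async PTX string. The following is a hypothetical device-side sketch (function names and operand setup are my own, not CUTLASS's actual strings) contrasting the same 16-byte copy without and with the hint:

```cuda
#include <cstdint>

// Hypothetical sketch for SM80+: smem_ptr is a 32-bit shared-memory
// address (e.g. obtained via __cvta_generic_to_shared), gmem_ptr a
// global-memory pointer.
__device__ void cp_async_16B(uint32_t smem_ptr, void const* gmem_ptr) {
  // Plain 16B async copy, no L2 prefetch hint.
  asm volatile("cp.async.cg.shared.global [%0], [%1], 16;\n"
               :: "r"(smem_ptr), "l"(gmem_ptr));
}

__device__ void cp_async_16B_l2_128B(uint32_t smem_ptr, void const* gmem_ptr) {
  // Same copy with .level::prefetch_size = 128B: the hardware also
  // prefetches 128 bytes around the source line into L2.
  asm volatile("cp.async.cg.shared.global.L2::128B [%0], [%1], 16;\n"
               :: "r"(smem_ptr), "l"(gmem_ptr));
}
```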

Describe the solution you'd like

My solution is to add an integer template parameter named L2PrefetchSize to specify the prefetch size.
The implementation uses if constexpr to dispatch to different assembly at compile time (valid values: 0/64/128/256).
L2PrefetchSize defaults to 0, meaning no prefetch is issued.

template <class TS, class TD = TS, int L2PrefetchSize = 0>
struct SM80_CP_ASYNC_CACHEALWAYS {
  CUTE_HOST_DEVICE static void
  copy(TS const& src, TD& dst) {
    if constexpr (L2PrefetchSize == 0) {
      asm volatile("cp.async... ");
    } else if constexpr (L2PrefetchSize == 64) {
      asm volatile("cp.async...L2::64B ...");
    } else if constexpr (L2PrefetchSize == 128 || L2PrefetchSize == 256) {
      // analogous .L2::128B / .L2::256B variants
    } else {
      // dependent condition so the assert only fires when this branch is instantiated
      static_assert(L2PrefetchSize == 0, "unsupported prefetch size for cp.async");
    }
  }
};

If you approve of this approach, I can submit a PR for it.

@thakkarV
Collaborator

thakkarV commented Nov 3, 2023

@ccecka what do you think?

@hwu36
Collaborator

hwu36 commented Nov 3, 2023

@reed-lau, could you make a PR that just changes the PTX to use 128B prefetch all the time? That is the same behavior as 2.x.

@reed-lau
Contributor Author

reed-lau commented Nov 6, 2023

@hwu36 In some cases enabling L2 prefetching may hurt performance. What do you think about leaving the option to the end user?

@hwu36
Collaborator

hwu36 commented Nov 6, 2023

What cases?

@reed-lau
Contributor Author

reed-lau commented Nov 6, 2023

What cases?

I remember that when I was optimizing sparse convolutions for a lidar network, enabling L2 prefetch could hurt performance, but I'm not so sure anymore.
I will do a PR first (#1177), and if I run into that case again, I will test it and comment here.

@reed-lau
Contributor Author

When cp.async is used, 128B prefetch is always enabled. #1177
