[Issue]: Zero TCC_HIT_sum all the time #150

RookieT0T · 2024-12-20T05:12:07Z

Problem Description

While using rocprofv2 to collect performance counters such as TCC_HIT_sum and TCC_MISS_sum on Vega 20, I found that TCC_HIT_sum is always 0, while TCC_MISS_sum shows non-zero values, which I assume means the miss counter works. I would appreciate it if you could investigate why the hit information is always 0 (including the hit counters of all 16 cache banks) and double-check whether the TCC miss values are correct. This problem occurs regardless of whether I collect the performance counters with ROCm 6.2.2-116 or in the ROCm 6.3.0 Docker image.

Example output.csv returned from the profiler:
Index,KernelName,gpu-id,queue-id,queue-index,pid,tid,grd,wgr,lds,scr,arch_vgpr,accum_vgpr,sgpr,wave_size,sig,obj,FlatVMemInsts,TCC_EA_RDREQ_sum,TCC_EA_RDREQ_32B_sum,TCC_HIT_sum,TCC_MISS_sum,TCC_MISS[12],TCC_MISS[13],TCC_MISS[14],TCC_MISS[15],TCC_HIT[0],TCC_HIT[1],TCC_HIT[2],TCC_HIT[3],TCC_HIT[4],TCC_HIT[5],TCC_HIT[6],TCC_HIT[7],TCC_HIT[8],TCC_HIT[9],TCC_HIT[10],TCC_HIT[11],TCC_HIT[12],TCC_HIT[13],TCC_HIT[14],TCC_HIT[15],TA_FLAT_WRITE_WAVEFRONTS_sum,TA_FLAT_READ_WAVEFRONTS_sum,TCC_EA_RDREQ[0],TCC_EA_RDREQ[1],TCC_EA_RDREQ[2],TCC_EA_RDREQ[3],TCC_EA_RDREQ[4],TCC_EA_RDREQ[5],TCC_EA_RDREQ[6],TCC_EA_RDREQ[7],TCC_EA_RDREQ[8],TCC_EA_RDREQ[9],TCC_EA_RDREQ[10],TCC_EA_RDREQ[11],TCC_EA_RDREQ[12],TCC_EA_RDREQ[13],TCC_EA_RDREQ[14],TCC_EA_RDREQ[15],TCC_MISS[0],TCC_MISS[1],TCC_MISS[2],TCC_MISS[3],TCC_MISS[4],TCC_MISS[5],TCC_MISS[6],TCC_MISS[7],TCC_MISS[8],TCC_MISS[9],TCC_MISS[10],TCC_MISS[11]
0,"kernel(int*) [clone .kd]",1,0,1,14761,14761,1,1,0,0,40,0,48,64,0x0,0x79eecbe84540,60.0000000000,68.0000000000,0.0000000000,0.0000000000,102.0000000000,4,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0000000000,60.0000000000,0,0,0,0,0,0,6,60,1,0,0,0,0,0,0,0,4,0,4,6,0,10,4,61,5,0,7,4

Operating System

Ubuntu 24.04.1 LTS

CPU

AMD Ryzen 9 3900X 12-Core Processor

GPU

gfx906 (AMD Vega 7nm, also referred to as AMD Vega 20)

ROCm Version

ROCm 6.3.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

ppanchad-amd · 2024-12-20T19:23:22Z

Hi @RookieT0T. An internal ticket has been created to assist with your issue. Thanks!

zichguan-amd · 2024-12-31T19:35:11Z

Hi @RookieT0T, can you share the workload that you are trying to profile? It's normal to have a zero L2 hit rate if your workload doesn't reuse any cached data.

RookieT0T · 2024-12-31T20:01:33Z

> Hi @RookieT0T, can you share the workload that you are trying to profile? It's normal to have 0 L2 hit rate if your workload doesn't reuse any cached data.

My workload contains a series of flat_load_dwordx2 instructions inside an asm volatile block in the kernel function. The addresses used by those loads should incur some cache hits. Also, the glc flag is appended to each load instruction so that the accesses bypass the L1 cache (TCP) and go directly to the L2 cache (TCC).

Example kernel function with a single load instruction:

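```cpp
#include <hip/hip_runtime.h>
#include <cstdint>

__global__ void kernel(int* arr) {
    uint64_t a = 0;

    asm volatile(
        "s_waitcnt vmcnt(0) & lgkmcnt(0)\n\t"        // drain outstanding memory ops
        "buffer_wbinvl1\n\t"                         // write back and invalidate the shader L1 (TCP)
        "flat_load_dwordx2 %[out0], %[in1] glc\n\t"  // 8-byte load; glc bypasses L1
        "s_waitcnt vmcnt(0) & lgkmcnt(0)\n\t"        // wait for the load to return
        "s_nop 0\n\t"
        : [out0] "=v"(a)
        : [in1] "v"((uint64_t*)&arr[0])
        : "memory");
}
```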

RookieT0T · 2025-01-02T02:22:01Z

> Hi @RookieT0T, can you share the workload that you are trying to profile? It's normal to have 0 L2 hit rate if your workload doesn't reuse any cached data.

I am wondering: when the glc flag is added, are instruction cache hits also counted in the TCC hit sum reported by the profiler, in addition to the data cache hits incurred by the program?

ppanchad-amd added the Under Investigation label Dec 20, 2024

RookieT0T closed this as completed Dec 31, 2024

RookieT0T reopened this Dec 31, 2024
