[Issue]: Zero TCC_HIT_sum all the time #150
Comments
Hi @RookieT0T. An internal ticket has been created to assist with your issue. Thanks!
Hi @RookieT0T, can you share the workload that you are trying to profile? It's normal to see an L2 hit rate of 0 if your workload doesn't reuse any cached data.
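For reference, the hit rate mentioned here can be derived from the two summed counters. The following is a minimal sketch, assuming the rate is defined as TCC_HIT_sum / (TCC_HIT_sum + TCC_MISS_sum); the helper name `l2_hit_rate` is hypothetical and is not part of rocprofv2.

```cpp
#include <cassert>

// Hypothetical helper (not part of rocprofv2): L2 hit rate from the two
// summed counters, assuming hit_rate = hits / (hits + misses).
// Returns 0.0 when no accesses were counted, to avoid dividing by zero.
double l2_hit_rate(double tcc_hit_sum, double tcc_miss_sum) {
    assert(tcc_hit_sum >= 0.0 && tcc_miss_sum >= 0.0);
    const double total = tcc_hit_sum + tcc_miss_sum;
    return total > 0.0 ? tcc_hit_sum / total : 0.0;
}
```

Under this definition, any run with zero counted hits yields a rate of 0 regardless of how many misses were recorded, so the rate alone cannot distinguish a workload with no reuse from a hit counter that is not reporting.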
My workload consists of a series of flat_load_dwordx2 instructions inside asm volatile blocks in the kernel function. The addresses used by those load instructions should produce some cache hits. In addition, the "glc" flag is appended to each load instruction so that the access bypasses the L1 cache (TCP) and goes directly to the L2 cache (TCC). Example kernel function with only one load instruction: __global__ void kernel(int * arr) { asm volatile(
}
I am wondering: if the "glc" flag is added, will instruction cache hits also be counted in the TCC hit sum reported by the profiler, in addition to the data cache hits incurred by the program?
Problem Description
While using rocprofv2 to collect performance counters such as TCC_HIT_sum and TCC_MISS_sum on Vega 20, I found that TCC_HIT_sum is always 0, whereas TCC_MISS_sum shows non-zero values, so I assume the miss counter is working. I would appreciate it if you could investigate why the hit counters are always 0 (including the per-channel hit counters for all 16 cache banks) and double-check whether the TCC miss values are correct. This problem occurs both with ROCm 6.2.2-116 and in the ROCm 6.3.0 Docker image.
Example output.csv returned from the profiler:
Index,KernelName,gpu-id,queue-id,queue-index,pid,tid,grd,wgr,lds,scr,arch_vgpr,accum_vgpr,sgpr,wave_size,sig,obj,FlatVMemInsts,TCC_EA_RDREQ_sum,TCC_EA_RDREQ_32B_sum,TCC_HIT_sum,TCC_MISS_sum,TCC_MISS[12],TCC_MISS[13],TCC_MISS[14],TCC_MISS[15],TCC_HIT[0],TCC_HIT[1],TCC_HIT[2],TCC_HIT[3],TCC_HIT[4],TCC_HIT[5],TCC_HIT[6],TCC_HIT[7],TCC_HIT[8],TCC_HIT[9],TCC_HIT[10],TCC_HIT[11],TCC_HIT[12],TCC_HIT[13],TCC_HIT[14],TCC_HIT[15],TA_FLAT_WRITE_WAVEFRONTS_sum,TA_FLAT_READ_WAVEFRONTS_sum,TCC_EA_RDREQ[0],TCC_EA_RDREQ[1],TCC_EA_RDREQ[2],TCC_EA_RDREQ[3],TCC_EA_RDREQ[4],TCC_EA_RDREQ[5],TCC_EA_RDREQ[6],TCC_EA_RDREQ[7],TCC_EA_RDREQ[8],TCC_EA_RDREQ[9],TCC_EA_RDREQ[10],TCC_EA_RDREQ[11],TCC_EA_RDREQ[12],TCC_EA_RDREQ[13],TCC_EA_RDREQ[14],TCC_EA_RDREQ[15],TCC_MISS[0],TCC_MISS[1],TCC_MISS[2],TCC_MISS[3],TCC_MISS[4],TCC_MISS[5],TCC_MISS[6],TCC_MISS[7],TCC_MISS[8],TCC_MISS[9],TCC_MISS[10],TCC_MISS[11]
0,"kernel(int*) [clone .kd]",1,0,1,14761,14761,1,1,0,0,40,0,48,64,0x0,0x79eecbe84540,60.0000000000,68.0000000000,0.0000000000,0.0000000000,102.0000000000,4,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0000000000,60.0000000000,0,0,0,0,0,0,6,60,1,0,0,0,0,0,0,0,4,0,4,6,0,10,4,61,5,0,7,4
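One way to sanity-check a row like the one above is to add up the per-channel columns and compare the totals with the corresponding `_sum` columns. The sketch below assumes the CSV layout shown here (one header line, one data line, per-channel columns named like `TCC_HIT[3]`); the function names `split_csv_line` and `sum_channels` are hypothetical and are not part of any ROCm tool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Split one CSV line into fields. Commas inside double-quoted fields are
// kept, and the surrounding quotes are dropped. Escaped quotes ("") inside
// a quoted field are not preserved; this is a sketch, not a full CSV parser.
std::vector<std::string> split_csv_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string cur;
    bool in_quotes = false;
    for (char c : line) {
        if (c == '"') {
            in_quotes = !in_quotes;
        } else if (c == ',' && !in_quotes) {
            fields.push_back(cur);
            cur.clear();
        } else {
            cur += c;
        }
    }
    fields.push_back(cur);
    return fields;
}

// Sum every column whose header starts with prefix + "[", for example
// "TCC_HIT[" for TCC_HIT[0]..TCC_HIT[15]. Columns such as TCC_HIT_sum
// do not match and are therefore excluded from the total.
double sum_channels(const std::vector<std::string>& header,
                    const std::vector<std::string>& row,
                    const std::string& prefix) {
    assert(header.size() == row.size());
    const std::string key = prefix + "[";
    double total = 0.0;
    for (std::size_t i = 0; i < header.size(); ++i) {
        if (header[i].compare(0, key.size(), key) == 0) {
            total += std::strtod(row[i].c_str(), nullptr);
        }
    }
    return total;
}
```

Calling `sum_channels(header, row, "TCC_HIT")` and `sum_channels(header, row, "TCC_MISS")` on the two lines above gives per-channel totals that can be compared against TCC_HIT_sum and TCC_MISS_sum from the same row.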
Operating System
Ubuntu 24.04.1 LTS
CPU
AMD Ryzen 9 3900X 12-Core Processor
GPU
gfx906 (AMD Vega 7nm also referred to as AMD Vega 20)
ROCm Version
ROCm 6.3.0
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response