revise retired SSE/AVX flops events def for AMD Zen4 (#216)
Summary:
Pull Request resolved: #216

The SSE/AVX FLOPs event config in the Linux perf tool differs from the one defined in hbt:
perf's fp_ret_sse_avx_ops.all uses umask 0x1f, while hbt uses umask 0x0f.

According to the AMD manual:

 {F1325336882}

Bit 4 of the unit mask determines whether a bfloat MAC is counted as 2 ops. It should be set to provide consistent behavior.

So this diff makes Zen3 and Zen4 machines use different events to monitor SSE/AVX FLOPs.
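
For illustration only, the difference between the two unit masks can be sketched as below. This is a minimal standalone sketch, not hbt code: AmdCoreEvent, ZenGen, retiredSseAvxFlopsFor and countsBfloatMacAsTwoOps are hypothetical names introduced here, and the struct is a simplified stand-in for hbt's PmuMsr. Only the event number (0x3), the two unit masks (0xf and 0x1f) and the meaning of bit 4 come from this commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the .amdCore part of hbt's PmuMsr.
struct AmdCoreEvent {
  std::uint16_t event;
  std::uint8_t unitMask;
};

// Hypothetical enum; how hbt identifies the CPU generation is not shown here.
enum class ZenGen { Zen3, Zen4 };

// Values taken from this commit: same event, different unit masks.
constexpr AmdCoreEvent kZen3RetiredSseAvxFlops{0x3, 0x0f};
constexpr AmdCoreEvent kZen4RetiredSseAvxFlops{0x3, 0x1f};

// Bit 4 of the unit mask: count a bfloat MAC as 2 ops (per the summary above).
constexpr std::uint8_t kBfloatMacAsTwoOpsBit = 1u << 4;

// Pick the event definition for a given generation.
constexpr AmdCoreEvent retiredSseAvxFlopsFor(ZenGen gen) {
  return gen == ZenGen::Zen4 ? kZen4RetiredSseAvxFlops
                             : kZen3RetiredSseAvxFlops;
}

// True if the unit mask has bit 4 set.
constexpr bool countsBfloatMacAsTwoOps(AmdCoreEvent e) {
  return (e.unitMask & kBfloatMacAsTwoOpsBit) != 0;
}

// The two masks differ only in bit 4.
static_assert((0x1f ^ 0x0f) == 0x10, "masks differ only in bit 4");

// Runtime sanity check of the selection helper.
inline void checkRetiredSseAvxFlopsSelection() {
  assert(retiredSseAvxFlopsFor(ZenGen::Zen3).unitMask == 0x0f);
  assert(retiredSseAvxFlopsFor(ZenGen::Zen4).unitMask == 0x1f);
}
```

Keeping the two definitions as separate constants, as the diff does, leaves the Zen3 value untouched while matching perf's umask on Zen4.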

Reviewed By: bigzachattack

Differential Revision: D52861397

fbshipit-source-id: 4c2acbee9742a15db36b8bf0a26f1af946b745d5
Alston Tang authored and facebook-github-bot committed Jan 23, 2024
1 parent ed7fec8 commit bfdae99
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion hbt/src/perf_event/AmdEvents.h
@@ -143,8 +143,10 @@ constexpr PmuMsr kL1AndL2PrefetcherMissesInL3{
     .amdCore = {.event = 0x72, .unitMask = 0xff}};
 // Flops
 constexpr PmuMsr kRetiredX87Flops{.amdCore = {.event = 0x2, .unitMask = 0x7}};
-constexpr PmuMsr kRetiredSseAvxFlops{
+constexpr PmuMsr kZen3RetiredSseAvxFlops{
     .amdCore = {.event = 0x3, .unitMask = 0xf}};
+constexpr PmuMsr kZen4RetiredSseAvxFlops{
+    .amdCore = {.event = 0x3, .unitMask = 0x1f}};

 // Branches
 constexpr PmuMsr kRetiredBranchInstructions{.amdCore = {.event = 0xc2}};
