PCM Does Not Display LMB and RMB Metrics in Prometheus Integration #761

Closed
bellalzohir opened this issue Jun 18, 2024 · 14 comments

@bellalzohir

Hello Intel PCM and PQoS developers,

I am facing an issue with Intel PCM where it fails to report local and remote memory bandwidth (LMB and RMB) metrics when monitored through Prometheus, despite these metrics being available and correctly reported when using pqos-msr.

Environment:
OS: Linux kernel 5.15.0-112-generic

Configuration:
RDT features enabled (rdt=cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp, mba) through GRUB configuration: GRUB_CMDLINE_LINUX="hugepagesz=1G hugepages=12 rdt=cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp, mba"
CONFIG_X86_CPU_RESCTRL=y in kernel configuration
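
One thing that may be worth ruling out with this configuration: the kernel splits its command line on whitespace, so if the spaces after the commas are literally present in GRUB_CMDLINE_LINUX, the rdt= parameter would end at the first space and carry only cmt. The sketch below shows which features a given command line actually passes to rdt=. The function name is hypothetical and the sample strings simply mirror the configuration quoted above; on a live system the input would come from /proc/cmdline.

```python
def rdt_features(cmdline: str) -> list[str]:
    """Return the feature names carried by the rdt= kernel parameter.

    Kernel parameters are separated by whitespace, so anything after the
    first space belongs to a different parameter, not to rdt=.
    """
    for token in cmdline.split():
        if token.startswith("rdt="):
            # Drop empty entries left behind by a trailing comma.
            return [name for name in token[len("rdt="):].split(",") if name]
    return []


if __name__ == "__main__":
    spaced = ("hugepagesz=1G hugepages=12 "
              "rdt=cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp, mba")
    compact = ("hugepagesz=1G hugepages=12 "
               "rdt=cmt,mbmtotal,mbmlocal,l3cat,l3cdp,l2cat,l2cdp,mba")
    print(rdt_features(spaced))
    print(rdt_features(compact))
```

If the first call lists fewer features than intended, writing the list without spaces is the variant to try.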

Issue:
When monitoring server performance metrics using Intel PCM with Prometheus, the local and remote memory bandwidth metrics (LMB and RMB) consistently report as 0, indicating no data is being captured or transmitted. However, using pqos-msr, these metrics are clearly available and accurately reported.

Execute sudo pqos-msr -d and note the additional memory bandwidth metrics being monitored.

sudo pqos-msr -d
NOTE:  Mixed use of MSR and kernel interfaces to manage
       CAT or CMT & MBM may lead to unexpected behavior.
WARN: resctl filesystem mounted! Using MSR interface may corrupt resctrl filesystem and cause unexpected behaviour
Hardware capabilities
    Monitoring
        Cache Monitoring Technology (CMT) events:
            LLC Occupancy (LLC)
                 I/O RDT: unsupported
        Memory Bandwidth Monitoring (MBM) events:
            Total Memory Bandwidth (TMEM)
                 I/O RDT: unsupported
            Local Memory Bandwidth (LMEM)
                 I/O RDT: unsupported
            Remote Memory Bandwidth (RMEM) (calculated)
                 I/O RDT: unsupported
        PMU events:
            Instructions/Clock (IPC)
            LLC misses
            LLC references
            LLC misses - pcie read
            LLC misses - pcie write
            LLC references - pcie read
            LLC references - pcie write
    Allocation
        Cache Allocation Technology (CAT)
            L3 CAT
                CDP: disabled
                Non-Contiguous CBM: unsupported
                I/O RDT: unsupported
                Num COS: 16
        Memory Bandwidth Allocation (MBA)
            Num COS: 8

Expected Behavior:
Intel PCM should accurately capture and export all available memory bandwidth metrics, including LMB and RMB, to Prometheus.

Actual Behavior:
LMB and RMB metrics appear as 0 in Prometheus, suggesting an issue with either the PCM data capture or the export process.

I appreciate any assistance or guidance you can provide and am available for further testing or to provide additional information as needed.

@rdementi
Contributor

Hi, thanks for creating the issue. Could you please also share the output of the lscpu command?

@bellalzohir
Author

Hi, thank you for getting back to me so quickly.
Here is the lscpu output:

  Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  28
  On-line CPU(s) list:   0-27
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  1
    Core(s) per socket:  14
    Socket(s):           2
    Stepping:            4
    CPU max MHz:         2600.0000
    CPU min MHz:         1000.0000
    BogoMIPS:            5200.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge m
                         ca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 s
                         s ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc
                         art arch_perfmon pebs bts rep_good nopl xtopology nons
                         top_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
                         ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm p
                         cid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline
                         _timer aes xsave avx f16c rdrand lahf_lm abm 3dnowpref
                         etch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti
                         intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi fl
                         expriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hl
                         e avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512
                         f avx512dq rdseed adx smap clflushopt clwb intel_pt av
                         x512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsave
                         s cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dt
                         herm arat pln pts pku ospke md_clear flush_l1d arch_ca
                         pabilities
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   896 KiB (28 instances)
  L1i:                   896 KiB (28 instances)
  L2:                    28 MiB (28 instances)
  L3:                    38.5 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26
  NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27
Vulnerabilities:
  Gather data sampling:  Mitigation; Microcode
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flush
                         es, SMT disabled
  Mds:                   Mitigation; Clear CPU buffers; SMT disabled
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT disabled
  Retbleed:              Mitigation; IBRS
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prct
                         l and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointe
                         r sanitization
  Spectre v2:            Mitigation; IBRS; IBPB conditional; RSB filling; PBRSB
                         -eIBRS Not affected; BHI Not affected
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; Clear CPU buffers; SMT disabled

pqos monitoring VS pqos-msr monitoring

## monitoring with pqos
TIME 2024-06-18 16:55:31
    CORE         IPC      MISSES     LLC[KB]
       0        1.12        165k       336.0
       1        0.95        218k      1904.0
       2        0.86        164k       448.0
       3        0.86        256k       448.0
       4        0.80        106k       504.0
       5        1.38        263k      2128.0
       6        0.89        197k       280.0
       7        1.51        209k       560.0
       8        1.18        218k      2240.0
       9        0.87        142k       840.0
      10        0.74        272k       336.0
      11        1.37        165k       504.0
      12        0.65        230k       392.0
      13        1.57        464k      1792.0
      14        0.78        212k       560.0
      15        0.79        253k       560.0
      16        0.73        188k       336.0
      17        0.84        263k       728.0
      18        1.67        619k     10864.0
      19        1.04         88k       616.0
      20        1.09        266k       448.0
      21        1.40        226k       840.0
      22        0.89        164k       448.0
      23        1.50        299k       560.0
      24        1.45        321k       112.0
      25        1.44        232k       224.0
      26        0.93        313k      1568.0
      27        1.48        372k      6328.0

## monitoring with pqos-msr
TIME 2024-06-18 16:56:06
    CORE         IPC      MISSES     LLC[KB]   MBL[MB/s]   MBR[MB/s]
       0        0.70         30k      1008.0         0.7         1.3
       1        1.77        126k       896.0         3.3         2.7
       2        1.34         19k      1848.0         0.7         0.8
       3        0.73         71k       504.0         1.3         1.0
       4        0.74         16k       952.0         0.9         0.4
       5        0.71         77k       392.0         1.8         2.2
       6        0.73         64k      2072.0         2.0         1.0
       7        1.84        497k         0.0         0.0         0.0
       8        0.83         47k      1568.0         1.3         1.0
       9        1.88        220k      1288.0         6.8        10.5
      10        0.87         43k      1008.0         1.2         0.7
      11        0.76         68k       728.0         1.4         1.7
      12        0.85         49k       560.0         0.7         0.5
      13        0.73         71k       224.0         2.0         1.9
      14        0.86         75k      1736.0         3.2         2.6
      15        1.67        127k      3920.0         5.1         8.6
      16        1.09         79k      2184.0         2.2         3.1
      17        0.92        117k         0.0         0.0         0.0
      18        0.87         43k      1064.0         0.7         2.1
      19        1.83        132k      2632.0         3.8         5.1
      20        0.94         45k         0.0         0.0         0.0
      21        0.64         28k       224.0         0.9         0.3
      22        0.86         68k      1792.0         2.1         1.6
      23        0.90         83k      1400.0         2.7         1.8
      24        0.83         26k      1512.0         0.4         0.2
      25        1.81         33k      1736.0         0.7         1.4
      26        0.77         56k      1344.0         0.7         1.0
      27        1.65        211k      2464.0         7.5        10.6

WARN: Core 7 RMID association changed from 4 to 0! The core has been hijacked!
WARN: Core 17 RMID association changed from 9 to 0! The core has been hijacked!
WARN: Core 20 RMID association changed from 11 to 0! The core has been hijacked!

Questions:

  • Could there be specific configurations or enhancements within PCM that might enable it to access and display LMB and RMB metrics as pqos-msr does?
  • Are there known limitations or conditions under which PCM might not access certain MSR registers effectively?
  • Any guidance or recommended settings that could help ensure PCM captures all relevant MSR data, particularly for memory bandwidth metrics?

@rdementi
Contributor

Thanks, that is helpful. On your CPU, PCM disables reading these RDT counters from hardware because of errata. The Linux kernel does the same: torvalds/linux@d56593e
However, booting the Linux kernel with your RDT options above re-enables RDT in the kernel. We can add a similar option to PCM to re-enable these metrics.

@bellalzohir
Author

Thank you for the quick response and for clarifying the situation with the RDT counters on my CPU.

I appreciate the suggestion to add an option in PCM to re-enable these metrics. To provide further context, I have already enabled RDT features from the boot configuration to ensure that all requisite counters are available at the OS level. My current GRUB configuration is set as follows:

GRUB_CMDLINE_LINUX="hugepagesz=1G hugepages=12 rdt=cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp, mba"

This configuration was intended to ensure that the RDT options are fully enabled within the Linux kernel. Given this setup, I would be interested in understanding if there might be additional steps or configurations within PCM that need to be aligned with this kernel setting to effectively monitor these metrics.

Looking forward to your guidance on how we might proceed to achieve comprehensive monitoring capabilities.

@rdementi
Contributor

The change has been implemented. Set this environment variable: export PCM_ENFORCE_MBM=1
https://github.com/intel/pcm/blob/master/doc/ENVVAR_README.md

@bellalzohir
Author

bellalzohir commented Jun 26, 2024

Thank you for your previous response and the guidance provided.

I have successfully mounted the resctrl filesystem and set the environment variable PCM_ENFORCE_MBM=1 to enforce memory bandwidth monitoring. However, upon starting the PCM sensor server, I encountered errors related to accessing memory bandwidth metrics files in the resctrl filesystem.


~# export PCM_ENFORCE_MBM=1
~# cd zouhirRepos/PCMUpadate/pcm/build/bin/
:~/zouhirRepos/PCMUpadate/pcm/build/bin# ./pcm-sensor-server

=====  Processor information  =====
Linux arch_perfmon flag  : yes
Hybrid processor         : no
IBRS and IBPB supported  : yes
STIBP supported          : yes
Spec arch caps supported : yes
Max CPUID level          : 22
CPU model number         : 85
Number of physical cores: 28
Number of logical cores: 28
Number of online logical cores: 28
Threads (logical cores) per physical core: 1
Num sockets: 2
Physical cores per socket: 14
Last level cache slices per socket: 14
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2600000000 Hz
IBRS enabled in the kernel   : yes
STIBP enabled in the kernel  : no
The processor is not susceptible to Rogue Data Cache Load: no
The processor supports enhanced IBRS                     : no
Package thermal spec power: 140 Watt; Package minimum power: 66 Watt; Package maximum power: 297 Watt;

INFO: Linux perf interface to program uncore PMUs is present
Socket 0: 2 memory controllers detected with total number of 6 channels. 3 UPI ports detected. 2 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
Socket 1: 2 memory controllers detected with total number of 6 channels. 3 UPI ports detected. 2 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
Socket 0: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 14 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Socket 1: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 14 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
INFO: using Linux resctrl driver for RDT metrics (L3OCC, LMB, RMB) because resctrl driver is mounted.

 Closed perf event handles
Trying to use Linux perf events...
Successfully programmed on-core PMU using Linux perf
Socket 0
Max UPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Socket 1
Max UPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Starting plain HTTP server on http://localhost:9738/

Error Messages:

Error reading /sys/fs/resctrl/mon_groups/pcm10/mon_data/mon_L3_01/mbm_local_bytes. Error: No such file or directory
ERROR: Can not open /sys/fs/resctrl/mon_groups/pcm10/mon_data/mon_L3_00/mbm_total_bytes file.
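
The two errors above say that the mbm_local_bytes and mbm_total_bytes files are absent from the monitoring group. The resctrl filesystem lists the monitoring events the kernel has enabled in info/L3_MON/mon_features, so one quick check is to compare that list against the events PCM reports using (L3OCC, LMB, RMB). A minimal sketch, assuming the default mount point /sys/fs/resctrl; the function and constant names are illustrative:

```python
from pathlib import Path

# resctrl event names behind L3 occupancy and total/local memory bandwidth
REQUIRED_EVENTS = ("llc_occupancy", "mbm_total_bytes", "mbm_local_bytes")


def missing_mon_events(resctrl_root="/sys/fs/resctrl", required=REQUIRED_EVENTS):
    """Return the required monitoring events that resctrl does not list."""
    features_file = Path(resctrl_root) / "info" / "L3_MON" / "mon_features"
    try:
        enabled = set(features_file.read_text().split())
    except OSError:
        # resctrl is not mounted here, or L3 monitoring is unavailable.
        return list(required)
    return [event for event in required if event not in enabled]


if __name__ == "__main__":
    print("missing events:", missing_mon_events())
```

An empty result means the kernel exposes all three events; any name in the result is one the kernel has not enabled, which would explain a missing file.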

Despite the filesystem already being mounted, I faced challenges when trying to adjust the mount settings to enable different features.
Mount attempt 1:

# mount -t resctrl resctrl  /sys/fs/resctrl
mount: /sys/fs/resctrl: resctrl already mounted on /sys/fs/resctrl.

Mount attempt 2:
I tried to remount the resctrl filesystem with specific options but received an error regarding bad usage:

mount -t resctrl resctrl -o cdp,cdpl2,mba_MBps /sys/fs/resctrl

Your insights or any further guidance on how to resolve these file access and mounting issues would be greatly appreciated.

@rdementi
Contributor

There seems to be an issue with the Linux RDT driver (config). Could you try unmounting resctrl and setting this environment variable for PCM:

export PCM_USE_RESCTRL=0

Then PCM will access RDT directly.
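
Before testing with PCM_USE_RESCTRL=0 it can help to confirm that resctrl is really unmounted, since the earlier startup log shows PCM choosing the resctrl driver when it is mounted. A minimal sketch that scans /proc/mounts-style text; the helper name is illustrative:

```python
def resctrl_mount_points(mounts_text: str) -> list[str]:
    """Return the mount points whose filesystem type is resctrl.

    Each line of /proc/mounts has the form:
    device mountpoint fstype options dump pass
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "resctrl":
            points.append(fields[1])
    return points
```

On a live system the input would be the contents of /proc/mounts; an empty list means resctrl is not mounted anywhere.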

@bellalzohir
Author

Hello,

I'm experiencing an issue where Local and Remote Memory Bandwidth (LMB and RMB) metrics are not displayed in Prometheus, despite proper configuration and troubleshooting steps taken.

Steps and observations:

  1. Based on a suggestion, I disabled the Linux RDT driver via RESCTRL with export PCM_USE_RESCTRL=0 and unmounted resctrl to allow PCM direct access to RDT. However, this did not resolve the issue.
  2. I attempted to create a custom monitoring solution that collects only the LMB and RMB using the output from pqos-msr and integrate this data with PCM data in Prometheus. Unfortunately, I encountered issues running PCM server and pqos monitoring simultaneously. The error message indicates that monitoring on core 0 is already started.

I would appreciate any insights or potential solutions.

Thank you for your assistance.

@rdementi
Contributor

rdementi commented Aug 8, 2024

> 1. Based on a suggestion, I disabled the Linux RDT driver via RESCTRL with export PCM_USE_RESCTRL=0 and unmounted resctrl to allow PCM direct access to RDT. However, this did not resolve the issue.

Sorry for the delay (I was out of office). Could you please share the output of ./pcm-sensor-server in this scenario?

> 2. I attempted to create a custom monitoring solution that collects only the LMB and RMB using the output from pqos-msr and integrate this data with PCM data in Prometheus. Unfortunately, I encountered issues running PCM server and pqos monitoring simultaneously. The error message indicates that monitoring on core 0 is already started.

This is expected. You can't run pcm and pqos at the same time to monitor the RDT metrics, because each tries to program and use them exclusively.

@rdementi
Contributor

rdementi commented Aug 8, 2024

Could you please also share the complete output of the main "pcm -r -i=1" utility (run on its own, without pcm-sensor-server or pqos running)?

@rdementi
Contributor

rdementi commented Aug 8, 2024

And also the output of "curl --silent http://localhost:9738/metrics | grep Memory_Bandwidth" when pcm-sensor-server is run on its own?
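
The same check can be scripted so that it reports whether any core shows a non-zero value. The sketch below parses Prometheus text-format lines for the Local_Memory_Bandwidth and Remote_Memory_Bandwidth metrics. The function name is illustrative, fetching the text from http://localhost:9738/metrics is left to the caller, and the pattern assumes lines of the form name{labels} value with no trailing timestamp.

```python
import re

# Matches: metric_name{label="x",...} value
_SAMPLE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')
_BANDWIDTH_METRICS = ("Local_Memory_Bandwidth", "Remote_Memory_Bandwidth")


def nonzero_bandwidth_samples(metrics_text: str) -> list[tuple[str, str, float]]:
    """Return (metric, labels, value) for bandwidth samples that are not 0."""
    hits = []
    for line in metrics_text.splitlines():
        match = _SAMPLE.match(line.strip())
        if not match:
            continue  # comments, blank lines, metrics without labels
        name, labels, raw = match.groups()
        if name not in _BANDWIDTH_METRICS:
            continue
        try:
            value = float(raw)
        except ValueError:
            continue  # value field is not numeric
        if value != 0.0:
            hits.append((name, labels, value))
    return hits
```

An empty list means every local and remote bandwidth sample in the scrape was 0, which is the symptom described in this issue.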

@bellalzohir
Author

Thank you for your response. Here is the output of the following commands:

# ./pcm-sensor-server

root@seroics:~/zouhirRepos/pcm/pcm/build/bin# export PCM_USE_RESCTRL=0
root@seroics:~/zouhirRepos/pcm/pcm/build/bin# ./pcm-sensor-server

=====  Processor information  =====
Linux arch_perfmon flag  : yes
Hybrid processor         : no
IBRS and IBPB supported  : yes
STIBP supported          : yes
Spec arch caps supported : yes
Max CPUID level          : 22
CPU model number         : 85
Number of physical cores: 28
Number of logical cores: 28
Number of online logical cores: 28
Threads (logical cores) per physical core: 1
Num sockets: 2
Physical cores per socket: 14
Last level cache slices per socket: 14
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 3
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2600000000 Hz
IBRS enabled in the kernel   : yes
STIBP enabled in the kernel  : no
The processor is not susceptible to Rogue Data Cache Load: no
The processor supports enhanced IBRS                     : no
Package thermal spec power: 140 Watt; Package minimum power: 66 Watt; Package maximum power: 297 Watt;

INFO: Linux perf interface to program uncore PMUs is present
Socket 0: 2 memory controllers detected with total number of 6 channels. 3 UPI ports detected. 2 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
Socket 1: 2 memory controllers detected with total number of 6 channels. 3 UPI ports detected. 2 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
Socket 0: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 14 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Socket 1: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 14 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Initializing RMIDs

 Closed perf event handles
Trying to use Linux perf events...
Successfully programmed on-core PMU using Linux perf
Socket 0
Max UPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Socket 1
Max UPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max UPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Starting plain HTTP server on http://localhost:9738/

# pcm -r -i=1

Processor Counter Monitor  (202201-1)


Linux arch_perfmon flag  : yes
Hybrid processor         : no
IBRS and IBPB supported  : yes
STIBP supported          : yes
Spec arch caps supported : yes
Number of physical cores: 28
Number of logical cores: 28
Number of online logical cores: 28
Threads (logical cores) per physical core: 1
Num sockets: 2
Physical cores per socket: 14
Last level cache slices per socket: 14
Core PMU (perfmon) version: 4
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2600000000 Hz
IBRS enabled in the kernel   : yes
STIBP enabled in the kernel  : no
The processor is not susceptible to Rogue Data Cache Load: no
The processor supports enhanced IBRS                     : no
Package thermal spec power: 140 Watt; Package minimum power: 66 Watt; Package maximum power: 297 Watt;
INFO: Linux perf interface to program uncore PMUs is present
Socket 0: 2 memory controllers detected with total number of 6 channels. 3 QPI ports detected. 2 M2M (mesh to memory) blocks detected. 0 Home Agents detected. 3 M3UPI blocks detected.
Socket 1: 2 memory controllers detected with total number of 6 channels. 3 QPI ports detected. 2 M2M (mesh to memory) blocks detected. 0 Home Agents detected. 3 M3UPI blocks detected.
Initializing RMIDs

 Resetting PMU configuration
 Zeroed PMU registers
Trying to use Linux perf events...
Successfully programmed on-core PMU using Linux perf
Socket 0
Max QPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max QPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max QPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)
Socket 1
Max QPI link 0 speed: 23.3 GBytes/second (10.4 GT/second)
Max QPI link 1 speed: 23.3 GBytes/second (10.4 GT/second)
Max QPI link 2 speed: 21.5 GBytes/second (9.6 GT/second)

Detected Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz "Intel(r) microarchitecture codename Skylake-SP" stepping 4 microcode level 0x2007006

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 (read) cache misses
 L2MISS: L2 (read) cache misses (including other core's L2 cache *hits*)
 L3HIT : L3 (read) cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3MPI : number of L3 (read) cache misses per instruction
 L2MPI : number of L2 (read) cache misses per instruction
 READ  : bytes read from main memory controller (in GBytes)
 WRITE : bytes written to main memory controller (in GBytes)
 LOCAL : ratio of local memory requests to memory controller in %
 L3OCC : L3 occupancy (in KBytes)
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature
 energy: Energy in Joules


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI |   L3OCC | TEMP

   0    0     0.00   0.53   0.00    0.67    1707     5948      0.59    0.76    0.00    0.00      336     51
   1    1     0.00   0.45   0.01    0.75    8879       31 K    0.59    0.69    0.00    0.00        0     46
   2    0     0.00   0.54   0.00    0.71    4615       17 K    0.62    0.75    0.00    0.00      448     52
   3    1     0.00   0.62   0.00    0.65    1539     7636      0.69    0.81    0.00    0.00      448     46
   4    0     0.00   0.54   0.01    0.67    4712       19 K    0.64    0.77    0.00    0.00      112     52
   5    1     0.00   0.49   0.00    0.75    1951     6513      0.58    0.72    0.00    0.00      448     46
   6    0     0.00   0.66   0.00    0.63    1109     4292      0.60    0.88    0.00    0.00      168     52
   7    1     0.00   0.48   0.00    0.66    6900       14 K    0.48    0.66    0.00    0.00      728     48
   8    0     0.01   0.75   0.01    0.74      22 K     36 K    0.27    0.73    0.00    0.00     1176     52
   9    1     0.00   0.52   0.01    0.79      13 K     37 K    0.44    0.73    0.00    0.00      728     45
  10    0     0.01   0.78   0.01    0.97      31 K     34 K    0.07    0.71    0.00    0.00        0     52
  11    1     0.01   1.79   0.01    0.87    3355       10 K    0.58    0.91    0.00    0.00      448     47
  12    0     0.00   0.52   0.00    0.67    1755     5169      0.57    0.77    0.00    0.00     1456     54
  13    1     0.00   0.63   0.00    0.60    3572       11 K    0.52    0.87    0.00    0.00      112     47
  14    0     0.00   0.53   0.00    0.69    2992       11 K    0.61    0.76    0.00    0.00      840     53
  15    1     0.00   0.60   0.00    0.80    3374       11 K    0.59    0.76    0.00    0.00      336     48
  16    0     0.06   1.82   0.03    0.96      24 K     77 K    0.66    0.71    0.00    0.00       56     54
  17    1     0.05   1.50   0.03    0.86      34 K     85 K    0.55    0.73    0.00    0.00    12208     47
  18    0     0.03   1.38   0.02    0.85      24 K     55 K    0.49    0.73    0.00    0.00      504     54
  19    1     0.00   0.50   0.00    0.63     848     4240      0.73    0.79    0.00    0.00      672     49
  20    0     0.07   1.57   0.05    0.94      41 K    125 K    0.63    0.69    0.00    0.00       56     53
  21    1     0.01   0.84   0.01    0.69      10 K     31 K    0.55    0.80    0.00    0.00      672     50
  22    0     0.06   1.28   0.05    0.91     107 K    190 K    0.39    0.66    0.00    0.00     4872     53
  23    1     0.01   0.80   0.01    0.68    9274       25 K    0.50    0.81    0.00    0.00     1064     47
  24    0     0.01   0.67   0.01    0.73      11 K     30 K    0.52    0.80    0.00    0.00      672     53
  25    1     0.01   1.05   0.01    0.67    7555       24 K    0.63    0.78    0.00    0.00      504     46
  26    0     0.00   0.78   0.00    0.74    5403       19 K    0.62    0.76    0.00    0.00     1344     51
  27    1     0.00   0.32   0.00    0.59    1773     7152      0.67    0.69    0.00    0.00      168     48
---------------------------------------------------------------------------------------------------------------
 SKT    0     0.02   1.31   0.01    0.87     285 K    632 K    0.48    0.71    0.00    0.00    12040     50
 SKT    1     0.01   1.02   0.01    0.75     108 K    308 K    0.55    0.77    0.00    0.00    18536     45
---------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.01   1.21   0.01    0.83     393 K    941 K    0.50    0.74    0.00    0.00     N/A      N/A

 Instructions retired:  913 M ; Active cycles:  752 M ; Time (TSC): 2597 Mticks ; C0 (active,non-halted) core residency: 1.25 %

 C1 core residency: 98.75 %; C6 core residency: 0.00 %;
 C0 package residency: 100.00 %; C2 package residency: 0.00 %; C6 package residency: 0.00 %;
                             ┌────────────────────────────────────────────────────────────────────────────────┐
 Core    C-state distribution│01111111111111111111111111111111111111111111111111111111111111111111111111111111│
                             └────────────────────────────────────────────────────────────────────────────────┘
                             ┌────────────────────────────────────────────────────────────────────────────────┐
 Package C-state distribution│00000000000000000000000000000000000000000000000000000000000000000000000000000000│
                             └────────────────────────────────────────────────────────────────────────────────┘

 PHYSICAL CORE IPC                 : 1.21 => corresponds to 30.33 % utilization for cores in active state
 Instructions per nominal CPU cycle: 0.01 => corresponds to 0.31 % core utilization over time interval
 SMI count: 0

Intel(r) UPI data traffic estimation in bytes (data traffic coming to CPU/socket through UPI links):

               UPI0     UPI1     UPI2    |  UPI0   UPI1   UPI2
---------------------------------------------------------------------------------------------------------------
 SKT    0       64 M     64 M      0     |    0%     0%     0%
 SKT    1       43 M     43 M      0     |    0%     0%     0%
---------------------------------------------------------------------------------------------------------------
Total UPI incoming data traffic:  215 M     UPI data traffic/Memory controller traffic: 0.43

Intel(r) UPI traffic estimation in bytes (data and non-data traffic outgoing from CPU/socket through UPI links):

               UPI0     UPI1     UPI2    |  UPI0   UPI1   UPI2
---------------------------------------------------------------------------------------------------------------
 SKT    0      128 M    127 M      0     |    0%     0%     0%
 SKT    1      145 M    147 M      0     |    0%     0%     0%
---------------------------------------------------------------------------------------------------------------
Total UPI outgoing data and non-data traffic:  550 M
MEM (GB)->|  READ |  WRITE | LOCAL | CPU energy | DIMM energy | UncFREQ (Ghz)
---------------------------------------------------------------------------------------------------------------
 SKT   0     0.13     0.09   63 %      49.26      20.29          2.40
 SKT   1     0.16     0.12   35 %      47.37      20.73          2.40
---------------------------------------------------------------------------------------------------------------
       *     0.29     0.21   48 %      96.63      41.03          2.40
Cleaning up
 Closed perf event handles
 Zeroed uncore PMU registers
 Freeing up all RMIDs

curl --silent http://localhost:9738/metrics | grep Memory_Bandwidth
Local_Memory_Bandwidth{socket="0",core="0",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="0",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="6",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="6",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="1",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="1",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="5",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="5",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="2",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="2",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="4",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="4",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="3",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="3",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="14",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="14",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="8",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="8",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="13",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="13",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="9",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="9",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="12",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="12",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="10",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="10",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",core="11",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="0",core="11",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="0",aggregate="socket",source="core"} 0
Remote_Memory_Bandwidth{socket="0",aggregate="socket",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="0",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="0",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="6",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="6",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="1",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="1",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="5",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="5",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="2",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="2",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="4",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="4",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="3",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="3",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="14",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="14",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="8",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="8",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="13",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="13",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="9",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="9",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="12",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="12",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="10",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="10",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",core="11",thread="0",source="core"} 0
Remote_Memory_Bandwidth{socket="1",core="11",thread="0",source="core"} 0
Local_Memory_Bandwidth{socket="1",aggregate="socket",source="core"} 0
Remote_Memory_Bandwidth{socket="1",aggregate="socket",source="core"} 0
Local_Memory_Bandwidth{aggregate="system",source="core"} 0
Remote_Memory_Bandwidth{aggregate="system",source="core"} 0

@rdementi
Contributor

rdementi commented Aug 8, 2024

Processor Counter Monitor (202201-1)

It seems you are using an old version. Could you please run the latest version (master branch) and set the new PCM_ENFORCE_MBM=1 environment variable?
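
For anyone retrying this, here is a minimal sketch of one way to check whether the exporter reports non-zero bandwidth after the change. It assumes PCM was rebuilt from master, that `pcm-sensor-server` is on its default port 9738 as in the curl command above, and that the binary lives under `./build/bin/` (this path is an assumption and depends on your build directory). The helper name `count_nonzero_mbm` is hypothetical and not part of PCM.

```shell
# Assumption: the server is started with the variable set, for example
#   sudo env PCM_ENFORCE_MBM=1 ./build/bin/pcm-sensor-server
# (adjust the path to wherever your PCM build placed the binary).

# Hypothetical helper: reads Prometheus text from stdin and prints how many
# Local_/Remote_Memory_Bandwidth samples carry a non-zero value.
count_nonzero_mbm() {
    awk '/^(Local|Remote)_Memory_Bandwidth[{]/ && $NF + 0 != 0 { n++ } END { print n + 0 }'
}

# Usage against a running server:
#   curl --silent http://localhost:9738/metrics | count_nonzero_mbm
```

A result of 0 under real memory load would suggest the MBM counters are still not being collected; any positive count means at least one core or aggregate series is reporting data.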

@bellalzohir
Author

Thank you so much, the latest version works perfectly.
