PCM Does Not Display LMB and RMB Metrics in Prometheus Integration #761
Hi, thanks for creating the issue. Could you please also share the output of |
Hi, thank you for getting back to me so quickly.
pqos monitoring VS pqos-msr monitoring
Questions:
|
Thanks, that is helpful. On your CPU we disable reading these RDT counters from hardware because of an erratum. The Linux kernel does the same: torvalds/linux@d56593e |
Thank you for the quick response and for clarifying the situation with the RDT counters on my CPU. I appreciate the suggestion to add an option in PCM to re-enable these metrics. To provide further context, I have already enabled RDT features from the boot configuration to ensure that all requisite counters are available at the OS level. My current GRUB configuration is set as follows:
This configuration was intended to ensure that the RDT options are fully enabled within the Linux kernel. Given this setup, I would be interested in understanding if there might be additional steps or configurations within PCM that need to be aligned with this kernel setting to effectively monitor these metrics. Looking forward to your guidance on how we might proceed to achieve comprehensive monitoring capabilities. |
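One way to double-check which `rdt=` options the running kernel actually received is to parse its command line. The following is only a minimal sketch: the helper names `rdt_options` and `read_rdt_options` are made up for illustration, and it assumes the command line is available in `/proc/cmdline` and that parameters are separated by whitespace.

```python
def rdt_options(cmdline: str) -> list:
    """Return the comma-separated values of the rdt= parameter, or [] if absent.

    Hypothetical helper: the kernel command line is split on whitespace,
    so only the token that starts with "rdt=" is inspected.
    """
    for token in cmdline.split():
        if token.startswith("rdt="):
            # Drop empty entries left behind by a trailing comma.
            return [opt for opt in token[len("rdt="):].split(",") if opt]
    return []


def read_rdt_options(path: str = "/proc/cmdline") -> list:
    """Read the running kernel's command line and extract the rdt= options."""
    with open(path) as f:
        return rdt_options(f.read())
```

Calling `read_rdt_options()` on the affected machine lists the options that ended up in the `rdt=` token. Because tokens are split on whitespace, a value typed with spaces after the commas would show up as a shorter list than intended. Whether that applies here depends on how the line was actually entered in GRUB.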
the change has been implemented. Set this environment variable: |
Thank you for your previous response and the guidance provided. I have successfully mounted the resctrl filesystem and set the environment variable
Error Messages:
Despite the filesystem already being mounted, I faced challenges when trying to adjust the mount settings to enable different features.
Mount Attempt2:
Your insights or any further guidance on how to resolve these file access and mounting issues would be greatly appreciated. |
there seems to be an issue with the Linux RDT driver (config). Could you try unmounting resctrl and set this env variable in PCM:
Then PCM will access RDT directly. |
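Before retrying, it may help to confirm that resctrl is really unmounted. This is a rough sketch that scans mount-table text in the `/proc/mounts` format for entries of filesystem type `resctrl`. The function name `resctrl_mount_points` is made up for illustration and is not part of PCM.

```python
def resctrl_mount_points(mounts_text: str) -> list:
    """Return the mount points of all resctrl filesystems in /proc/mounts text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "resctrl":
            points.append(fields[1])
    return points


def resctrl_is_mounted(path: str = "/proc/mounts") -> bool:
    """Check the live mount table; an empty result means resctrl is unmounted."""
    with open(path) as f:
        return bool(resctrl_mount_points(f.read()))
```

If `resctrl_is_mounted()` returns `False`, the kernel driver's filesystem is no longer mounted and should not get in the way of PCM accessing RDT directly.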
Hello, I'm experiencing an issue where Local and Remote Memory Bandwidth (LMB and RMB) metrics are not displayed in Prometheus, despite proper configuration and troubleshooting steps taken. Steps and observations:
I would appreciate any insights or potential solutions. Thank you for your assistance. |
sorry for the delay (I was out of office). Could you please share the output of ./pcm-sensor-server in this scenario?
this is expected. You can't run pcm and pqos at the same time to monitor the RDT metrics, because they both try to program and use the counters exclusively. |
could you please also share the complete output of the "pcm -r -i=1" main utility (run on its own, without pcm-sensor-server or pqos running)? |
and also the output of "curl --silent http://localhost:9738/metrics | grep Memory_Bandwidth" when pcm-sensor-server is run exclusively? |
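The same check can be scripted when it has to be repeated. This is a minimal sketch of the `curl | grep` pipeline above using only the Python standard library. The URL comes from this thread, the function names are made up for illustration, and the metric line in the test is invented sample data, not real PCM output.

```python
from urllib.request import urlopen


def filter_lines(metrics_text: str, needle: str = "Memory_Bandwidth") -> list:
    """Return every line of the metrics text that contains the given substring."""
    return [line for line in metrics_text.splitlines() if needle in line]


def fetch_memory_bandwidth(url: str = "http://localhost:9738/metrics") -> list:
    """Download the metrics page from pcm-sensor-server and keep matching lines."""
    with urlopen(url, timeout=5) as response:
        text = response.read().decode("utf-8", errors="replace")
    return filter_lines(text)
```

`fetch_memory_bandwidth()` only works while pcm-sensor-server is running and listening on port 9738.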
Thank you for your response. Here is the output of the following commands:
|
it seems you are using the old version. Could you please run the latest version (master branch) and set the new PCM_ENFORCE_MBM=1 environment variable? |
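For scripted setups, the variable can be set for the server process alone instead of exporting it in the shell. This is a sketch under assumptions: the binary path `./pcm-sensor-server` is taken from earlier in this thread, the helper names are hypothetical, and the process will normally need root privileges.

```python
import os
import subprocess


def pcm_env(base=None) -> dict:
    """Return a copy of the environment with PCM_ENFORCE_MBM=1 added."""
    env = dict(os.environ if base is None else base)
    env["PCM_ENFORCE_MBM"] = "1"
    return env


def start_sensor_server(binary: str = "./pcm-sensor-server") -> subprocess.Popen:
    """Launch pcm-sensor-server (assumed path) with the variable set."""
    return subprocess.Popen([binary], env=pcm_env())
```

Copying the environment leaves the caller's own variables untouched, so other tools started from the same script are not affected.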
Thank you so much, the latest version works perfectly. |
Hello Intel PCM and PQoS developers,
I am facing an issue with Intel PCM where it fails to report local and remote memory bandwidth (LMB and RMB) metrics when monitored through Prometheus, despite these metrics being available and correctly reported when using pqos-msr.
Environment:
OS: Linux kernel 5.15.0-112-generic
Configuration:
RDT features enabled (rdt=cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp, mba) through GRUB configuration: GRUB_CMDLINE_LINUX="hugepagesz=1G hugepages=12 rdt=cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp, mba"
CONFIG_X86_CPU_RESCTRL=y in kernel configuration
Issue:
When monitoring server performance metrics using Intel PCM with Prometheus, the local and remote memory bandwidth metrics (LMB and RMB) consistently report as 0, indicating no data is being captured or transmitted. However, using pqos-msr, these metrics are clearly available and accurately reported.
Execute sudo pqos-msr -d and note the additional memory bandwidth metrics being monitored.
Expected Behavior:
Intel PCM should accurately capture and export all available memory bandwidth metrics, including LMB and RMB, to Prometheus.
Actual Behavior:
LMB and RMB metrics appear as 0 in Prometheus, suggesting an issue with either the PCM data capture or the export process.
I appreciate any assistance or guidance you can provide and am available for further testing or to provide additional information as needed.