CPU utilization not reported on aarch64 #217

Closed
courtland opened this issue Mar 9, 2023 · 9 comments

Comments

@courtland
Contributor

On AWS Graviton (aarch64/arm64) instances, the CPU utilization reported by the nomad client is 0.

My understanding is that this is a function of the driver (podman in my case) and not the nomad client. I briefly tried to track down where this could be breaking, and the "//FIXME implement cpu stats correctly" comment in runStatsEmitter made me wonder whether that is related.

podman stats shows container usage correctly.

Ubuntu 22.04.2 LTS // podman version 3.4.4

This is somewhat related to hashicorp/nomad#4233
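As background for the CPU stats question above, here is a minimal sketch of how a utilization percentage is commonly derived from two samples of a cumulative CPU counter. It assumes the cgroup v2 `cpu.stat` format, where `usage_usec` is total CPU time in microseconds. The function names `parse_usage_usec` and `cpu_percent` are hypothetical and are not taken from the podman driver or from Nomad, so this is not a description of what runStatsEmitter actually does.

```python
def parse_usage_usec(cpu_stat_text: str) -> int:
    """Extract the cumulative usage_usec value from cgroup v2 cpu.stat contents."""
    for line in cpu_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "usage_usec":
            return int(value)
    raise ValueError("usage_usec not found in cpu.stat contents")


def cpu_percent(prev_usec: int, curr_usec: int, elapsed_seconds: float) -> float:
    """Return CPU utilization over the interval; 100.0 means one fully busy core."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta_usec = curr_usec - prev_usec
    if delta_usec < 0:
        # The counter went backwards (for example, the cgroup was recreated),
        # so this interval cannot be measured; report zero rather than a negative value.
        return 0.0
    # CPU microseconds consumed divided by wall-clock microseconds elapsed.
    return delta_usec / (elapsed_seconds * 1_000_000) * 100.0
```

A reading that stays at 0 means the delta between samples is always zero (or the counter is never read), which is one way to narrow down where the value is lost.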

@towe75
Collaborator

towe75 commented Mar 10, 2023

Hey @courtland, thank you for reaching out.
Just to verify: does podman (without nomad) show the stats?

podman stats

@lgfa29
Contributor

lgfa29 commented Mar 10, 2023

Hi @courtland 👋

Could you also check if you're running cgroups v2 and if switching to v1 fixes the problem? This may be related to #160.
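One way to check which cgroup version a host is running is to look at the layout under the cgroup mount point. The sketch below assumes the usual mount at `/sys/fs/cgroup` and relies on the fact that the unified (v2) hierarchy exposes a `cgroup.controllers` file at its root, while v1 mounts each controller in its own directory. The helper name `detect_cgroup_version` is hypothetical, and a hybrid setup is reported as "v1" here.

```python
from pathlib import Path


def detect_cgroup_version(root: Path = Path("/sys/fs/cgroup")) -> str:
    """Return "v2", "v1", or "unknown" for the cgroup hierarchy mounted at root."""
    # The unified (v2) hierarchy has cgroup.controllers at the mount root.
    if (root / "cgroup.controllers").is_file():
        return "v2"
    # On v1 (and hybrid) hosts, controllers appear as per-controller directories.
    v1_controller_dirs = ("cpu", "cpuacct", "cpu,cpuacct", "memory")
    if any((root / name).is_dir() for name in v1_controller_dirs):
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print(detect_cgroup_version())
```

The `root` parameter exists only so the check can be pointed at a different mount point or at a test directory.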

@courtland
Contributor Author

Yes, podman stats correctly shows CPU usage.

I am indeed running cgroups v2 (Ubuntu jammy/22.04), so that may well be the culprit. I will test an instance with v1.

@courtland
Contributor Author

Switching to cgroups v1 does NOT fix the problem. It's also worth noting that client CPU utilization reports correctly on my x86_64/amd64 instances (everything else is the same except arch).

@lgfa29
Contributor

lgfa29 commented Mar 14, 2023

Thanks for testing. It seems we will need to investigate this further 👍

@Procsiab
Contributor

Procsiab commented Oct 4, 2023

Hello there. I am interested in helping investigate this issue, since I also use Nomad to orchestrate container workloads on 64-bit ARM nodes. I am not able to reproduce the missing CPU statistics, so my setup may differ from @courtland's in some way; it could be just the Podman version or the permissions on the cgroup slice folder. Below are some screenshots from Nomad's UI for an aarch64 node I am using (a Raspberry Pi 3B).

Screenshot from 2023-10-04 19-27-08

Screenshot from 2023-10-04 19-27-33

@lgfa29
Contributor

lgfa29 commented Nov 25, 2023

Thank you for the extra info @Procsiab.

CPU fingerprinting and stats are something we've been fixing in Nomad (especially on ARM) over the past few releases, so we may have fixed this at some point. Which version of Nomad are you using?

@courtland by any chance would you be able to check if this is still a problem?

Thanks!

@Procsiab
Contributor

In reply to @lgfa29, and to add to my previous post: at the time of writing it I was using Nomad 1.6.2.
I am now using Podman 4.7.2 and Nomad 1.6.3 on the same ARM hardware, and I still do not see the issue discussed here.

@courtland
Contributor Author

Yes, Nomad 1.6.x seems to have resolved this problem. Closing this issue. Thanks for following up and looking into it @Procsiab
