Use dmidecode as fallback source of cpu_total_compute #4233
Comments
Is anyone working on this? |
Note that you can work around this issue by using the cpu_total_compute configuration for compute elements that miscalculate. This will override the fingerprinter in cases where it can't calculate properly. |
Has this made it into 1.0.0 GA? I assume not, as I am hitting this on an |
Hi @shantanugadgil. No, this didn't land in 1.0 |
Another problem is that on ARM, CPU usage metrics simply do not work. Container CPU usage remains 0 MHz all the time, making setting |
FWIW I'm also running into this issue on AWS Graviton 3 Nitro instances (aarch64), so I'm also falling back to manually setting I'm thinking about taking a shot at a PR for this unless anyone else is already working on this? Any suggestions on most reliable source for current CPU freq on arm? Nomad v1.4.3 |
Hi @courtland! We'd love to review a PR for this. Our code is in |
@tgross thanks for the hints! Is there a preference to adding a new golang package for dmidecode parsing vs. exec within the client and doing it "manually"? Is there some other area where the nomad client optionally relies on a userland binary to exist? I'm assuming we would have to recommend in the docs that anyone running aarch64 should install dmidecode separately. |
I'd order our preference for a solution as follows:
A couple bits of the Nomad client have (undocumented 😬) dependencies on coreutils binaries on Unixish hosts, but otherwise I think only CNI requires one. So long as we document the requirement and have a safe fallback if it's not installed, I think we'll be ok. |
+1 to Tim's list (although I think he meant I think a lookup table might be a reasonable approach as well as that avoids the problem of having to find the max frequency supported and not whatever frequency the chip is currently at as part of power/thermal management. The big downside of lookup tables is that they're impossible to test without access to that hardware, so we'd have to rely on contributions. For example we have a big AWS EC2 lookup table here: https://github.com/hashicorp/nomad/blob/main/client/fingerprint/env_aws_cpu.go Generated by |
Hah, yeah, gotta be in The lookup table approach is alright to get the max freq, especially since you're already doing that. Actually, in my case, it's simply just missing the latest I am successfully using I think option 2 suggested by Tim is the right solution - add The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client? All my jobs are podman driver based. |
Looks like this has been discussed on and off for a while in gopsutil and here... @schmichael had proposed the change to gopsutil back in 2017 :D @shoenig seems to agree it's not worth nomad supporting arm64 detection in this duplicate issue: I wouldn't mind trying to update the lookup table and adding some dmidecode support to gopsutil if that seems reasonable. |
Surprisingly, I really did mean reading
That seems totally reasonable.
The client gets stats from the driver via the |
Interesting! I will take a look and see if I have enough magic bits leftover...
Thanks for the insight and direction - I created a new issue. |
It seems like someone over at DigitalOcean tried to get SMBIOS info out of https://blog.gopheracademy.com/advent-2017/accessing-smbios-information-with-go/ The resulting package is old but appears to do some of the heavy lifting. Not sure if there is appetite for hashi to keep it alive? |
This isn't specific to podman or nomad, it's that the information is not reported by the ARM kernel driver - you need this patch, or one like it. Maybe things have changed recently - in which case we should get the gopsutil library updated. https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/ |
If it's an AWS Graviton instance then #16417 should pick it up. I think I'd rather shell out to a known quantity like |
That's a good point -- the maintainers of |
ARM chipsets sparsely populate /proc/cpuinfo and often cause cpu_total_compute fingerprinting to fail. dmidecode is a viable fallback when /proc/cpuinfo does not contain the necessary information:
See #2638 (comment) for details and thanks to @balupton for the suggestion!
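To make the proposed fallback concrete, here is a rough Go sketch of how total compute could be derived from `dmidecode -t processor` output. It is not Nomad's actual fingerprinter: `parseDmidecodeCompute` is a hypothetical helper, and it assumes each "Processor Information" block carries a `Max Speed: N MHz` line and a `Core Count: N` line, which is not guaranteed on every platform.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseDmidecodeCompute is a hypothetical helper (not part of Nomad).
// It sums Max Speed (MHz) x Core Count over every "Processor Information"
// block found in the text output of `dmidecode -t processor`.
func parseDmidecodeCompute(out string) (int, error) {
	total, mhz, cores := 0, 0, 0
	// flush adds the block parsed so far to the total and resets state.
	flush := func() {
		total += mhz * cores
		mhz, cores = 0, 0
	}

	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "Processor Information":
			flush() // a new socket starts; account for the previous one
		case strings.HasPrefix(line, "Max Speed:"):
			// Expect "<number> MHz"; values such as "Unknown" are skipped.
			f := strings.Fields(strings.TrimPrefix(line, "Max Speed:"))
			if len(f) == 2 && f[1] == "MHz" {
				if v, err := strconv.Atoi(f[0]); err == nil {
					mhz = v
				}
			}
		case strings.HasPrefix(line, "Core Count:"):
			s := strings.TrimSpace(strings.TrimPrefix(line, "Core Count:"))
			if v, err := strconv.Atoi(s); err == nil {
				cores = v
			}
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	flush()

	if total == 0 {
		// Signal the caller to fall back to another source or to config.
		return 0, fmt.Errorf("no usable Max Speed / Core Count in dmidecode output")
	}
	return total, nil
}
```

In a real client the input would presumably come from something like `exec.Command("dmidecode", "-t", "processor").Output()`, which usually needs root and fails if the binary is not installed. Returning an error in those cases leaves room for a safe fallback, such as the manual cpu_total_compute setting.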