Use dmidecode as fallback source of cpu_total_compute #4233

Closed
schmichael opened this issue Apr 30, 2018 · 19 comments

@schmichael
Member

ARM chipsets sparsely populate /proc/cpuinfo and often cause cpu_total_compute fingerprinting to fail.

dmidecode is a viable fallback when /proc/cpuinfo does not contain the necessary information:

$ dmidecode -t 4

# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x0400, DMI type 4, 42 bytes
Processor Information
	Socket Designation: CPU 0
	Type: Central Processor
	Family: Other
	Manufacturer: QEMU
	ID: 00 00 00 00 00 00 00 00
	Version: 1.0
	Voltage: Unknown
	External Clock: Unknown
	Max Speed: 2000 MHz
	Current Speed: 2000 MHz
	Status: Populated, Enabled
	Upgrade: Other
	L1 Cache Handle: Not Provided
	L2 Cache Handle: Not Provided
	L3 Cache Handle: Not Provided
	Serial Number: Not Specified
	Asset Tag: Not Specified
	Part Number: Not Specified
	Core Count: 1
	Core Enabled: 1
	Thread Count: 1
	Characteristics: None

...

See #2638 (comment) for details and thanks to @balupton for the suggestion!
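A minimal sketch of how the output above could be turned into a compute figure. It assumes total compute is cores × max MHz summed over sockets; the type and function names (`processor`, `parseDmidecode`, `totalCompute`) are hypothetical and not taken from Nomad or gopsutil.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// processor holds the two fields of one "Processor Information" record
// that matter for a total-compute estimate.
type processor struct {
	maxSpeedMHz int
	coreCount   int
}

// parseDmidecode scans text in the format printed by `dmidecode -t 4` and
// returns one entry per processor record. Fields that are missing or
// non-numeric (for example "Max Speed: Unknown") are left at zero.
func parseDmidecode(out string) []processor {
	var procs []processor
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "Processor Information" {
			procs = append(procs, processor{})
			continue
		}
		if len(procs) == 0 {
			continue // still in the header, before the first record
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		fields := strings.Fields(val)
		if len(fields) == 0 {
			continue
		}
		n, err := strconv.Atoi(fields[0])
		if err != nil {
			continue
		}
		cur := &procs[len(procs)-1]
		switch key {
		case "Max Speed":
			cur.maxSpeedMHz = n
		case "Core Count":
			cur.coreCount = n
		}
	}
	return procs
}

// totalCompute sums cores x max MHz over all sockets (an assumption about
// how the figure should be derived).
func totalCompute(procs []processor) int {
	total := 0
	for _, p := range procs {
		total += p.maxSpeedMHz * p.coreCount
	}
	return total
}

func main() {
	sample := "Handle 0x0400, DMI type 4, 42 bytes\n" +
		"Processor Information\n" +
		"\tMax Speed: 2000 MHz\n" +
		"\tCore Count: 1\n"
	fmt.Println(totalCompute(parseDmidecode(sample)))
}
```

Using Max Speed rather than Current Speed keeps the result independent of whatever frequency the chip happens to be running at when the command is executed.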

@Legogris

Legogris commented Feb 7, 2020

Is anyone working on this?

@angrycub
Contributor

angrycub commented Feb 7, 2020

Note that you can work around this issue by setting the cpu_total_compute configuration option on clients where fingerprinting miscalculates. It overrides the fingerprinter in cases where it can't calculate the value properly.
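For reference, a minimal sketch of that override in the client agent configuration. The value 2000 is only an illustration (1 core × 2000 MHz, matching the dmidecode output at the top of this issue); substitute the figure for the actual machine.

```hcl
client {
  enabled = true

  # Manual override for the fingerprinted total compute, in MHz.
  # Illustrative value: 1 core x 2000 MHz.
  cpu_total_compute = 2000
}
```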

@shantanugadgil
Contributor

Has this made it into 1.0.0 GA? I assume not, as I am hitting this on a CentOS 7 aarch64 VM.
(Setting the cpu_total_compute value works, but 😞)

@tgross
Member

tgross commented Dec 14, 2020

Hi @shantanugadgil. No, this didn't land in 1.0.

@roylez

roylez commented Jun 21, 2021

Another problem is that on ARM, CPU usage metrics simply do not work. Container CPU usage stays at 0 MHz all the time, which makes setting cpu_total_compute useless because the scheduler always gets zero.

@courtland

FWIW I'm also running into this issue on AWS Graviton 3 Nitro instances (aarch64), so I'm also falling back to manually setting cpu_total_compute based on dmidecode. But as @roylez mentioned, CPU usage on the client always reports 0. Kind of a bummer after putting a lot of effort into making our workloads arm friendly :(

I'm thinking about taking a shot at a PR for this, unless someone else is already working on it. Any suggestions on the most reliable source for the current CPU frequency on ARM?

Nomad v1.4.3

@tgross
Member

tgross commented Mar 2, 2023

Hi @courtland! We'd love to review a PR for this. Our code is in helper/stats/cpu.go but the real work to be done here may actually be in github.com/shirou/gopsutil/v3/cpu, which we use as the library to read CPU info. As @schmichael noted above, dmidecode seems to be the reasonable fallback.

@courtland

@tgross thanks for the hints!

Is there a preference to adding a new golang package for dmidecode parsing vs. exec within the client and doing it "manually"? Is there some other area where the nomad client optionally relies on a userland binary to exist? I'm assuming we would have to recommend in the docs that anyone running aarch64 should install dmidecode separately.

@tgross
Member

tgross commented Mar 9, 2023

> Is there a preference to adding a new golang package for dmidecode parsing vs. exec within the client and doing it "manually"?

I'd order our preference for a solution as follows:

  1. Add the support for reading the required values out of /dev/mem to shirou/gopsutil (but I also recognize that's a heck of a lift 😀 )
  2. Add a call to the dmidecode binary to shirou/gopsutil
  3. Add a call to the dmidecode binary to Nomad

> Is there some other area where the nomad client optionally relies on a userland binary to exist? I'm assuming we would have to recommend in the docs that anyone running aarch64 should install dmidecode separately.

A couple bits of the Nomad client have (undocumented 😬) dependencies on coreutils binaries on Unixish hosts, but otherwise I think only CNI requires one. So long as we document the requirement and have a safe fallback if it's not installed, I think we'll be ok.
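A rough sketch of what "a safe fallback if it's not installed" could look like when shelling out. The helper name `processorInfo` is hypothetical; the only external calls are `exec.LookPath` and `exec.Command` from the Go standard library, and the `-t 4` arguments are the ones shown at the top of this issue.

```go
package main

import (
	"fmt"
	"os/exec"
)

// processorInfo returns the raw output of `<bin> -t 4`. It reports ok=false
// when the binary is not on PATH or when the command fails (dmidecode
// usually needs root), so the caller can fall back to another source.
func processorInfo(bin string) (out string, ok bool) {
	path, err := exec.LookPath(bin)
	if err != nil {
		return "", false
	}
	raw, err := exec.Command(path, "-t", "4").Output()
	if err != nil {
		return "", false
	}
	return string(raw), true
}

func main() {
	if out, ok := processorInfo("dmidecode"); ok {
		fmt.Printf("got %d bytes of processor info\n", len(out))
		return
	}
	fmt.Println("dmidecode unavailable; keeping the existing fingerprint or configured override")
}
```

Treating a missing binary and a failed run the same way keeps the optional dependency from ever blocking client startup.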

@schmichael
Member Author

+1 to Tim's list (although I think he meant /dev/cpuinfo and not /dev/mem... parsing /dev/mem would be very exciting), but just to throw out another option that might compose well with other fallbacks:

I think a lookup table might be a reasonable approach as well as that avoids the problem of having to find the max frequency supported and not whatever frequency the chip is currently at as part of power/thermal management. The big downside of lookup tables is that they're impossible to test without access to that hardware, so we'd have to rely on contributions.

For example we have a big AWS EC2 lookup table here: https://github.com/hashicorp/nomad/blob/main/client/fingerprint/env_aws_cpu.go

Generated by make ec2info and backported.
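As a sketch of how a lookup table might compose with other fallbacks: consult the table first, and only probe the hardware when the machine is not listed. Everything here is hypothetical, including the names `knownMaxMHz` and `resolveMaxMHz` and the placeholder keys and values; it is not the layout of the real EC2 table.

```go
package main

import "fmt"

// knownMaxMHz maps a machine identifier to its maximum clock in MHz.
// Keys and values are illustrative placeholders, not vendor data.
var knownMaxMHz = map[string]int{
	"example.small": 2000,
	"example.large": 2600,
}

// resolveMaxMHz prefers the table, which records the maximum frequency
// rather than whatever the chip is currently running at, and calls detect
// only for machines the table does not know about.
func resolveMaxMHz(machine string, detect func() (int, bool)) (int, bool) {
	if mhz, ok := knownMaxMHz[machine]; ok {
		return mhz, true
	}
	if detect != nil {
		return detect()
	}
	return 0, false
}

func main() {
	// Stand-in for a hardware probe that fails, as /proc/cpuinfo can on ARM.
	failingProbe := func() (int, bool) { return 0, false }

	mhz, ok := resolveMaxMHz("example.large", failingProbe)
	fmt.Println(mhz, ok)

	mhz, ok = resolveMaxMHz("unknown.machine", failingProbe)
	fmt.Println(mhz, ok)
}
```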

@courtland

Hah, yeah, gotta be in /dev/mem somewhere, right, maybe... I'm assuming you both meant /proc/cpuinfo ? Unfortunately, at least on my m7g, it just has BogoMIPS : 2100.00 with some other cpu features. The actual max speed is 2600MHz according to dmidecode.
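To illustrate the gap: a parser that looks for the `cpu MHz` field, as x86 kernels print it, finds nothing in a cpuinfo that only carries BogoMIPS. This is a simplified sketch with a hypothetical function name (`cpuMHz`) and hand-written sample text; it is not the gopsutil implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// cpuMHz returns the first "cpu MHz" value found in /proc/cpuinfo-style
// text, or ok=false when no such field is present.
func cpuMHz(cpuinfo string) (mhz float64, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	for sc.Scan() {
		key, val, found := strings.Cut(sc.Text(), ":")
		if !found || strings.TrimSpace(key) != "cpu MHz" {
			continue
		}
		v, err := strconv.ParseFloat(strings.TrimSpace(val), 64)
		if err != nil {
			continue
		}
		return v, true
	}
	return 0, false
}

func main() {
	// Hand-written samples; real files contain many more fields.
	x86 := "processor\t: 0\ncpu MHz\t\t: 2400.000\n"
	arm := "processor\t: 0\nBogoMIPS\t: 2100.00\nFeatures\t: fp asimd\n"

	mhz, ok := cpuMHz(x86)
	fmt.Println("x86:", mhz, ok)

	mhz, ok = cpuMHz(arm)
	fmt.Println("arm:", mhz, ok)
}
```

BogoMIPS is deliberately not used as a substitute here, since on this m7g it reads 2100.00 while dmidecode reports a 2600 MHz max speed.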

The lookup table approach is alright to get the max freq, especially since you're already doing that. Actually, in my case, it's simply just missing the latest M7g instance types. I think you're saying that requires someone to go and run make ec2info manually and merge the result?

I am successfully using dmidecode to know the max frequency. Personally I'd prefer if nomad worked on any arm64 system. The python and go psutil libraries both fail at detecting CPU speed in my case. I'm actually using the python version that gopsutil is based on - same thing.

I think option 2 suggested by Tim is the right solution - add dmidecode calls to gopsutil. Unless there's something cool I'm not understanding about /dev/mem.

The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client? All my jobs are podman driver based.

@courtland

Looks like this has been discussed on and off for a while in gopsutil and here...

@schmichael had proposed the change to gopsutil back in 2017 :D
shirou/gopsutil#282

@shoenig seems to agree it's not worth nomad supporting arm64 detection in this duplicate issue:
#14055

I wouldn't mind trying to update the lookup table and adding some dmidecode support to gopsutil if that seems reasonable.

@tgross
Member

tgross commented Mar 9, 2023

Surprisingly, I really did mean reading /dev/mem! Because as far as I can tell that's actually where dmidecode is reading from. It even has an arg to use a different path for that file. (ref man(8) dmidecode). But to do that you're definitely getting into the deep magic bits 😀

> I think option 2 suggested by Tim is the right solution - add dmidecode calls to gopsutil

That seems totally reasonable.

> The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client? All my jobs are podman driver based.

The client gets stats from the driver via the TaskStats API, so that's most likely an issue with the driver (or podman itself, as that's where the driver probably gets stats!). Would you be up for opening an issue in the podman repo?

@courtland

> Surprisingly, I really did mean reading /dev/mem! Because as far as I can tell that's actually where dmidecode is reading from. It even has an arg to use a different path for that file. (ref man(8) dmidecode). But to do that you're definitely getting into the deep magic bits 😀

Interesting! I will take a look and see if I have enough magic bits leftover...

>> I think option 2 suggested by Tim is the right solution - add dmidecode calls to gopsutil

> That seems totally reasonable.

>> The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client? All my jobs are podman driver based.

> The client gets stats from the driver via the TaskStats API, so that's most likely an issue with the driver (or podman itself, as that's where the driver probably gets stats!). Would you be up for opening an issue in the podman repo?

Thanks for the insight and direction - I created a new issue.

@courtland

It seems like someone over at DigitalOcean tried to get SMBIOS info out of /dev/mem in native golang.

https://blog.gopheracademy.com/advent-2017/accessing-smbios-information-with-go/

The resulting package is old but appears to do some of the heavy lifting.
https://github.com/digitalocean/go-smbios/blob/master/smbios/decoder.go

Not sure if there is appetite for hashi to keep it alive?
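For a sense of what native decoding involves, here is a minimal sketch that walks a raw SMBIOS structure table and reads Max Speed from the first Processor Information (type 4) structure. It does not use the go-smbios API. The offset (0x14, a little-endian 16-bit MHz value) follows the SMBIOS specification's type 4 layout as I understand it, the function names are hypothetical, and the input is synthetic. On Linux the real table is typically exposed at /sys/firmware/dmi/tables/DMI and is readable only by root.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// maxSpeedMHz walks SMBIOS structures and returns the Max Speed field of the
// first type 4 structure. Each structure has a 4-byte header (type, length,
// handle), a formatted area of `length` bytes in total, and a string set
// terminated by two NUL bytes.
func maxSpeedMHz(table []byte) (uint16, bool) {
	for i := 0; i+4 <= len(table); {
		typ := table[i]
		length := int(table[i+1])
		if length < 4 || i+length > len(table) {
			return 0, false // malformed or truncated structure
		}
		if typ == 4 && length >= 0x16 {
			return binary.LittleEndian.Uint16(table[i+0x14 : i+0x16]), true
		}
		// Skip the formatted area, then scan for the double-NUL terminator.
		j := i + length
		for j+1 < len(table) && !(table[j] == 0 && table[j+1] == 0) {
			j++
		}
		i = j + 2
	}
	return 0, false
}

// syntheticType4 builds a 42-byte type 4 structure with no strings,
// for demonstration only.
func syntheticType4(maxSpeed uint16) []byte {
	s := make([]byte, 0x2A)
	s[0] = 4    // structure type: Processor Information
	s[1] = 0x2A // length of the formatted area
	binary.LittleEndian.PutUint16(s[0x14:], maxSpeed)
	return append(s, 0, 0) // empty string set
}

func main() {
	mhz, ok := maxSpeedMHz(syntheticType4(2000))
	fmt.Println(mhz, ok)
}
```

Even this small slice of the format has to hard-code offsets from the specification and handle truncated input, which is the maintenance burden a native decoder would take on.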

@shoenig
Member

shoenig commented Mar 9, 2023

> The remaining problem for me, that I'd like to address, is that the nomad client reports 0 MHz utilization all the time. I'm not sure if that's a podman driver problem or an issue with the nomad client?

This isn't specific to podman or nomad, it's that the information is not reported by the ARM kernel driver - you need this patch, or one like it. Maybe things have changed recently - in which case we should get the gopsutil library updated.

https://patchwork.kernel.org/project/linux-arm-kernel/patch/[email protected]/

@schmichael
Member Author

If it's an AWS Graviton instance then #16417 should pick it up.

I think I'd rather shell out to a known quantity like dmidecode rather than parse /dev/mem ourselves, but clearly I don't know much about the implementation details of that!

@tgross
Member

tgross commented Mar 10, 2023

> I think I'd rather shell out to a known quantity like dmidecode rather than parse /dev/mem ourselves, but clearly I don't know much about the implementation details of that!

That's a good point -- the maintainers of dmidecode are going to stay on top of any changes to the layout of that data way more readily than the gopsutil project will be able to.

@lgfa29
Contributor

lgfa29 commented Aug 24, 2023

dmidecode fingerprinting was implemented in #18146, which will ship with Nomad 1.7.0.

lgfa29 closed this as completed Aug 24, 2023