Incorrect CPU Usage Metrics #3228
Comments
Ok, this was a weird one. https://github.com/hashicorp/nomad/blob/master/client/driver/docker.go#L1660 uses the length of the per-CPU usage array as the number of cores. However, the docker stats API is not returning the expected 2 entries; it is returning 15 (see output below). I am not sure whether this is related to the underlying hardware: I tested a few random t2, c4, r4, r3, and m4 instances and they all had 15 cores in the output. This throws the percentage calculations off, in my case (r4.large) by 7.5x.

The docker stats output has an "online_cpus" field, but since that was not exposed in the docker client lib, I went ahead and changed the code to use the runtime.NumCPU() value. This fixed all alloc stats; everything is exactly where it should be (3000/3000 MHz for the alloc, 3000/6000 MHz for the host, etc.). I opened #3229.

I checked back in the history and the calculations were added in 693c8f9, so I am not sure if there was a historical reason for it working the way it did.
@epipho Awesome write-up and thanks for the PR 👍
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Output from
nomad version
Nomad v0.6.3
Operating system and Environment details
AWS r4.large instance (2 cores, fingerprinted as 3000 MHz each)
CoreOS Container Linux 1465
Docker 17.05.0-ce
Issue
Allocation Resource Utilization is incorrect: the reported CPU usage is far higher than what is available on the host.
Both the nomad CLI and the API agree on the (incorrect) number.
Reproduction steps
nomad node-status -self
to view resource allocation/utilization
Output from nomad node-status -self
Truncated to resources section. Host Utilization seems fine, Allocation utilization is way off.
Output from nomad alloc-status
Truncated just to resources usage.
Output from docker stats
Output from v1/client/allocation//stats
Job file (if appropriate)