Docker Driver Fails With Upper Limit of 262144 CPU Shares #7731

Open
herter4171 opened this issue Apr 15, 2020 · 6 comments
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/client, theme/driver/docker, type/bug

Comments

@herter4171

Nomad version

Nomad v0.11.0 (5f8fe0a)

Operating system and Environment details

Amazon Linux 2 with a fixed head node and an auto-scaling group, with scaling driven by Nomad state using a custom cloud metric.

Issue

This came up in the course of troubleshooting issue #7681, and while my intent isn't to issue-spam you guys, I think this is a separate problem that is actively holding back some of my work, unlike the former.

Anyway, I'm experiencing a Docker Driver failure due to an apparent upper limit on CPU shares. I have tested this on c5.18xlarge instances with the following result.

I have also tested this on m5a.24xlarge instances with the following identical result.
[screenshot: Docker driver error output]

I can't even find process_linux.go in the source, so I'm really at a loss here. Any help is greatly appreciated.

Reproduction steps

Submit a job that has more than 262144 CPU shares allocated on a large enough instance to have the job placed, and the Docker driver should fail in the manner I've described.

@shishir-a412ed
Contributor

@herter4171 You won't find container_linux.go or process_linux.go in the Nomad or Docker codebases. The error comes from the container runtime (runc).

https://github.com/opencontainers/runc/blob/master/libcontainer/container_linux.go#L349

@herter4171
Author

@shishir-a412ed, thanks for pointing me in the right direction. 262144 is all over the place there.

@tgross
Member

tgross commented Apr 17, 2020

@herter4171 just a heads up, that value is the maximum cpu_share parameter value from the Linux kernel. I'm not sure if there's a tunable for that.
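
For readers landing here, the bound mentioned above can be checked before a job is submitted. This is a minimal sketch, assuming the cgroup v1 limits defined in the kernel scheduler headers (MIN_SHARES = 1 << 1, MAX_SHARES = 1 << 18). The helper name `within_kernel_share_range` is hypothetical and is not part of Nomad, Docker, or runc.

```python
# Bounds on cpu shares in the Linux CFS scheduler (assumed: cgroup v1).
MIN_SHARES = 1 << 1    # 2
MAX_SHARES = 1 << 18   # 262144, the value reported in this issue


def within_kernel_share_range(shares: int) -> bool:
    """Return True if `shares` lies inside the range the kernel will honor."""
    return MIN_SHARES <= shares <= MAX_SHARES


if __name__ == "__main__":
    # Hypothetical requests: a default-sized task, the cap, and one past the cap.
    for requested in (1024, 262144, 262145):
        print(requested, within_kernel_share_range(requested))
```

A request one share above the cap already falls outside the range, which matches the failure described in the reproduction steps.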

@herter4171
Author

Hey @tgross, thanks for the heads-up. This was the first time that troubleshooting led me to a Torvalds repo, and I knew to abandon all hope without being well-versed in operating systems. Hats off to you and your team for the CPU burst capability. Running with 262144 shares on a c5.24xlarge takes up 90% of the capacity, so we'll be just fine, even if another job allocates the remainder from time to time.

@schmichael
Member

schmichael commented Feb 5, 2021

Reopening, as the root bug here is Nomad's 1:1 mapping of MHz to shares. I think Nomad can even change that in a way that fixes this bug and preserves backward-compatible behavior. A 10:1, 128:1, or similar mapping should preserve the relative CPU share weights while keeping within the valid value range.
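
The scaled mapping suggested here could look roughly like the sketch below. It is an illustration only, not Nomad's implementation: the function name `mhz_to_shares`, the default ratio of 10, and the MHz figures are all assumptions made for the example.

```python
MAX_SHARES = 262144  # kernel upper bound discussed in this thread
MIN_SHARES = 2       # kernel lower bound (assumed: cgroup v1)


def mhz_to_shares(cpu_mhz: int, ratio: int = 10) -> int:
    """Map a MHz request to cpu shares at ratio:1, clamped to the valid range."""
    shares = cpu_mhz // ratio
    return max(MIN_SHARES, min(MAX_SHARES, shares))


if __name__ == "__main__":
    small, large = 30_000, 300_000  # hypothetical MHz requests
    # At 10:1 both values fit, and the 1:10 weight between them is kept.
    print(mhz_to_shares(small), mhz_to_shares(large))
    # At 1:1 the larger request has to be clamped, which distorts the weights.
    print(mhz_to_shares(large, ratio=1))
```

One trade-off of integer scaling is lost resolution at the low end: with a 10:1 ratio, requests of 100 MHz and 105 MHz both map to 10 shares.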

This problem is going to get more common as high-core-count machines are used more.

@schmichael reopened this Feb 5, 2021
@schmichael added the stage/accepted, theme/client, theme/driver/docker, and type/bug labels Feb 5, 2021
@flyinprogrammer
Contributor

As brought up previously: #4899 (comment)

https://bugs.openjdk.java.net/browse/JDK-8146115

"If cpu_shares has been setup for the container, the number_of_cpus() will be calculated based on cpu_shares()/1024. 1024 is the default and standard unit for calculating relative cpu usage in cloud based container management software."

By this logic it seems the OpenJDK community believes there will never be more than 256 (262144/1024) cores in a machine, or that they're willing to propose a kernel patch when the time comes 😂
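
The arithmetic behind that 256 figure can be sketched as follows. This only mirrors the rule as quoted from the JDK issue; rounding up to a whole CPU is an assumption of this sketch, and `cpus_from_shares` is a hypothetical name, not a JDK function.

```python
import math

PER_CPU_SHARES = 1024  # the "standard unit" named in the quoted JDK issue
MAX_SHARES = 262144    # kernel cap on cpu shares from this thread


def cpus_from_shares(cpu_shares: int) -> int:
    """CPU count implied by cpu_shares / 1024, rounded up (assumed rounding)."""
    return math.ceil(cpu_shares / PER_CPU_SHARES)


if __name__ == "__main__":
    # The largest CPU count this rule can ever report, given the kernel cap.
    print(cpus_from_shares(MAX_SHARES))
```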

"10:1 or 128:1 or similar mapping"

I'm worried that your proposed fix is going to just add another layer of broken to this lasagna of madness.

It seems to me the better solution would be to go all in on how cpu-shares are relative to other processes running on the machine in the context of the magic_number 1024, or go all in on cfs quotas like k8s has.
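
To make the two options concrete, here is a rough sketch of the difference. Shares only express a weight relative to the other running tasks, while a CFS quota is an absolute amount of CPU time per period. The 100 ms period and both function names are assumptions chosen for illustration, not values taken from Nomad or Kubernetes source.

```python
from typing import Sequence

CFS_PERIOD_US = 100_000  # 100 ms period; assumed default for this sketch


def cfs_quota_us(cores: float, period_us: int = CFS_PERIOD_US) -> int:
    """Absolute limit: microseconds of CPU time allowed per period."""
    return round(cores * period_us)


def share_fraction(own_shares: int, all_shares: Sequence[int]) -> float:
    """Relative weight: fraction of contended CPU a task gets under shares."""
    return own_shares / sum(all_shares)


if __name__ == "__main__":
    # Half a core as a hard cap, independent of what else is running.
    print(cfs_quota_us(0.5))
    # 1024 shares competing with one 3072-share task under full contention.
    print(share_fraction(1024, [1024, 3072]))
```

A quota has no 262144-style ceiling tied to a share scale, but it also removes the bursting behavior that relative shares allow when the machine is idle.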

As far as backwards compatibility is concerned, why not just implement a new resource constraint, cpu-shares, which would let us get this right, similar to the cpu-cores constraint proposed in #8473?
