When --cpuset-cpus argument is used, processes inspecting CPU configuration in the container see all cores #20770
On the surface this issue looks to be similar to what's described as Ubuntu bug ID 1435571, though I can see how this behaviour might manifest from some other root cause. However in this case it may have been a kernel bug, as they've fixed it with these two kernel patches. Knowing very little about cgroups myself, I'd also wonder if CentOS7 issue 9078 isn't related. Either way, I raised the issue here on the chance that either this is an issue specific to docker and not the host OS, or that docker would be improved by including a workaround to this issue.
@benjamincburns can you try running the check-config script?
Thanks @thaJeztah. Before seeing your comment I fired up a fresh install of CentOS 7 and made sure it was up to date. I then installed docker according to the official installation instructions. This issue does not occur in that configuration. I will run the check-config script in both locations and compare the output.

If it turns out that this was an issue with this feature not being supported by the kernel, I'd suggest that this script be converted into runtime checks within docker itself, so that the docker CLI can fail with an appropriate error message when trying to create a container which would use kernel features that aren't supported.
I have run the check-config script on both hosts. Their diff:
Note of course that the last line is not a deletion, but the hyphen is part of the script output. I'll see if I can't review patches which have been applied between 3.10.0-229.14.1 and 3.10.0-327.10.1.
Actually, I think the patch review is unnecessary, as this issue occurs on a different docker host in our prod environment which is already running 3.10.0-327.10.1, and the latest userspace, CentOS 7.2.1511. To avoid (or inadvertently create) confusion, I'll refer to this host by name below.

Copy & pasted repro output, modified slightly to change the hostname:
The output of the check-config script is the same on this host as well. This also suggests that the exact CentOS version may not matter much, as both my test VM and this prod host are fully up to date, yet only the prod host exhibits the issue.

Just for completeness, below you will find the same info requested in the issue template, but for this host.

Output of docker version:
Output of docker info:
And for good measure, OS release specifics:
To see if I could spot a pattern of some sort, I've tested for the presence of this issue on the 10 docker hosts to which I have access. The only machine on which I have not observed this issue is the clean VM I set up specifically to test it. Below are the configurations of the machines in question (the hosts discussed above are included). Except for the test VM, which is excluded from the machine counts in the table below, all machines tested are bare metal.
On the off chance that there's some difference in behaviour between
Argh... forget everything I said about the test VM working correctly. It turns out I'd forgotten that I'd only provisioned one vCPU for the VM. Now that I've switched it to 4 vCPUs, the problem occurs there, too.
I see that the proper value is being set in the container's cpuset cgroup:
And after using
I don't fully understand the patches I linked in my first comment, but I have verified that nothing like them has been applied to the CentOS kernel. In fact, there is no
So it's looking like the cpuset value is applied at the cgroup level but isn't reflected in what processes inside the container can see. To determine this I created two containers, one with --cpuset-cpus and one without.

Question: Is solving this issue in scope for docker, or is this a kernel-level problem?

Console session:
From the Ubuntu bug report in my first comment, it looks like docker can work around this issue by creating its cgroup with
Whoops, didn't mean to close.
Hm, interesting; let me ping @LK4D4 and @anusha-ragunathan, perhaps they have some thoughts on that.
Eh, that might be a red herring. I've tried doing this manually to no effect. Also it appears that
What do you get inside the container? i.e.
That command works correctly, which is good news: for the applications we control, we can inspect this file. However, for applications running in VMs like mono, this will present some pain. It'd be much simpler overall if the process didn't need to be aware that it was running within a cgroup.
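For what it's worth, a minimal sketch of what "inspecting this file" could look like, assuming the cgroup v1 cpuset mount at /sys/fs/cgroup/cpuset inside the container (the path and helper name are just illustrative, not from this thread):

```python
def cpus_from_cpuset(path="/sys/fs/cgroup/cpuset/cpuset.cpus"):
    """Parse a cpuset list such as '0-2,7' into a set of CPU indices.

    Illustrative helper only; assumes a cgroup v1 layout as mounted
    inside the container.
    """
    cpus = set()
    with open(path) as f:
        text = f.read().strip()
    for part in text.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

print(len(cpus_from_cpuset()))  # e.g. 1 in a --cpuset-cpus=0 container
```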
To add a bit of supporting info to my last statement, I grepped mono's source quickly and found that on systems with a proper glibc, mono detects the core count by asking glibc for the number of online processors. This can be seen in the mono source.
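To make the distinction concrete, here is a small Python sketch (Python standing in for mono here; sched_getaffinity is Linux-only) of the two ways a process can count CPUs:

```python
import os

# "Online processors" style query (what glibc reports): counts every CPU
# on the host, ignoring the container's cpuset restriction.
print("online CPUs:", os.cpu_count())

# Scheduler affinity mask: reflects the cpuset the container was started
# with, e.g. --cpuset-cpus=0 yields a single usable CPU.
print("usable CPUs:", len(os.sched_getaffinity(0)))
```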
This sounds similar to #20688; there's also a nice article describing the situation: http://fabiokung.com/2014/03/13/memory-inside-linux-containers/
Yes, it certainly does. Digging into the mono source a bit further, it's also parsing information from procfs directly.

I'll likely open an issue with mono to make the VM cgroup aware; however, I agree with @thechile's last comment on #20688 that the container community ought to be working with kernel maintainers to sort out a solution to this problem. Linus has a pretty famous rule that the kernel shouldn't break userspace. I'd think that the container shouldn't break userspace, either. You might argue that it's not the container, it's cgroups, but if the choice to use cgroups forces containerized processes to become cgroup aware, then from the perspective of the user it's the same result.

It's pain enough for native processes where I control thread pooling and resource allocation, but when you've got a full platform stack that you're trying to drop into a container, it gets quite expensive, quite quickly.
I've raised a mono issue with the hope that they'll pick it up and at least work around this problem. That said, I'd rather not need to also raise issues for go, python, ruby, java, and so on.
@benjamincburns how did you end up working around this? As of Linux 5.1 this still occurs, which is a real pain when doing CPU pinning; inside the container you can still see all the CPUs, but only the ones assigned with --cpuset-cpus are actually usable.
@qlyoung as far as I can remember, we didn't.
So what's the situation with this issue? I have some code that is deciding how many processes to fork based on CPU count and it's getting the wrong number of processors.
@jdmarshall based on some additional research it seems the appropriate fix for this will ultimately be, as with all things, a kernel namespace for whatever this resource class is.

If you want to know what CPUs you can actually get, you can loop through each "available" core and try to bind to it with sched_setaffinity. If it works then it's available; if not, then it's not available to the container. I did this for AFL, if you want an example; the patch is here. So maybe for your case fork off N = # CPUs processes, try the sched_setaffinity binding in each, and keep the ones that succeed (a sketch of the idea follows below).

Brendan Gregg touches on this a bit in this talk https://www.youtube.com/watch?v=bK9A5ODIgac, although it's in the context of
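A rough Python sketch of that probing idea, assuming Linux (my own illustration, not the AFL patch referenced above; on Linux, os.sched_getaffinity(0) returns the same set directly):

```python
import os

def probe_usable_cpus():
    """Return the set of CPUs this process can actually bind to.

    Probes each CPU with sched_setaffinity; CPUs outside the container's
    cpuset raise OSError and are skipped.
    """
    original = os.sched_getaffinity(0)
    usable = set()
    try:
        for cpu in range(os.cpu_count() or 1):
            try:
                os.sched_setaffinity(0, {cpu})
                usable.add(cpu)
            except OSError:
                pass  # this CPU is not available to the container
    finally:
        os.sched_setaffinity(0, original)  # restore the original mask
    return usable

print(sorted(probe_usable_cpus()))
```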
How the JVM handles its active processor count may be helpful to you.
On Intel machines I can get around this by reading the relevant file. However, this does not seem to work on Aarch64 Linux machines. Any idea why there is this discrepancy between Intel and Aarch64?
I've been using 'nproc' on Linux to get better behavior, and 'sysctl -n hw.logicalcpu' on OS X; I found this somewhere on Stack Overflow. Since I only really need this data at startup, I just eat the child-process overhead. I think standard library writers are getting wise to this, though; I believe Node introduced a fix for this in the previous major version.
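For illustration, a minimal Python version of that startup pattern, assuming GNU coreutils' nproc is on PATH (nproc honours the process's CPU affinity mask, so it reflects --cpuset-cpus):

```python
import os
import subprocess

def startup_cpu_count():
    """Ask `nproc` once at startup; inside a --cpuset-cpus container it
    reports the restricted count rather than the host total."""
    try:
        out = subprocess.run(["nproc"], capture_output=True, text=True,
                             check=True).stdout.strip()
        return int(out)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return os.cpu_count() or 1  # fall back to the naive count

print(startup_cpu_count())
```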
@thaJeztah how are we feeling about this issue these days?

Initially I'd hoped that there would be some way that docker could be made to work with legacy software that was written prior to cgroups existing, as well as software that was written to erroneously assume that it could make use of all cores on the host. Ultimately it would seem that the path to achieving this goal is rooted in how cgroup restrictions are exposed by the kernel to the userspace processes that are subject to those restrictions. As a result, I'm not sure that there's anything for the container engine to do here.

I'm also no longer sure that the goal as I just stated it is even desirable, let alone achievable. That is, there's a distinct difference between "the set of CPUs available to the host" and "the set of CPUs that a process can access," and that's true in a wide variety of scenarios that have nothing to do with containerization. With that in mind, I think this is a discussion for the kernel mailing lists, if it's even a discussion worth having. Unfortunately I don't really have the time or motivation right now to champion that conversation, but I'd encourage anyone who finds this issue important to take it up there.

In the meantime, I think it's probably best to close this issue. @thaJeztah if you or any other maintainers feel otherwise, please feel free to reopen.
PS:
Thanks for the additional context, @felipecrs. I wish something like that had existed back when we first hit this.

Just for clarity, are you advocating for this issue to remain open, in light of the tooling you posted? I just worry that making this behaviour a default in moby could be problematic. For example, I think it's not uncommon for k8s clusters to set affinity for privileged cluster-management & host-monitoring jobs to a set of reserved CPUs that aren't used for other workloads (it guarantees liveness, minimises the impact of monitoring on latency-sensitive workloads, etc.). If it were something that wasn't on by default, but could be optionally set on a container-by-container basis, that could still add utility, however.
I would love for this feature to be baked into docker, rather than having to rely on external tools that are (very) convoluted to set up. Being able to specify it on a container-by-container basis would be the ideal. Then, making it the default would be a whole different conversation that can start once such a feature exists. My limited, personal gut feeling is that it would be nicer as the default behavior, but I do not want to argue about it.
To be honest I'm not advocating for this issue to remain open, as I have zero hope that Docker would ever implement it.
Output of docker version:
Output of docker info:
Provide additional environment details (AWS, VirtualBox, physical, etc.):
Physical machine
List the steps to reproduce the issue:
1. docker run -it --cpuset-cpus=0 centos:centos7
2. grep processor /proc/cpuinfo | wc -l
Describe the results you received:
Output: 32
Describe the results you expected:
Output: 1
Provide additional info you think is important:
Per the title, it appears that docker 1.10.2 isn't respecting the --cpuset-cpus argument. We have a number of containers for applications which use thread pools that are sized based on the number of cores available. Since updating to 1.10.2 (from a varied array of versions starting somewhere in 1.3.x), the thread counts on our docker hosts are through the roof. [Edit: this wasn't actually linked to the update; rather, we'd deployed a few new containers which ran on mono at around the same time. This is still an issue, however.]

OS version info: