cgroups misconfiguration #2999
Please add the actual …
Can we please create a minimal configuration for reproducing this? Most of the reproduction configuration looks unrelated and probably unnecessary. What is the minimum required to reproduce this? Also, yes, we need the rest of …
Hey @stmcginnis @BenTheElder,
I'll try to get back to digging into this, but it's not surprising to me that static CPU allocation doesn't work with nested containers. These cluster nodes do not have exclusive access to the kernel, and resource limits are better tested via some other solution (e.g. VMs).
@BenTheElder This worked fine for us until #2737, where 1.24/1.25 switched to the systemd cgroup driver. I was wondering if it would be possible to go back, or to provide an opt-out option. Note: the problem is devices that are not accessible; I did not check whether CPU requests and limits are enforced. We will try to find some time to debug what's going on in detail, in the hope of fixing this.
Now that is surprising 👀
It's possible to override this with config patches to containerd and kubeadm/kubelet. However, the ecosystem is moving towards cgroups v2 only (not sure when; I expect sometime next year), and on cgroups v2 I haven't found anyone running CI without the systemd backend, which is generally recommended. If we've regressed versus the cgroupfs driver, we should fix that. Unfortunately I don't personally have much time at the moment :/
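For reference, an opt-out patch along these lines is roughly what that looks like. This is a sketch rather than anything verified in this thread; in particular, the containerd plugin path can differ between containerd versions.

```yaml
# Hypothetical kind config that switches containerd and kubelet back to the
# cgroupfs driver instead of systemd (a sketch, not a supported configuration).
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
containerdConfigPatches:
  - |-
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      SystemdCgroup = false
kubeadmConfigPatches:
  - |-
    kind: KubeletConfiguration
    cgroupDriver: cgroupfs
```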
Hi @BenTheElder, have you had a chance to look into this issue?
No, I have not. Kubernetes project infrastructure sustainability and Steering Committee related things have eaten most of my time lately. If and when I do, I will comment here.
I am seeing the same issue:
Here is some more info on my setup: CPUManager is not enabled. For CI with multiple k8s versions in kind, < 1.24 works fine, while 1.24 fails with this error. It seems to affect all devices. We see errors like this for jobs running inside affected pods:
stat looks normal
It doesn't happen immediately; it only appears around 20 minutes after the cluster is started.
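A few quick spot checks along these lines can help confirm whether a pod has lost access to its device nodes (the pod name below is a placeholder):

```sh
# Hypothetical spot checks from inside an affected pod; permission errors on
# basic device nodes point at an overly restrictive devices cgroup.
kubectl exec <pod> -- stat /dev/null
kubectl exec <pod> -- sh -c 'echo ok > /dev/null'
kubectl exec <pod> -- ls -l /dev
```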
Since kubernetes-sigs/kind#2999 blocks us from updating to new k8s versions using kind, use k3d instead of kind. Signed-off-by: Or Shoval <[email protected]>
Since kubernetes-sigs/kind#2999 blocks us from updating to newer k8s versions using kind, we are introducing k3d. Changes:
* Support for local multi instances was removed; we are not using it, and it shouldn't affect multi instances on CI once we want to introduce them.
* Added graceful releasing of the SR-IOV nics. It reduces the downtime between cluster-down and cluster-up nicely, as the nics otherwise disappear for a few minutes.
* Only one PF per node is supported; we don't need more for now.
* Use the k3d local registry instead of one of our own.
* The provider is hardcoded with 1 server (master node) and 2 agents (workers). If we need another configuration it can be done in a follow-up PR; for now there is no reason to support other configs.
Signed-off-by: Or Shoval <[email protected]>
Do not config cpu manager for vgpu, because kind 1.24+ has this bug for cpu manager: kubernetes-sigs/kind#2999 Since we don't use cpu manager on vgpu lane we can bump to k8s-1.25 and remove cpu manager. Rename lane. Signed-off-by: Or Shoval <[email protected]>
* kind: Bump vgpu kind to k8s-1.25. Do not configure the CPU manager for vgpu, because kind 1.24+ has this bug for the CPU manager: kubernetes-sigs/kind#2999. Since we don't use the CPU manager on the vgpu lane we can bump to k8s-1.25 and remove the CPU manager. Rename lane. Signed-off-by: Or Shoval <[email protected]>
* kind: Rename functions. The functions can add extra mounts / the CPU manager to non-worker nodes, depending on where they are called: if called before the worker snippet in the manifest they configure the control-plane, otherwise the worker node. Rename them to reflect this. Signed-off-by: Or Shoval <[email protected]>
I ran into the same issue with kind 0.18.0 (which I tried because it was the first kind release compatible with Kubernetes 1.26, which has the CPU manager as GA), reproducing with the following minimal yaml:
Pretty much any pod I scheduled had issues with permissions on /dev, sometimes /dev/null, sometimes /dev/ptmx. docker info contains:
It's a tricky bug, because it seems like something is getting misconfigured, yet all pods etc. schedule as expected and only fail at runtime, so I am not even sure if there is any specific logging to look for in the first place.
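If it helps anyone else digging, the node's containerd and kubelet journals are probably the first places to look. A sketch, assuming the node container is named kind-control-plane:

```sh
# Hypothetical: tail the runtime and kubelet logs inside the kind node.
docker exec kind-control-plane journalctl -u containerd --no-pager -n 100
docker exec kind-control-plane journalctl -u kubelet --no-pager -n 100
```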
/dev/null recently had a runc bug, IIRC.
My co-maintainer Antonio has been out and would usually punt this particular type of issue my way. I've been monitoring https://kubernetes.io/blog/2023/03/10/image-registry-redirect/ and Kubernetes is coming out the other side now looking a lot more sustainable. I expect to be out for a breather next week, then a lot of the project, including Antonio, will be at KubeCon (not me, unfortunately); after we're both around I'll be meeting with Antonio to review the backlog. We've worked on some other fixes for KIND since this issue was filed, but only things that are more clearly root-caused and in scope (like the iptables incompatibility issue), and on getting those released.
Seems to be the common thread. Kubernetes is not testing this with kind currently; SIG Node typically tests this with "real" cloud-based clusters. We'll have to do some digging. I'm not seeing this issue crop up without this configuration so far, so I'm a bit torn between the need to roll forward on what seems to be the supported and tested cgroups driver going forward, and switching back. Kubernetes CI is moving towards systemd + cgroup v2, and I'm not generally aware of any cgroup v2 CI without systemd cgroups. Note: if you're doing configuration patching this advanced, you can patch to disable systemd cgroups in kubelet + containerd in the meantime.
I got time to look at this, and here is what I found. First I created the pod and then did … In the case of kind I see (systemd log) … So finally I just tried to update runc in the old image, and it seems to be working. Note: not confident, but from a quick look I would say opencontainers/runc@3b95828 is the culprit. (This also explains the systemd log.) So the only question left is whether we can update runc for 1.24+? @BenTheElder
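For anyone who wants to try the same kind of runc swap, a rough sketch is below. The node name, the runc path inside the node image, and pulling the v1.1.5 static binary are my assumptions, not something confirmed in this thread.

```sh
# Hypothetical sketch: drop a newer runc into a running kind node and restart containerd.
# Assumes the node container is named kind-control-plane and runc lives at /usr/local/sbin/runc.
curl -fsSLo runc.amd64 \
  https://github.com/opencontainers/runc/releases/download/v1.1.5/runc.amd64
docker cp runc.amd64 kind-control-plane:/usr/local/sbin/runc
docker exec kind-control-plane chmod +x /usr/local/sbin/runc
docker exec kind-control-plane systemctl restart containerd
# Verify the swap took effect (same path assumption as above).
docker exec kind-control-plane /usr/local/sbin/runc --version
```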
Can you try the release / images in https://github.com/kubernetes-sigs/kind/releases/tag/v0.18.0? We're on runc 1.1.5 in the latest KIND release, which appears to contain opencontainers/runc@3b95828.
Yes, that works just fine. Thank you. (Note to self: with the new release there are new images.)
Excellent! @orelmisan @Belpaire @smlx, can you confirm whether the latest release resolves this for you as well?
I'm attempting to minimally reproduce this being broken on v0.17 and to confirm the runc upgrade solution in v0.18, without success so far. I'm running this:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
  - |-
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        "feature-gates": "CPUManager=true"
        "cpu-manager-policy": "static"
        "kube-reserved": "cpu=500m"
        "system-reserved": "cpu=500m"
```

```sh
kind create cluster --config=$HOME/kind-test.yaml
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo
spec:
  containers:
  - name: qos-demo-ctr
    image: nginx
    resources:
      limits:
        memory: "200Mi"
        cpu: "700m"
      requests:
        memory: "200Mi"
        cpu: "700m"
EOF
kubectl exec -it qos-demo -- bash
```

Which works fine.
OK, you have to leave it running for a bit. I see this on the above-configured v0.17 cluster now, after trying again and waiting a few minutes before exec-ing again:
Whereas the same configuration on v0.18 does not show this, even after a few minutes.
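To reproduce the timing aspect, something like the loop below can be left running against the qos-demo pod from the snippet above (the interval is arbitrary):

```sh
# Hypothetical poll: re-exec into the pod once a minute and stop when exec starts failing.
while kubectl exec qos-demo -- true; do
  date
  sleep 60
done
```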
On v0.18 with @Belpaire's config from #2999 (comment), but brought down to a single node:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
  - |-
    kind: KubeletConfiguration
    cpuManagerPolicy: "static"
    reservedSystemCPUs: "0,1,2,3,4,5,6"
```

And using the test snippet above (#2999 (comment)) I'm not seeing the issue. @Belpaire You mention:
But we have 1.26 in https://github.com/kubernetes-sigs/kind/releases/tag/v0.17.0#new-features. Is there any chance you were using v0.17 and the 1.26.0 image? So far I can't reproduce, but v0.17 definitely has the issue described in #2999 (comment), which appears to be fixed now in the v0.18 images as outlined in #2999 (comment) / #2999 (comment).
@BenTheElder I retried it yesterday and it indeed seemed to bring up a cluster without issues; I must have gotten confused while testing different images and kind versions for our setup. I scheduled some pods etc. and didn't get any /dev/ptmx issues. The only difference was that my Ubuntu went from 5.15.0-67-generic to 5.15.0-69-generic, but it seems very doubtful that had any impact. So I think I must have still been trying with 0.17.0, or 0.18.0 and a wrong image somehow.
Thanks! I believe we can close this now as fixed by the runc upgrade in the v0.18+ images. Sorry this took so long 😅
Can confirm that this appears to have resolved my observed issues too. Thanks for the update!
Since kubernetes-sigs/kind#2999 which blocked us from updating kind SR-IOV provider was fixed, we can bump to v0.18.0 (k8s-1.27.1) now. Signed-off-by: Or Shoval <[email protected]>
* kind, SR-IOV: Rename folder to kind-1.27-sriov. Preparation for the version bump. Signed-off-by: Or Shoval <[email protected]>
* kind, SR-IOV: Bump to v0.18.0, k8s-1.27.1. Since kubernetes-sigs/kind#2999, which blocked us from updating the kind SR-IOV provider, was fixed, we can bump to v0.18.0 (k8s-1.27.1) now. Signed-off-by: Or Shoval <[email protected]>
What happened:
Failed to exec into a Pod with QoS defined when the CPU manager is enabled.
After checking the cgroup configuration for the Pod, I see that only
c 136:* rwm
is allowed.

What you expected to happen:
I expected to be able to exec into the pod and get a shell and have the cgroup configuration set up correctly.
How to reproduce it (as minimally and precisely as possible):
Any attempt to exec into other Pods fails from now on for the same reason.
Anything else we need to know?:
SELinux is disabled.
This seems to be related to the change where kind uses systemd to manage cgroups with 1.24/1.25.
This problem was not tested without the CPU manager.
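A sketch of how the devices cgroup allowlist can be inspected on the node (cgroup v1; the node name and the exact cgroup paths are assumptions here and depend on the cgroup driver in use):

```sh
# Hypothetical check: dump the devices allowlist for all kubepods cgroups on a cgroup v1 node.
docker exec kind-control-plane \
  find /sys/fs/cgroup/devices -path '*kubepods*' -name devices.list -exec cat {} +
```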
Environment:
kind version: 0.17.0
kubectl version: v1.25.3
docker info: 20.10.21
/etc/os-release: RHEL 8.5