k3s container v1.22 and newer fails on docker-desktop and k3d clusters #4873
Comments
Hmm. Disabling cgroup evac when the agent is disabled should be easy enough to do, but we are considering dropping the hidden --disable-agent flag entirely, since it has long been unsupported and some features (managed etcd for example) will not work properly with the agent disabled.
@brandond thanks for the answer! Hmm, that's sad to hear, as this would eliminate k3s as a viable solution for our use case. We really think that k3s is currently a great fit for virtual Kubernetes clusters: it provides a minimal control plane, which is exactly what we need, and has quite some advantages over a regular k8s deployment. We could switch to another distro such as k0s or vanilla k8s containers, which currently work fine, but we have been very happy so far with what k3s provided for us, and it has worked really well for our users up until this point. I know that the disable flag was kind of a workaround to begin with, and we would be fine if certain features did not work with it, but removing it would definitely render k3s unusable for us. So we would be very grateful if you would consider continuing support for disabling the agent, which might be useful for other use cases as well that only need parts of the control plane.
What's the downside of running the kubelet in your container? Do you just want to avoid seeing a node object in the virtual cluster?
@brandond vcluster just virtualizes the control plane and schedules actual workloads on the host cluster, which means the virtual cluster consists only of an API server, controller manager, storage backend and a hypervisor that translates objects between the virtual control plane and the actual host cluster; no additional kubelets are required. It would also be possible to run an actual kubelet in the container, but this would most certainly require more permissions on the node, while vcluster is mostly targeted at multi-tenancy use cases where, for example, you only have access to a single namespace but need to install a new CRD or webhook in it, which the control plane virtualization allows.
It's a bit of a hack, but since cgroup evacuation only runs if k3s is pid 1, you could try running /bin/k3s from /bin/sh:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - args:
    - -c
    - >-
      /bin/k3s server
      --write-kubeconfig=/data/k3s-config/kube-config.yaml
      --disable=traefik,servicelb,metrics-server,local-storage,coredns
      --disable-network-policy
      --disable-agent
      --disable-scheduler
      --disable-cloud-controller
      --flannel-backend=none
      --kube-controller-manager-arg=controllers=*,-nodeipam,-nodelifecycle,-persistentvolume-binder,-attachdetach,-persistentvolume-expander,-cloud-node-lifecycle
      --service-cidr=10.96.0.0/12
      && true
    command:
    - /bin/sh
    image: rancher/k3s:v1.22.5-k3s1
    name: k3s
```
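A quick way to try this wrapper approach (a sketch; it assumes the spec above is saved as test.yaml, that kubectl points at the host cluster, and that the image's busybox shell utilities are available):

```sh
kubectl apply -f test.yaml
kubectl logs -f test                      # the cgroup evacuation error should no longer appear
kubectl exec test -- cat /proc/1/cmdline  # should show /bin/sh, i.e. k3s is not PID 1
```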
That said, I'm curious what about your environment makes the cgroups read-only - we regularly run K3s in Docker and containerd without issue.
@brandond great, thanks a lot, that certainly helps. I tested this with my Docker Desktop 4.3.2 (72729) Kubernetes cluster v1.22.4 on Intel macOS Monterey 12.1. We have several test machines and it no longer works on any of them with the new Docker Desktop version that uses the new v1.22 Kubernetes cluster. Maybe they changed something there, but we also received reports about this not working anymore in v1.22 kind or k3d on Linux machines.
k3d doesn't run k3s as pid 1 (it uses its own entrypoint script that does the cgroup evacuation, among other things) so it wouldn't be affected. This behavior was added as a workaround for cgroupv2 systems. I don't personally use kind, is it normal for that to set up the containers with read-only cgroups?
@brandond but we are running k3s within k3d as a container, and that container then fails, so I guess the k3s container would run as pid 1 within the k3d cluster, correct? I'm not sure why those cgroups are read-only, I don't really have a lot of expertise in that to be honest, but it is certainly very weird that this only occurs on some systems.
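One way to check both of those things from the affected host cluster (a sketch; the pod name is a placeholder, the image's busybox shell is assumed, and for a crash-looping pod you may need to temporarily override its command with a long sleep so it stays running):

```sh
# is k3s really PID 1 inside the container?
kubectl exec <failing-k3s-pod> -- cat /proc/1/cmdline

# is the cgroup filesystem mounted read-only? look for "ro" in the mount options
kubectl exec <failing-k3s-pod> -- sh -c 'mount | grep cgroup'
```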
Can you identify which host operating systems/distros it's read-only on?
Cc @iwilltry42
@brandond this is probably a non-exhaustive list, but Docker Desktop seems to use linuxkit ( But one thing that caught my eye was that it worked with Docker Desktop v4.2.0 but didn't with v4.3.0, and the following is added in their release notes for v4.3.0:
Especially the last sentence indicates that just running the k3s container in their Kubernetes distribution will not be enough, as we would need privileged access there as well as those rw permissions. Unfortunately, running k3s as a privileged container wouldn't be an option for us, as in multi-tenancy scenarios this is pretty much a no-go. Thanks so much for your help!
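For anyone trying to narrow this down on their own machine, one quick check (an illustrative sketch; it assumes Docker 20.10 or newer, which exposes these fields) is which cgroup driver and version the Docker Desktop VM reports:

```sh
# prints e.g. "cgroupfs 2"; a value of 2 means cgroup v2, which is where k3s
# performs the root cgroup evacuation that is failing here
docker info --format '{{.CgroupDriver}} {{.CgroupVersion}}'
```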
Not sure if it helps, but let me just drop some info here:
UPDATE 1: Just tested with Docker for Desktop on Windows 10 without a problem 🤔
@iwilltry42 thanks so much for your reply and investigation! Our use case is a little bit different from the default k3d setup: we do not run k3s in Docker directly, but rather use an already existing k3d, Docker Desktop or kind Kubernetes cluster and schedule a new limited k3s pod (basically just the data store, API server and controller manager, while everything else such as the scheduler, agent etc. is disabled) in there. The problem is that this pod then fails to start, because k3s tries to evacuate the cgroups on a read-only file system while running in non-privileged mode (which for our use case shouldn't be necessary at all, I guess), so it's basically Kubernetes within Kubernetes instead of Kubernetes within Docker. To reproduce the problem you can set up k3d like you did and then schedule a pod in there like this, which should fail with the above error message (but mysteriously, on some systems this works as well, for example GKE or older Docker Desktop versions, which might not use cgroups v2):
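The pod spec originally attached here doesn't survive in this thread; a rough sketch of the kind of spec being described (flag set borrowed from the wrapper example earlier, but running /bin/k3s directly so it ends up as PID 1 and triggers the cgroup evacuation) might look like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vcluster-k3s        # illustrative name
spec:
  containers:
  - name: k3s
    image: rancher/k3s:v1.22.5-k3s1
    command: ["/bin/k3s"]   # k3s as the container entrypoint, i.e. PID 1
    args:
    - server
    - --write-kubeconfig=/data/k3s-config/kube-config.yaml
    - --disable=traefik,servicelb,metrics-server,local-storage,coredns
    - --disable-network-policy
    - --disable-agent
    - --disable-scheduler
    - --disable-cloud-controller
    - --flannel-backend=none
    - --service-cidr=10.96.0.0/12
```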
We then have an additional component that syncs pods created in that minimal control plane to the actual Kubernetes cluster, which schedules them on real nodes; the created k3s pod itself is not able to schedule any pods, as there are no real nodes joined. The advantage of this is that you can essentially split up the control plane and give users access to a fully working Kubernetes cluster with CRDs, webhooks, ClusterRoles etc., while the actual workloads are synced to the same namespace on the host cluster, which is great for multi-tenancy scenarios where you would like to give different people limited access to the host Kubernetes cluster.
Not mucking about with cgroups when not running the kubelet seems reasonable; I'll take a shot at that for the next patch release. |
@brandond thanks so much, sounds great!
Validated in all of v1.20.15-rc1+k3s1, v1.21.9-rc1+k3s1, v1.22.6-rc1+k3s1, and v1.23.2-rc1+k3s1
New version contains a fix for k3s-io/k3s#4873
Environmental Info:
K3s Version:
Node(s) CPU architecture, OS, and Version:
Cluster Configuration:
Describe the bug:
Hello! Thanks a lot for the great project! I'm one of the maintainers of vcluster and we are using k3s as a minimal control plane for our virtual cluster implementation. Unfortunately it seems that k3s stopped working for us since version v1.22 (essentially every version released after PR #4086), emitting the following error on docker-desktop, kind and k3d host clusters:
It worked fine with earlier versions and works fine with vanilla k8s or k0s v1.22 containers.
We have a little bit of a special setup where we run k3s without the agent and scheduler, and I'm not sure what exactly is causing this error, as it works on GKE for example. Would it be somehow possible to not run the root cgroup evacuation if the agent is not enabled, in order to get behaviour similar to older versions? If not, would it be possible to introduce a flag to disable this?
Steps To Reproduce:
This doesn't work with v1.22 and newer, while it works with v1.21 (e.g. image rancher/k3s:v1.21.2-k3s1) and lower.
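A rough reproduction flow (a sketch only; the cluster and file names are placeholders, and the pod spec is along the lines of the one sketched earlier in this thread):

```sh
k3d cluster create host-cluster     # or use an existing docker-desktop/kind cluster
kubectl apply -f k3s-pod.yaml       # unprivileged pod running /bin/k3s server --disable-agent ...
kubectl get pod vcluster-k3s -w     # with rancher/k3s:v1.22.x the pod crash-loops
kubectl logs vcluster-k3s           # shows the cgroup evacuation failure
```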
Expected behavior:
k3s container should be running without errors
Actual behavior:
k3s container fails with error:
Backporting