[BUG] Podman: "failed to find cpu cgroup (v2)" #1082

Open
rgilton opened this issue Jun 8, 2022 · 8 comments
Labels
bug Something isn't working

Comments


rgilton commented Jun 8, 2022

What did you do

I followed the instructions on using rootless podman from the k3d documentation.

  • How was the cluster created?
  • k3d registry create --default-network podman hive-registry
  • k3d cluster create --registry-use hive-registry hive

What did you expect to happen

The cluster to start.

Screenshots or terminal output

[rob@f36vm1 ~]$ k3d cluster create --registry-use hive-registry hive
INFO[0000] Prep: Network                                
INFO[0000] Re-using existing network 'k3d-hive' (9093f5f999b9262e7e7cf068011acb42e4a60ea4dee5d6112c5d223dc2d0eeb8) 
INFO[0000] Created image volume k3d-hive-images         
INFO[0000] Container 'k3d-hive-registry' is already connected to 'k3d-hive' 
INFO[0000] Starting new tools node...                   
INFO[0000] Starting Node 'k3d-hive-tools'               
INFO[0001] Creating node 'k3d-hive-server-0'            
INFO[0001] Creating LoadBalancer 'k3d-hive-serverlb'    
INFO[0001] Using the k3d-tools node to gather environment information 
INFO[0001] HostIP: using network gateway 10.89.0.1 address 
INFO[0001] Starting cluster 'hive'                      
INFO[0001] Starting servers...                          
INFO[0001] Starting Node 'k3d-hive-server-0'            
WARN[0002] warning: encountered fatal log from node k3d-hive-server-0 (retrying 0/10): time="2022-06-08T15:00:39Z" level=fatal msg="failed to find cpu cgroup (v2)"
WARN[0002] warning: encountered fatal log from node k3d-hive-server-0 (retrying 1/10): time="2022-06-08T15:00:39Z" level=fatal msg="failed to find cpu cgroup (v2)"
WARN[0003] warning: encountered fatal log from node k3d-hive-server-0 (retrying 2/10): time="2022-06-08T15:00:40Z" level=fatal msg="failed to find cpu cgroup (v2)"
WARN[0005] warning: encountered fatal log from node k3d-hive-server-0 (retrying 3/10): time="2022-06-08T15:00:42Z" level=fatal msg="failed to find cpu cgroup (v2)"
WARN[0007] warning: encountered fatal log from node k3d-hive-server-0 (retrying 4/10): time="2022-06-08T15:00:44Z" level=fatal msg="failed to find cpu cgroup (v2)"
WARN[0007] warning: encountered fatal log from node k3d-hive-server-0 (retrying 5/10): time="2022-06-08T15:00:44Z" level=fatal msg="failed to find cpu cgroup (v2)"
WARN[0008] warning: encountered fatal log from node k3d-hive-server-0 (retrying 6/10): time="2022-06-08T15:00:45Z" level=fatal msg="failed to find cpu cgroup (v2)"
WARN[0009] warning: encountered fatal log from node k3d-hive-server-0 (retrying 7/10): time="2022-06-08T15:00:46Z" level=fatal msg="failed to find cpu cgroup (v2)"
WARN[0010] warning: encountered fatal log from node k3d-hive-server-0 (retrying 8/10): time="2022-06-08T15:00:47Z" level=fatal msg="failed to find cpu cgroup (v2)"
WARN[0012] warning: encountered fatal log from node k3d-hive-server-0 (retrying 9/10): time="2022-06-08T15:00:49Z" level=fatal msg="failed to find cpu cgroup (v2)"
ERRO[0013] Failed Cluster Start: Failed to start server k3d-hive-server-0: Node k3d-hive-server-0 failed to get ready: error waiting for log line `k3s is up and running` from node 'k3d-hive-server-0': stopped returning log lines 
ERRO[0013] Failed to create cluster >>> Rolling Back    
INFO[0013] Deleting cluster 'hive'                      
INFO[0013] Deleting 2 attached volumes...               
WARN[0013] Failed to delete volume 'k3d-hive-images' of cluster 'hive': failed to find volume 'k3d-hive-images': Error: No such volume: k3d-hive-images -> Try to delete it manually 
FATA[0013] Cluster creation FAILED, all changes have been rolled back! 
[rob@f36vm1 ~]$ 

Spying on the logs from one of the 'server' containers, the last few lines are:

time="2022-06-08T15:00:44Z" level=info msg="Node token is available at /var/lib/rancher/k3s/server/token"
time="2022-06-08T15:00:44Z" level=info msg="To join node to cluster: k3s agent -s https://10.89.0.14:6443 -t ${NODE_TOKEN}"
time="2022-06-08T15:00:44Z" level=info msg="Wrote kubeconfig /output/kubeconfig.yaml"
time="2022-06-08T15:00:44Z" level=info msg="Run: k3s kubectl"
time="2022-06-08T15:00:44Z" level=fatal msg="failed to find cpu cgroup (v2)"

This machine is using cgroups v2 as far as I can see (it is the Fedora 36 default):

[rob@f36vm1 ~]$ mount | grep cgr
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot)

Which OS & Architecture

All in a fresh Fedora 36 VM.

Which version of k3d

k3d version v5.4.3
k3s version v1.23.6-k3s1 (default)

Which version of docker

Using podman here:

Client:       Podman Engine
Version:      4.1.0
API Version:  4.1.0
Go Version:   go1.18
Built:        Fri May  6 12:15:54 2022
OS/Arch:      linux/amd64
@rgilton rgilton added the bug Something isn't working label Jun 8, 2022
@radikaled

I also ran into this issue and was able to make some progress.

It looks like on Fedora 36 a non-root user does not have cpuset delegation by default:

$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
cpu io memory pids

For reference: Enabling CPU, CPUSET, and I/O delegation (https://rootlesscontaine.rs/getting-started/common/cgroup2/#enabling-cpu-cpuset-and-io-delegation)
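Concretely, that boils down to a small systemd drop-in, something like this (a sketch following the linked guide; assumes a systemd-based distro):

$ sudo mkdir -p /etc/systemd/system/user@.service.d
$ printf '[Service]\nDelegate=cpu cpuset io memory pids\n' | sudo tee /etc/systemd/system/user@.service.d/delegate.conf
$ sudo systemctl daemon-reload
$ # log out and back in so the user manager picks up the new delegation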

Once I enabled cpuset delegation (as outlined above): success!

$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
cpuset cpu io memory pids
$ k3d cluster create
INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-k3s-default'            
INFO[0000] Created image volume k3d-k3s-default-images  
INFO[0000] Starting new tools node...                   
INFO[0000] Starting Node 'k3d-k3s-default-tools'        
INFO[0001] Creating node 'k3d-k3s-default-server-0'     
INFO[0001] Creating LoadBalancer 'k3d-k3s-default-serverlb' 
INFO[0001] Using the k3d-tools node to gather environment information 
INFO[0001] HostIP: using network gateway 10.89.0.1 address 
INFO[0001] Starting cluster 'k3s-default'               
INFO[0001] Starting servers...                          
INFO[0001] Starting Node 'k3d-k3s-default-server-0'     
INFO[0005] All agents already running.                  
INFO[0005] Starting helpers...                          
INFO[0005] Starting Node 'k3d-k3s-default-serverlb'     
INFO[0012] Injecting records for hostAliases (incl. host.k3d.internal) and for 2 network members into CoreDNS configmap... 
INFO[0014] Cluster 'k3s-default' created successfully!  
INFO[0014] You can now use it like this:                
kubectl cluster-info

Although the initial cluster creation succeeded, I noticed that k3d-k3s-default-server-0 was unfortunately having trouble staying up. There are some hints in the log about what the kubelet is unhappy about:

E0615 19:04:02.504643       2 container_manager_linux.go:457] "Updating kernel flag failed (Hint: enable KubeletInUserNamespace feature flag to ignore the error)" err="open /proc/sys/kernel/panic: permission denied" flag="kernel/panic"
E0615 19:04:02.504726       2 container_manager_linux.go:457] "Updating kernel flag failed (Hint: enable KubeletInUserNamespace feature flag to ignore the error)" err="open /proc/sys/kernel/panic_on_oops: permission denied" flag="kernel/panic_on_oops"
E0615 19:04:02.504878       2 container_manager_linux.go:457] "Updating kernel flag failed (Hint: enable KubeletInUserNamespace feature flag to ignore the error)" err="open /proc/sys/vm/overcommit_memory: permission denied" flag="vm/overcommit_memory"
E0615 19:04:02.504972       2 kubelet.go:1431] "Failed to start ContainerManager" err="[open /proc/sys/kernel/panic: permission denied, open /proc/sys/kernel/panic_on_oops: permission denied, open /proc/sys/vm/overcommit_memory: permission denied]"

So without giving it too much thought I recreated the cluster like so:

$ k3d cluster create --k3s-arg '--kubelet-arg=feature-gates=KubeletInUserNamespace=true@server:*'

Seems OK but haven't dug much deeper to verify:

$ kubectl get nodes -o wide
NAME                       STATUS   ROLES                  AGE   VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE   KERNEL-VERSION            CONTAINER-RUNTIME
k3d-k3s-default-server-0   Ready    control-plane,master   55s   v1.23.6+k3s1   10.89.0.2     <none>        K3s dev    5.17.13-300.fc36.x86_64   containerd://1.5.11-k3s2

$ kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-6c79684f77-wv88d   1/1     Running     0          2m24s
kube-system   coredns-d76bd69b-rbsw2                    1/1     Running     0          2m24s
kube-system   helm-install-traefik-crd-c64r4            0/1     Completed   0          2m24s
kube-system   metrics-server-7cd5fcb6b7-qfclf           1/1     Running     0          2m24s
kube-system   helm-install-traefik-w74nc                0/1     Completed   2          2m24s
kube-system   svclb-traefik-bgcz8                       2/2     Running     0          104s
kube-system   traefik-df4ff85d6-xf2nm                   1/1     Running     0          104s

I hope this helps!

Cheers,

@hadrabap

@radikaled Thanks a lot!

I've been facing the same issue on Oracle Linux 8 with cgroups v2 and rootless podman. The following command helped:

k3d cluster create --k3s-arg '--kubelet-arg=feature-gates=KubeletInUserNamespace=true@server:*'

It would be nice if k3d managed this automatically, or at least printed a hint when a rootless environment is detected.

@almereyda

How could a rootless environment be detected?

@hadrabap

Well, assuming a rootless environment is defined as one running under a non-root user at the host level, with the container's root user mapped to that (non-root) host user, the check should be that the user ID of the current process (at the host level, i.e. k3d itself) is not 0.

This is a (modified) example of how I detect root mode in my utility (C++):

#include <unistd.h>
#include <cstdio>

int main() {
    // getuid() returns the real user ID of the calling process:
    // 0 means root, anything else means a rootless environment.
    const auto uid = getuid();
    if (uid > 0) {
        std::puts("root-less mode");
    } else {
        std::puts("root-full mode");
    }
    return 0;
}

Running k3d inside a container would be another exercise; I don't know whether that is even a supported feature.

Detecting cgroup v1 vs. cgroup v2 is trickier. I have two Oracle Linux 8 systems here; Oracle Linux 8 can run in either mode, but cgroup v1 is the default. The easiest way to check which version the system is currently running is the filesystem type mounted at /sys/fs/cgroup:

CGroup V1:

[opc@ipa ~]$ stat -fc %T /sys/fs/cgroup/
tmpfs

CGroup V2:

[opc@sws ~]$ stat -fc %T /sys/fs/cgroup/
cgroup2fs

If the result of the stat command is cgroup2fs, the system runs in cgroup v2 mode; otherwise it is cgroup v1.

P.S.: Please excuse me if I'm missing some crucial points here; I'm really new to this kind of stuff.

@iosipeld

I had the same issue today on a Debian 11 instance on Alibaba Cloud.

I added the following flags to the GRUB_CMDLINE_LINUX variable in /etc/default/grub:

 cgroup_memory=1 cgroup_enable=memory

and rebooted the instance from the console. Now the error is gone and the systemd service starts correctly.
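Spelled out, the change looks something like this (a sketch; assumes Debian's stock grub tooling, where update-grub regenerates the config before the reboot):

$ sudoedit /etc/default/grub   # append the flags inside GRUB_CMDLINE_LINUX="..."
$ grep '^GRUB_CMDLINE_LINUX=' /etc/default/grub
GRUB_CMDLINE_LINUX="... cgroup_memory=1 cgroup_enable=memory"
$ sudo update-grub             # regenerate /boot/grub/grub.cfg
$ sudo reboot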


Utopiah commented Jan 19, 2024

on Fedora 36 a non-root user does not have cpuset delegation by default

Same on bookworm/sid, but following https://rootlesscontaine.rs/getting-started/common/cgroup2/#enabling-cpu-cpuset-and-io-delegation indeed fixed it for me too.


omyhub commented Mar 8, 2024

[root@localhost ~]# cat /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids

[admin@localhost ~]$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers
cpuset io memory pids

[admin@localhost ~]$ stat -fc %T /sys/fs/cgroup/
cgroup2fs

[root@localhost ~]# docker logs -f k3d-k3s-default-server-0
......
time="2024-03-08T06:42:47.390547887Z" level=fatal msg="failed to find cpu cgroup (v2)"

help pls!


omyhub commented Mar 8, 2024

If the OS is a Red Hat-like OS and you have the same problem, you can visit the links below:
https://access.redhat.com/solutions/6582021
https://access.redhat.com/solutions/737243
https://support.hpe.com/hpesc/public/docDisplay?docId=sf000082729en_us&docLocale=en_US&page=index.html
Solution: disable rtkit-daemon.
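As a sketch, that workaround amounts to (then verify the delegated controllers again after logging back in):

$ sudo systemctl disable --now rtkit-daemon
$ # log out and back in (or reboot), then check that "cpu" now appears:
$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers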
