3.27.2 mount-bpffs init container fails to load libpcap on Rockylinux9.3/Arm64 #8542

RyrieNorth · 2024-02-21T18:08:54Z

When I install the calico network plugin after initializing the kubernetes cluster the following occurs:

[root@k8s-master docker.io]# kubectl create -f calico.yaml
......

[root@k8s-master docker.io]# kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS       RESTARTS   AGE
kube-system   calico-kube-controllers-68cdf756d9-r75jj   0/1     Pending      0          3s
kube-system   calico-node-25ctj                          0/1     Init:Error   0          3s
kube-system   calico-node-7t8qv                          0/1     Init:2/3     0          3s
kube-system   calico-node-qrnsr                          0/1     Init:1/3     0          3s
kube-system   coredns-857d9ff4c9-j6jmj                   0/1     Pending      0          105s
kube-system   coredns-857d9ff4c9-nh8tf                   0/1     Pending      0          98s
......

You can see that the pods switch to Init:Error within a few seconds.

Describing one of the failing pods points at the mount-bpffs init container:

[root@k8s-master docker.io]# kubectl describe -n kube-system pods calico-node-25ctj
Events:
Type Reason Age From Message

Warning BackOff 4m59s (x24 over 9m58s) kubelet Back-off restarting failed container mount-bpffs in pod calico-node-25ctj_kube-system(ec997881-48b9-4bc0-9203-d25ef3171052)
......
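
To see at a glance which init container is stuck, the per-container state can be read from the pod status. This is a minimal sketch, assuming `kubectl` is configured for the cluster; the function name `init_container_states` is hypothetical and not part of Calico or kubectl.

```shell
# Print "<init container name><TAB><waiting reason>" for each init container.
# The reason column is empty for containers that are not in a waiting state.
# Arguments: $1 = namespace, $2 = pod name.
init_container_states() {
    kubectl get pod "$2" --namespace "$1" \
        -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}

# Usage: init_container_states kube-system calico-node-25ctj
```

A container that keeps restarting would be expected to show a reason such as CrashLoopBackOff here.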

When I looked at the logs I found one error that appeared more frequently:

[root@k8s-master docker.io]# cat /var/log/messages | grep "qrnsr"
......
Feb 22 01:48:00 localhost kubelet[15596]: E0222 01:48:00.536276 15596 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to "StartContainer" for "mount-bpffs" with CrashLoopBackOff: "back-off 5m0s restarting failed container=mount-bpffs pod=calico-node-qrnsr_kube-system(5b0c855d-482c-4cc0-98f7-da9ae03070c1)"" pod="kube-system/calico-node-qrnsr" podUID="5b0c855d-482c-4cc0-98f7-da9ae03070c1"
......

I used the mount -l command to confirm that bpffs is mounted on my system:
[root@k8s-master docker.io]# mount -l | grep bpf
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
......
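
The same check can be scripted against the kernel's mount table. A small sketch, assuming input in the `/proc/mounts` layout (device, mount point, filesystem type, options); the helper name `bpf_mounts` is made up for illustration.

```shell
# Print the mount point of every bpf filesystem found in a mount table.
# Reads /proc/mounts-style lines on stdin: device mountpoint fstype options ...
bpf_mounts() {
    awk '$3 == "bpf" { print $2 }'
}

# Usage on a node: bpf_mounts < /proc/mounts
```

Empty output would mean that no bpf filesystem is mounted.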

I've tried many things, but nothing works.
Then I tried switching the calico version to v3.27.0 and it worked!
[root@k8s-master docker.io]# kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-5fc7d6cf67-lcczd   0/1     Pending   0          53s
kube-system   calico-node-ptnz8                          1/1     Running   0          53s
kube-system   calico-node-sthqf                          1/1     Running   0          53s
kube-system   calico-node-vhwxn                          1/1     Running   0          53s

[root@k8s-master docker.io]# cat calico.yaml | grep image
image: docker.io/calico/cni:v3.27.0
image: docker.io/calico/node:v3.27.0
image: docker.io/calico/kube-controllers:v3.27.0
......
I'm puzzled by this. Is it a problem with the operating system, or with the kernel version? I hope the maintainers can answer my question.

Possible Solution

Rolling back calico to v3.27.0
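
A sketch of how the rollback could be applied to the manifest without editing each image line by hand. It assumes the images are referenced as `docker.io/calico/<name>:v<version>`, as in the grep output above; `pin_calico_version` is a hypothetical helper name.

```shell
# Rewrite every docker.io/calico/* image tag in a manifest to one version.
# Prints the rewritten manifest on stdout; the input file is left untouched.
# Arguments: $1 = manifest path, $2 = target tag (for example v3.27.0).
pin_calico_version() {
    sed -E "s#(image: *docker\.io/calico/[a-z-]+):v[0-9.]+#\1:$2#" "$1"
}

# Usage: pin_calico_version calico.yaml v3.27.0 > calico-v3.27.0.yaml
```

Writing to a new file keeps the original manifest available for comparison.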

My Environment

Calico version: v3.27.2
Kubernetes: v1.29.2
Operating System and version: Rocky Linux release 9.3 (Blue Onyx), kernel 5.14.0-362.8.1.el9_3.aarch64, arm64
Link to your project (optional): none

tomastigera · 2024-02-21T18:51:29Z

When you try 3.27.0, does the eBPF dataplane come up correctly, or is it only that the init containers do not fail? There was definitely a regression in 3.27.0: eBPF was not being built correctly for Arm. That got fixed in #8470, but the fix may not bring it back completely. That said, 3.27.0 may just be a false positive.

Could you share calico-node logs from 3.27.0 just for verification? Would you be able to provide more logs from the failed 3.27.2 init container?

RyrieNorth · 2024-02-22T07:02:04Z

Okay, I've collected some of the logs, so hopefully that will be helpful

calico-v3.27.0.zip
calico-v3.27.2.zip

RyrieNorth · 2024-02-22T08:35:57Z

I tried v3.27.1 later and got the same results as v3.27.2

tomastigera · 2024-02-22T16:54:29Z

First of all, you did not enable the BPF dataplane, right? The logs show that BPFEnabled is false. But it seems like 3.27.0 is not quite healthy either:

Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused

Is /var/run/calico/ writable?

Let me check why 3.27.2 is trying to run mount-bpffs when BPF is disabled 🤔

RyrieNorth · 2024-02-22T17:22:30Z

Yes, I didn't enable BPF dataplane because I didn't find the relevant configuration item in my previous deployment method, but the cluster's network is able to forward data traffic normally.
......

Pod status:
[root@k8s-master ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 4m40s

[root@k8s-master ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        8h
nginx        NodePort    10.96.130.124   <none>        80:32737/TCP   4m25s
......

Traffic test:
[root@k8s-master ~]# curl http://$(kubectl get svc nginx -o jsonpath={.spec.clusterIP})
nginx test page.

Equivalent to:
[root@k8s-master ~]# curl http://10.96.130.124
......

[root@k8s-master ~]# curl http://$(hostname -i):$(kubectl get svc nginx -o jsonpath={.spec.ports..nodePort})
nginx test page.

Equivalent to:
[root@k8s-master ~]# curl http://192.168.1.11:32737
......

But I'm sure the /var/run/calico directory is readable and writable.

This is the result of viewing it with the command:
[root@k8s-master ~]# ls /var/run/calico/ -ld
drwxr-xr-x 3 root root 100 Feb 23 00:59 /var/run/calico/
......

[root@k8s-master ~]# ls /var/run/calico/
bird6.ctl bird.ctl cgroup
......

[root@k8s-master ~]# ls /var/run/calico/ -l
total 0
srw-rw---- 1 root root 0 Feb 23 00:59 bird6.ctl
srw-rw---- 1 root root 0 Feb 23 00:59 bird.ctl
dr-xr-xr-x 12 root root 0 Feb 23 00:58 cgroup
......

And SELinux is disabled:
[root@k8s-master ~]# getenforce
Disabled
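
Permission bits alone do not always settle whether a directory is writable (a read-only mount, for example, would not show up in `ls -ld`). A more direct probe is to try creating a file. This is only a sketch; the helper name `dir_writable` and the probe file name are invented for the example.

```shell
# Return 0 if a file can be created in the directory, 1 otherwise.
# The probe file is removed again immediately.
dir_writable() {
    probe="$1/.write-probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Usage: dir_writable /var/run/calico && echo writable || echo "not writable"
```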

tomastigera · 2024-02-22T18:09:02Z

Could you provide us with logs for the failing mount-bpffs container, not the default calico-node one?
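
For reference, logs of a specific init container can be requested by name. A minimal sketch, assuming `kubectl` access to the cluster; `container_logs` is a hypothetical wrapper, while `--container` and `--previous` are standard `kubectl logs` flags.

```shell
# Print the logs of one named container (init or regular) in a pod.
# --previous asks for the last terminated instance of the container, which is
# usually the interesting one while the pod sits in CrashLoopBackOff.
# Arguments: $1 = namespace, $2 = pod name, $3 = container name.
container_logs() {
    kubectl logs --namespace "$1" "$2" --container "$3" --previous
}

# Usage: container_logs kube-system calico-node-25ctj mount-bpffs
```

If the container has not restarted yet, dropping `--previous` would be needed.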

RyrieNorth · 2024-02-22T18:24:31Z

Sorry, I found it.

[root@k8s-master docker.io]# crictl logs 5a1
calico-node: error while loading shared libraries: libpcap.so.0.8: cannot open shared object file: No such file or directory
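
An error like this means the dynamic loader could not resolve one of the sonames the binary was linked against. A sketch of how unresolved libraries could be listed, assuming `ldd` is available wherever the binary is inspected (it may not be present in a minimal image); `missing_libs` is a made-up helper name.

```shell
# List shared libraries that the dynamic loader cannot resolve for a binary.
# ldd reports those as "<soname> => not found".
# Argument: $1 = path to the binary.
missing_libs() {
    ldd "$1" | awk '/=> not found/ { print $1 }'
}

# Usage inside the image: missing_libs "$(command -v calico-node)"
```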

hjiawei · 2024-02-22T18:44:25Z

libpcap issue is related to #8541.

RyrieNorth · 2024-02-22T18:49:57Z

> libpcap issue is related to #8541.

Okay, it looks like the problem is in how the image was built; this is the library used by Rocky Linux 9.3.
So after all that fuss, it turns out to be an image problem. Go figure.

RyrieNorth · 2024-02-22T18:58:09Z

[root@k8s-slave1 lib64]# ls | grep libpca
libpcap.so.1
libpcap.so.1.10.0

RyrieNorth · 2024-02-22T20:11:55Z

Guys, I found a temporary workaround: you can install an old version of the libpcap package via yum and then modify calico.yaml to get it running:
......

Note that this step replaces the libpcap.so.1.9.1 library on the system:

yum install -y https://dl.rockylinux.org/pub/rocky/8/Devel/aarch64/os/Packages/l/libpcap-1.9.1-5.el8.aarch64.rpm
......

Modify calico.yaml

    - name: "mount-bpffs"
      image: docker.io/calico/node:v3.27.2
      imagePullPolicy: IfNotPresent
      command: ["calico-node", "-init", "-best-effort"]
      volumeMounts:
        - mountPath: /nodeproc
          name: nodeproc
          readOnly: true
        - mountPath: /usr/lib64/libpcap.so.0.8  # Add it here
          name: libpcap-mount
    - name: "calico-node"
      image: docker.io/calico/node:v3.27.2
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - mountPath: /nodeproc
          name: nodeproc
          readOnly: true
        - mountPath: /usr/lib64/libpcap.so.0.8  # Add it here
          name: libpcap-mount

  volumes:
      hostPath:
        type: DirectoryOrCreate
        path: /var/run/nodeagent
    - name: libpcap-mount
      hostPath:
        path: /usr/lib64/libpcap.so.1.9.1  # Add it here

Photos

Finally, it works

RyrieNorth · 2024-02-23T07:33:08Z

Maybe this is the best solution: create the libpcap.so.0.8 symlink inside the container before calico-node starts.
......

  initContainers:
    - name: "mount-bpffs"
      image: docker.io/calico/node:v3.27.2
      imagePullPolicy: IfNotPresent
      #command: ["calico-node", "-init", "-best-effort"]
      command: ["/bin/sh", "-c", "ln -s /usr/lib64/libpcap.so.1.9.1 /usr/lib64/libpcap.so.0.8 && calico-node -init -best-effort"]
  containers:
    - name: calico-node
      image: docker.io/calico/node:v3.27.2
      imagePullPolicy: IfNotPresent
      command: [ "/bin/sh", "-c", "ln -s /usr/lib64/libpcap.so.1.9.1 /usr/lib64/libpcap.so.0.8 && start_runit"]

......

......

It also works on v3.27.1.
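
One caveat with chaining `ln -s ... && <command>`: a plain `ln -s` returns an error when the link name already exists, and the `&&` then prevents the command after it from starting. That could happen if an image ever ships `libpcap.so.0.8` itself. A sketch of a more tolerant variant follows; `ensure_link` is a hypothetical helper and not part of Calico.

```shell
# Create link -> target only when nothing exists at the link path yet, so a
# rerun (or an image that already ships the file) does not make the step fail.
# Arguments: $1 = target, $2 = link path.
ensure_link() {
    [ -e "$2" ] || [ -L "$2" ] || ln -s "$1" "$2"
}

# Usage: ensure_link /usr/lib64/libpcap.so.1.9.1 /usr/lib64/libpcap.so.0.8
```

Inlined into a container command, the same idea could be written as `[ -e /usr/lib64/libpcap.so.0.8 ] || ln -s /usr/lib64/libpcap.so.1.9.1 /usr/lib64/libpcap.so.0.8` followed by `&&` and the original command.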

tomastigera · 2024-02-23T17:20:00Z

@NorthSkybk thank you for sharing your workaround and reporting that issue. We will try to come up with a proper fix for that.

tomastigera added the area/bpf eBPF Dataplane issues label Feb 21, 2024

tomastigera changed the title ~~calico-nodes init fali when install on kubernetes v1.29.2 Rockylinux9.3/Arm64~~ calico ebpf init fails when installing on kubernetes v1.29.2 Rockylinux9.3/Arm64 Feb 21, 2024

tomastigera added the area/arm64 relates to arm64 label Feb 22, 2024

tomastigera removed the area/bpf eBPF Dataplane issues label Feb 22, 2024

tomastigera changed the title ~~calico ebpf init fails when installing on kubernetes v1.29.2 Rockylinux9.3/Arm64~~ mount-bpffs init container fails when installing on kubernetes v1.29.2 Rockylinux9.3/Arm64 in iptables mode Feb 22, 2024

tomastigera changed the title ~~mount-bpffs init container fails when installing on kubernetes v1.29.2 Rockylinux9.3/Arm64 in iptables mode~~ 3.27.2 mount-bpffs init container fails to load libpcap on Rockylinux9.3/Arm64 Feb 22, 2024

tomastigera assigned hjiawei Feb 23, 2024

RyrieNorth closed this as completed Feb 23, 2024
