Pods not getting deleted with error: failed to parse netconf: unexpected end of JSON input #1662

Closed
Aishwarya-Hebbar opened this issue Oct 19, 2022 · 17 comments · Fixed by sapcc/kubernikus#901

@Aishwarya-Hebbar

Aishwarya-Hebbar commented Oct 19, 2022

Pods not getting deleted with error: "Unknown desc = failed to destroy network for sandbox "ff252b2872f45846e140c96bb096d5a23100dcd13bd25cecbf858655b336093e": plugin type="flannel" failed (delete): failed to parse netconf: unexpected end of JSON input"

Expected Behavior

Expected all pods to get deleted after a reboot of all k8s nodes.

Current Behavior

Pods not getting deleted with error: failed to parse netconf: unexpected end of JSON input

root@k8s-control-983-1665562565:~# kubectl get sc,sts,pvc,pods -owide -n vsan-stretch-4059 
NAME                                           PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
storageclass.storage.k8s.io/nginx-sc-default   csi.vsphere.vmware.com   Delete          Immediate           false                  15m

NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE   VOLUMEMODE
persistentvolumeclaim/pvc-5h64x   Bound    pvc-0676f481-57e4-49d1-98d0-af84efe81117   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-77ctb   Bound    pvc-d20aeb95-a2af-4a0b-a5ec-7a6858f804f2   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-7nw8m   Bound    pvc-6a768f2a-97cb-4634-a13f-746905a855da   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-92w56   Bound    pvc-6b3bb9a4-a544-43f5-ab91-bcdf97da07c9   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-9fzx5   Bound    pvc-8a6d736c-bece-459a-9a19-875cb147193c   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-bjmcs   Bound    pvc-84a83e16-9673-406b-8511-5f5a39a49fd3   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-bnnv8   Bound    pvc-f8bb9b02-209d-4f4d-808a-60d59ac21f6d   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-dt6n7   Bound    pvc-cdd0b517-8705-4312-9d99-9d72b6b68641   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-h6rhn   Bound    pvc-68b84fc6-dd2f-4d68-8a4b-64f3fd697dee   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-kb6zl   Bound    pvc-64087b99-9691-4303-af4f-815f55580abc   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-kpvjk   Bound    pvc-f1cdd50d-f8f6-4d42-9e50-eaebc5774fc7   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-mwm5g   Bound    pvc-f1d6092c-2f9d-4dc8-8c8e-06a71a773307   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-n6k5p   Bound    pvc-5de5f4bf-9f6e-453d-8a3c-d060c35e1813   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-ndw7r   Bound    pvc-4e95749b-1954-4ee9-abf2-7ac0c62b5bb4   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-np82z   Bound    pvc-ac4c75be-c76c-4d8c-a9f0-dc80787814d5   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-ns9vj   Bound    pvc-68b99f24-1730-4cd5-9d02-e655cd6e25a3   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-pd5t7   Bound    pvc-2d47bb98-588c-4636-9252-f101ae03faa1   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-pmxxz   Bound    pvc-5b20601f-33ab-4a56-92df-bf721fc0df11   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-pzc59   Bound    pvc-029c123f-5c67-42f1-8d28-cd2d30207d5a   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-qz5w2   Bound    pvc-4a4fb493-60e5-4e51-92af-c9468a71687f   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-rmmpf   Bound    pvc-72374230-56f8-4286-8f3f-f720762f907c   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-rrqcw   Bound    pvc-186cbe1c-3db1-4285-9b83-c5c1b058e6eb   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-rt9mk   Bound    pvc-ae297376-3362-41d5-8be6-d1ed33a5caf7   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-skd2f   Bound    pvc-9335f2a0-f6d0-4092-92e0-824261cf6a9f   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-slhbz   Bound    pvc-5a005474-1fb0-46dc-b4fe-10e83d61ede8   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-sn5xt   Bound    pvc-0108a5b6-dcaa-42d5-b788-1681a8d361c9   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-vlt9z   Bound    pvc-807c0481-a4b7-4764-97e2-82298925f55d   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-vxkvk   Bound    pvc-ed7ae8d6-44b5-4b08-a1de-612dc8ec7513   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-xd6pz   Bound    pvc-6493b9ca-d0e9-4bca-b77d-5245c89683a8   2Gi        RWO            nginx-sc-default   15m   Filesystem
persistentvolumeclaim/pvc-xpkbc   Bound    pvc-ac077678-e7c4-40a7-a92d-c1cb5d5580aa   2Gi        RWO            nginx-sc-default   15m   Filesystem

NAME                   READY   STATUS        RESTARTS   AGE   IP       NODE                      NOMINATED NODE   READINESS GATES
pod/pvc-tester-gk554   0/1     Terminating   0          10m   <none>   k8s-node-387-1665562607   <none>           <none>
pod/pvc-tester-mqwqq   0/1     Terminating   0          10m   <none>   k8s-node-286-1665562641   <none>           <none>

root@k8s-control-983-1665562565:~# kubectl describe pod -n vsan-stretch-4059 
Name:                      pvc-tester-gk554
Namespace:                 vsan-stretch-4059
Priority:                  0
Node:                      k8s-node-387-1665562607/10.180.206.154
Start Time:                Wed, 19 Oct 2022 13:04:00 +0000
Labels:                    <none>
Annotations:               <none>
Status:                    Terminating (lasts 6m25s)
Termination Grace Period:  30s
IP:                        
IPs:                       <none>
Containers:
  write-pod:
    Container ID:  containerd://4c81107fb578b79e00869cf962ecb0cca79dae6198774d594ecda9ba04280293
    Image:         harbor-repo.vmware.com/csi_ci/busybox:1.35
    Image ID:      harbor-repo.vmware.com/csi_ci/busybox@sha256:505e5e20edbb5f2ac0abe3622358daf2f4a4c818eea0498445b7248e39db6728
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      /bin/df -T /mnt/volume1 | /bin/awk 'FNR == 2 {print $2}' > /mnt/volume1/fstype && while true ; do sleep 2 ; done
    State:          Terminated
      Reason:       Unknown
      Exit Code:    255
      Started:      Wed, 19 Oct 2022 13:04:08 +0000
      Finished:     Wed, 19 Oct 2022 13:06:30 +0000
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/volume1 from volume1 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gfgg7 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  volume1:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-sn5xt
    ReadOnly:   false
  kube-api-access-gfgg7:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               7m16s                default-scheduler        Successfully assigned vsan-stretch-4059/pvc-tester-gk554 to k8s-node-387-1665562607
  Normal   SuccessfulAttachVolume  7m14s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-0108a5b6-dcaa-42d5-b788-1681a8d361c9"
  Normal   Pulled                  7m8s                 kubelet                  Container image "harbor-repo.vmware.com/csi_ci/busybox:1.35" already present on machine
  Normal   Created                 7m8s                 kubelet                  Created container write-pod
  Normal   Started                 7m8s                 kubelet                  Started container write-pod
  Normal   Killing                 6m55s                kubelet                  Stopping container write-pod
  Warning  FailedKillPod           2s (x22 over 4m30s)  kubelet                  error killing pod: failed to "KillPodSandbox" for "0ba0e400-a981-4182-9a41-854260bcfa7a" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"8c760daa9dcba8c41858b14ba2bf4ae3ea47186dc7671b962465075c5fea92ec\": plugin type=\"flannel\" failed (delete): failed to parse netconf: unexpected end of JSON input"

Steps to Reproduce (for bugs)

  1. Create 30 PVCs and wait for each PVC to bind to a PV.
  2. Create a pod for each PVC created in step 1.
  3. Delete all pods and reboot all k8s worker nodes simultaneously.
  4. Once the k8s worker nodes are back up and Running, all the pods should be deleted. (A rough shell sketch of these steps follows below.)
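
A rough sketch of these steps as shell commands, assuming an existing namespace vsan-stretch-test and the nginx-sc-default StorageClass shown above; the names and the busybox image are illustrative, not the exact test harness:

# create 30 PVCs, each with a pod mounting it
for i in $(seq 1 30); do
  kubectl -n vsan-stretch-test apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-$i
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: nginx-sc-default
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pvc-tester-$i
spec:
  containers:
  - name: write-pod
    image: busybox:1.35
    command: ["/bin/sh", "-c", "while true; do sleep 2; done"]
    volumeMounts:
    - name: volume1
      mountPath: /mnt/volume1
  volumes:
  - name: volume1
    persistentVolumeClaim:
      claimName: pvc-$i
EOF
done

# once every PVC is Bound and every pod is Running, delete the pods
# and reboot all worker nodes at roughly the same time
kubectl -n vsan-stretch-test delete pod --all --wait=false
# on each worker node: sudo reboot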

Context

Disaster recovery scenarios with k8s 1.24 in a containerd environment are failing with this issue. The same scenarios worked fine in k8s 1.23 with dockershim.

Your Environment

Flannel version: v0.18.1 (https://github.com/flannel-io/flannel/blob/v0.18.1/Documentation/kube-flannel.yml)
Etcd version: etcd:3.5.3-0
Kubernetes version (if used): 1.24
Operating System and version: linux
Link to your project (optional): https://github.com/kubernetes-sigs/vsphere-csi-driver

@rbrtbnfgl
Contributor

Could you check the content of /etc/cni/net.d/10-flannel.conflist and the output of kubectl -n kube-system get configmap kube-flannel-cfg -o yaml?

@divyenpatel

@rbrtbnfgl

# kubectl -n kube-system get configmap kube-flannel-cfg -o yaml
apiVersion: v1
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"cni-conf.json":"{\n  \"name\": \"cbr0\",\n  \"cniVersion\": \"0.3.1\",\n  \"plugins\": [\n    {\n      \"type\": \"flannel\",\n      \"delegate\": {\n        \"hairpinMode\": true,\n        \"isDefaultGateway\": true\n      }\n    },\n    {\n      \"type\": \"portmap\",\n      \"capabilities\": {\n        \"portMappings\": true\n      }\n    }\n  ]\n}\n","net-conf.json":"{\n  \"Network\": \"10.244.0.0/16\",\n  \"Backend\": {\n    \"Type\": \"vxlan\"\n  }\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app":"flannel","tier":"node"},"name":"kube-flannel-cfg","namespace":"kube-system"}}
  creationTimestamp: "2022-10-12T08:18:57Z"
  labels:
    app: flannel
    tier: node
  name: kube-flannel-cfg
  namespace: kube-system
  resourceVersion: "305"
  uid: 50a8e09e-6d12-4b77-b293-05864b2e3301
# cat /etc/cni/net.d/10-flannel.conflist
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}

@MikeZappa87

Is it possible to get the containerd version?

@Aishwarya-Hebbar
Author

containerd://1.6.6

@rbrtbnfgl
Contributor

The config seems right. Could you check the kubelet logs? When the delete request is issued, they should log the flannel command that was executed.
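
On a node where the kubelet runs as a systemd unit, something like the following should surface the relevant lines (the unit name and time window are assumptions and may differ per distro):

journalctl -u kubelet --since "1 hour ago" | grep -i "failed to parse netconf"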

@jbguerraz

Exactly the same issue here with:
k8s v1.25.1
containerd 1.6.6

@Aishwarya-Hebbar
Author

This is what I see in kubelet logs:

Oct 19 13:07:08 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:08.225691    1736 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"6720d35b-d536-4d06-804d-4071447f499f\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for sandbox \\\"5f34e42a77cc57b11528445729e2d3bdd004c6c9621cfa48c7947d6027808991\\\": plugin type=\\\"flannel\\\" failed (delete): failed to parse netconf: unexpected end of JSON input\"" pod="vsan-stretch-3300/pvc-tester-knsgd" podUID=6720d35b-d536-4d06-804d-4071447f499f
Oct 19 13:07:09 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:09.201363    1736 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"04bd08f8-a895-4a4a-939b-e6521d5f17d0\" found, but error occurred when trying to remove the volumes dir: not a directory" numErrs=34
Oct 19 13:07:11 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:11.209339    1736 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"04bd08f8-a895-4a4a-939b-e6521d5f17d0\" found, but error occurred when trying to remove the volumes dir: not a directory" numErrs=34
Oct 19 13:07:11 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:11.998035    1736 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"pvc-tester-knsgd.171f7a4005856a74", GenerateName:"", Namespace:"vsan-stretch-3300", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ZZZ_DeprecatedClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"vsan-stretch-3300", Name:"pvc-tester-knsgd", UID:"6720d35b-d536-4d06-804d-4071447f499f", APIVersion:"v1", ResourceVersion:"1785544", FieldPath:""}, Reason:"FailedKillPod", Message:"error killing pod: failed to \"KillPodSandbox\" for \"6720d35b-d536-4d06-804d-4071447f499f\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for sandbox \\\"5f34e42a77cc57b11528445729e2d3bdd004c6c9621cfa48c7947d6027808991\\\": plugin type=\\\"flannel\\\" failed (delete): failed to parse netconf: unexpected end of JSON input\"", Source:v1.EventSource{Component:"kubelet", Host:"k8s-node-286-1665562641"}, FirstTimestamp:time.Date(2022, time.October, 19, 13, 6, 42, 539498100, time.Local), LastTimestamp:time.Date(2022, time.October, 19, 13, 7, 8, 225645184, time.Local), Count:3, Type:"Warning", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "pvc-tester-knsgd.171f7a4005856a74" is forbidden: unable to create new content in namespace vsan-stretch-3300 because it is being terminated' (will not retry!)
Oct 19 13:07:13 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:13.233607    1736 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"04bd08f8-a895-4a4a-939b-e6521d5f17d0\" found, but error occurred when trying to remove the volumes dir: not a directory" numErrs=34
Oct 19 13:07:15 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:15.205679    1736 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"04bd08f8-a895-4a4a-939b-e6521d5f17d0\" found, but error occurred when trying to remove the volumes dir: not a directory" numErrs=34
Oct 19 13:07:17 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:17.201555    1736 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"04bd08f8-a895-4a4a-939b-e6521d5f17d0\" found, but error occurred when trying to remove the volumes dir: not a directory" numErrs=34
Oct 19 13:07:17 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:17.246208    1736 remote_runtime.go:248] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"ff252b2872f45846e140c96bb096d5a23100dcd13bd25cecbf858655b336093e\": plugin type=\"flannel\" failed (delete): failed to parse netconf: unexpected end of JSON input" podSandboxID="ff252b2872f45846e140c96bb096d5a23100dcd13bd25cecbf858655b336093e"
Oct 19 13:07:17 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:17.246377    1736 kuberuntime_manager.go:999] "Failed to stop sandbox" podSandboxID={Type:containerd ID:ff252b2872f45846e140c96bb096d5a23100dcd13bd25cecbf858655b336093e}
Oct 19 13:07:17 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:17.246608    1736 kubelet.go:1784] failed to "KillPodSandbox" for "e1909e28-5731-4111-8031-3c989b5dc0cd" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"ff252b2872f45846e140c96bb096d5a23100dcd13bd25cecbf858655b336093e\": plugin type=\"flannel\" failed (delete): failed to parse netconf: unexpected end of JSON input"
Oct 19 13:07:17 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:17.246702    1736 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"e1909e28-5731-4111-8031-3c989b5dc0cd\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for sandbox \\\"ff252b2872f45846e140c96bb096d5a23100dcd13bd25cecbf858655b336093e\\\": plugin type=\\\"flannel\\\" failed (delete): failed to parse netconf: unexpected end of JSON input\"" pod="vsan-stretch-4059/pvc-tester-mqwqq" podUID=e1909e28-5731-4111-8031-3c989b5dc0cd
Oct 19 13:07:19 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:19.204879    1736 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"04bd08f8-a895-4a4a-939b-e6521d5f17d0\" found, but error occurred when trying to remove the volumes dir: not a directory" numErrs=34
Oct 19 13:07:21 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:21.200498    1736 kubelet_volumes.go:245] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"04bd08f8-a895-4a4a-939b-e6521d5f17d0\" found, but error occurred when trying to remove the volumes dir: not a directory" numErrs=34
Oct 19 13:07:22 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:22.224067    1736 remote_runtime.go:248] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"5f34e42a77cc57b11528445729e2d3bdd004c6c9621cfa48c7947d6027808991\": plugin type=\"flannel\" failed (delete): failed to parse netconf: unexpected end of JSON input" podSandboxID="5f34e42a77cc57b11528445729e2d3bdd004c6c9621cfa48c7947d6027808991"
Oct 19 13:07:22 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:22.224132    1736 kuberuntime_manager.go:999] "Failed to stop sandbox" podSandboxID={Type:containerd ID:5f34e42a77cc57b11528445729e2d3bdd004c6c9621cfa48c7947d6027808991}
Oct 19 13:07:22 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:22.224181    1736 kubelet.go:1784] failed to "KillPodSandbox" for "6720d35b-d536-4d06-804d-4071447f499f" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"5f34e42a77cc57b11528445729e2d3bdd004c6c9621cfa48c7947d6027808991\": plugin type=\"flannel\" failed (delete): failed to parse netconf: unexpected end of JSON input"
Oct 19 13:07:22 k8s-node-286-1665562641 kubelet[1736]: E1019 13:07:22.224231    1736 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"6720d35b-d536-4d06-804d-4071447f499f\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for sandbox \\\"5f34e42a77cc57b11528445729e2d3bdd004c6c9621cfa48c7947d6027808991\\\": plugin type=\\\"flannel\\\" failed (delete): failed to parse netconf: unexpected end of JSON input\"" pod="vsan-stretch-3300/pvc-tester-knsgd" podUID=6720d35b-d536-4d06-804d-4071447f499f

Attached kubelet logs
Logs: https://gist.github.com/Aishwarya-Hebbar/531b4a59763e689c648b70fe69daaecf

@rbrtbnfgl
Contributor

Could you check the files in /var/lib/cni/flannel/? There should be one file for each created pod, and each file should contain JSON. Maybe there is an error on the file system and the files are corrupted.
It might also be a bug in the CNI plugin, which should delete the pods even if the files have errors.
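
A minimal sketch of that check, assuming jq is available on the node (an empty or truncated file here is the kind of thing that triggers the "unexpected end of JSON input" error on delete):

# flag empty or non-JSON flannel netconf cache files
for f in /var/lib/cni/flannel/*; do
  [ -s "$f" ] || { echo "empty: $f"; continue; }
  jq -e . "$f" >/dev/null 2>&1 || echo "invalid JSON: $f"
done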

@jbguerraz

Hello @rbrtbnfgl 👋
We've found 0-byte files in /var/lib/cni/flannel/. Once we deleted them, things went back to normal. Why those empty files were there in the first place is still an open question!
Since we deleted those files we haven't faced the issue again.
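
For anyone else in this state, the manual cleanup described above amounts to something like this on the affected node (a sketch of the workaround, not an official fix):

# remove the zero-byte flannel netconf cache files, then let the kubelet retry the pod deletions
find /var/lib/cni/flannel -type f -size 0 -print -delete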

@Aishwarya-Hebbar
Author

Aishwarya-Hebbar commented Nov 15, 2022

Yes, we also found 0-byte files in /var/lib/cni/flannel/ on the worker nodes, but after deleting those files and running the same workflow, we hit the same issue again.

root@k8s-node-402-1668379934:/var/lib/cni/flannel# ls -l

total 0
-rw------- 1 root root 0 Nov 15 07:34 6ffa3211851027a4760ce1b0b410623a6970945c9cae2468ad3011298669e42f

@stale

stale bot commented May 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label May 15, 2023
@Pell17

Pell17 commented May 24, 2023

We saw this problem for the first time this week, using K3s 1.26.3, but the issue doesn't seem version related, considering the comments above. I don't have a definite clue as to the cause, but after some questioning it turns out that this system is special because the operator frequently cuts power to it, and we don't have a UPS. It seems plausible that this could be related to the zero-byte files in /var/lib/cni/flannel. After we deleted these files (there were four of them), the system automatically restarted the pods and went back to normal.
The OP reported that the 0-byte files appeared again after rebooting the node(s) - we haven't seen this (yet), but it's a further indication that it might have something to do with the file system during shutdown.
In any case, I think it would make sense for Flannel to be more robust when handling JSON errors, especially in the case of 0-byte files.

@stale stale bot removed the wontfix label May 24, 2023
@thomasferrandiz
Contributor

The issue is actually in the flannel CNI plugin binary and should be fixed in release 1.1.2: https://github.com/flannel-io/cni-plugin/releases/tag/v1.1.2

Could you check which version is used?
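
One way to check is to run the plugin binary directly on a node; the path assumes the default /opt/cni/bin install location, and the exact output format may vary by build:

# typically prints something like "CNI flannel plugin v1.x.y" when invoked without the CNI env vars
/opt/cni/bin/flannel 2>&1 | head -n 2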

@Pell17

Pell17 commented May 26, 2023

Could you check which version is used?

Sure, and it seems I was incorrect. I thought they were running a newer version of the system at that site, but it turns out they are still running K3s 1.23.6, which includes Flannel 0.17.0 and cni-plugin 1.0.1.
Looks like it should be fixed in that case! Unfortunately, there are no K3s releases that use cni-plugin 1.1.2 or newer yet, only release candidates.


stale bot commented Nov 22, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 22, 2023
@stale stale bot closed this as completed Dec 13, 2023
@tuupola

tuupola commented Feb 6, 2024

I also see this problem randomly happening with K3S v1.26.7+k3s1. Deleting the zero byte file does not fix the problem.

@manuelbuil
Collaborator

I also see this problem randomly happening with K3S v1.26.7+k3s1. Deleting the zero byte file does not fix the problem.

Could you open an issue in K3s, please?

Nuckal777 added a commit to sapcc/kubernikus that referenced this issue Apr 10, 2024
Will be rolled out via seed reconciliation.
Fixes flannel-io/flannel#1662
Nuckal777 added a commit to sapcc/kubernikus that referenced this issue Apr 11, 2024
Will be rolled out via seed reconciliation.
Fixes flannel-io/flannel#1662