Unable to update cni config: no networks found in /etc/cni/net.d/ #10974

Closed
elliotdobson opened this issue Mar 4, 2021 · 8 comments

@elliotdobson
Contributor

1. What kops version are you running? The command kops version will display this information.

Version 1.19.1 (git-8589b4d157a9cb05c54e320c77b0724c4dd094b2)

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-15T11:35:50Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.8", GitCommit:"fd5d41537aee486160ad9b5356a9d82363273721", GitTreeState:"clean", BuildDate:"2021-02-17T12:33:08Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Cluster is currently running Kubernetes v1.18.15 with Kops v1.19.1

  1. Update Kubernetes to v1.19.8
  2. kops rolling-update cluster clustername.tld --instance-group master-a --yes
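
For step 1, the usual kops flow is something like this (a sketch; the exact invocation isn't recorded in this report):

kops edit cluster clustername.tld   # set spec.kubernetesVersion to 1.19.8
kops update cluster clustername.tld --yes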

5. What happened after the commands executed?
The existing master in instance group master-a was terminated and a new one started; however, it never joined the cluster. Investigating the logs on the instance, it appears that the CNI was never started.

From /var/log/syslog

Mar  4 03:20:24 ip-172-22-0-6 kubelet[5247]: I0304 03:20:24.605693    5247 csi_plugin.go:994] Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "ip-172-22-0-6.ap-southeast-2.compute.internal" is forbidden: User "system:node:ip-172-22-0-6.ap-southeast-2.compute.internal" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope
Mar  4 03:20:24 ip-172-22-0-6 kubelet[5247]: E0304 03:20:24.755995    5247 kubelet.go:2134] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar  4 03:20:24 ip-172-22-0-6 kubelet[5247]: I0304 03:20:24.776694    5247 kubelet.go:449] kubelet nodes not sync
Mar  4 03:20:24 ip-172-22-0-6 kubelet[5247]: I0304 03:20:24.776734    5247 kubelet.go:449] kubelet nodes not sync
Mar  4 03:20:24 ip-172-22-0-6 kubelet[5247]: I0304 03:20:24.920583    5247 kubelet.go:449] kubelet nodes not sync
Mar  4 03:20:25 ip-172-22-0-6 kubelet[5247]: I0304 03:20:25.614219    5247 csi_plugin.go:994] Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "ip-172-22-0-6.ap-southeast-2.compute.internal" is forbidden: User "system:node:ip-172-22-0-6.ap-southeast-2.compute.internal" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope
Mar  4 03:20:25 ip-172-22-0-6 kubelet[5247]: W0304 03:20:25.733818    5247 cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d/
Mar  4 03:20:25 ip-172-22-0-6 kubelet[5247]: I0304 03:20:25.776985    5247 kubelet.go:449] kubelet nodes not sync
Mar  4 03:20:25 ip-172-22-0-6 kubelet[5247]: I0304 03:20:25.920364    5247 kubelet.go:449] kubelet nodes not sync
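
A quick way to confirm the warning directly on the node is to check the two default CNI paths (a minimal check, assuming the standard kops layout):

# config directory the kubelet watches; empty here, hence the warning
ls -A /etc/cni/net.d/
# plugin binaries laid down by nodeup
ls -A /opt/cni/bin/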

6. What did you expect to happen?
The node to start normally with CNI and join the cluster

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

8. Please run the commands with the most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else we need to know?

@hakman
Member

hakman commented Mar 4, 2021

Please take a look at the troubleshooting guide and see if you can find out more:
https://kops.sigs.k8s.io/operations/troubleshoot/#missing-files-in-optbincni

@elliotdobson
Contributor Author

Thanks for the tip @hakman.

nodeup was successful on the upgraded node.

/opt/cni/bin is populated with a bunch of executables (note: the documentation says /opt/bin/cni, which did not exist on my node; is that a typo?). However, it is missing three executables when compared to a healthy node running Kubernetes v1.18.15:

healthy node

root@ip-172-22-1-86:/opt/cni/bin# ls -A1
bandwidth
bridge
calico
calico-ipam
dhcp
firewall
flannel
host-device
host-local
install
ipvlan
loopback
macvlan
portmap
ptp
sbr
static
tuning
vlan

bad node

root@ip-172-22-0-59:/opt/cni/bin# ls -A1
bandwidth
bridge
dhcp
firewall
flannel
host-device
host-local
ipvlan
loopback
macvlan
portmap
ptp
sbr
static
tuning
vlan

We are using Calico as our CNI, so this makes sense: the three missing binaries (calico, calico-ipam, and install) are installed by the calico-node pod, which never ran on this node. The CNI config files are also missing from /etc/cni/net.d.
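
The gap between the two nodes can be listed directly (a sketch, assuming SSH access to both instances; the output below is just the difference between the two listings above):

comm -13 <(ssh ip-172-22-0-59 'ls -A1 /opt/cni/bin') <(ssh ip-172-22-1-86 'ls -A1 /opt/cni/bin')
calico
calico-ipam
install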

On the bad node I am able to run kubectl commands against the API server, such as kubectl get nodes; however, the bad node is not listed, as it has not yet joined the cluster. kubectl get pods -n kube-system shows running pods in the cluster, but it does not list any of the pods that are running on the bad node.

healthy node

k8s_aws-node-termination-handler_aws-node-termination-handler-bvm5c_kube-system_dfa16b6e-f342-49f3-8843-f38e50570c98_0
k8s_calico-node_calico-node-wpb9q_kube-system_2a8342b9-c39f-4de1-8e54-2e9e391c28ec_0
k8s_cluster-autoscaler_cluster-autoscaler-765779dc49-6g577_kube-system_14aaf1fd-f251-4ee6-9cb0-48b9a09122b8_0
k8s_dns-controller_dns-controller-cd457856b-vjwsj_kube-system_b2c56f70-f2fa-47f4-9363-07d6404c2c2b_0
k8s_etcd-manager_etcd-manager-events-ip-172-22-1-86.ap-southeast-2.compute.internal_kube-system_e4c41b0b991a6004b2a4bad117f99549_0
k8s_etcd-manager_etcd-manager-main-ip-172-22-1-86.ap-southeast-2.compute.internal_kube-system_e37e313ca039afa40989b0741bb38796_0
k8s_filebeat_filebeat-nbv22_cluster-logging_6fbbb853-44de-489f-abb4-25218ecfd1a2_0
k8s_healthcheck_kube-apiserver-ip-172-22-1-86.ap-southeast-2.compute.internal_kube-system_83c9831ee2fb3b42f218abbc7fdc3c29_0
k8s_kops-controller_kops-controller-v9zk9_kube-system_28eb3797-4e7c-46d3-ad7e-53dfeaa540c4_0
k8s_kube-apiserver_kube-apiserver-ip-172-22-1-86.ap-southeast-2.compute.internal_kube-system_83c9831ee2fb3b42f218abbc7fdc3c29_1
k8s_kube-controller-manager_kube-controller-manager-ip-172-22-1-86.ap-southeast-2.compute.internal_kube-system_a9fd34c993bbdbeef4efe71e82df9acc_0
k8s_kube-proxy_kube-proxy-ip-172-22-1-86.ap-southeast-2.compute.internal_kube-system_4097a431eb44cd9b7dc27a133b772ae8_0
k8s_kube-scheduler_kube-scheduler-ip-172-22-1-86.ap-southeast-2.compute.internal_kube-system_974b36335a35c004a1bdfff8737cd022_0
k8s_node-cache_node-local-dns-khgqk_kube-system_d1e1b958-1996-4d07-bbde-29ef26ff13e1_0
k8s_node-exporter_node-exporter-8cgp9_cluster-inframetrics_35059d7b-a397-4754-ad79-ee15dbc94bfd_0
protokube

bad node

k8s_etcd-manager_etcd-manager-events-ip-172-22-0-59.ap-southeast-2.compute.internal_kube-system_d445e696948a8483c5ec9544a3b54790_0
k8s_etcd-manager_etcd-manager-main-ip-172-22-0-59.ap-southeast-2.compute.internal_kube-system_133ce6a5daede34b99d7b662ec97997d_0
k8s_healthcheck_kube-apiserver-ip-172-22-0-59.ap-southeast-2.compute.internal_kube-system_fa764b257354d04666e50fd5704c0389_0
k8s_kube-apiserver_kube-apiserver-ip-172-22-0-59.ap-southeast-2.compute.internal_kube-system_fa764b257354d04666e50fd5704c0389_0
k8s_kube-controller-manager_kube-controller-manager-ip-172-22-0-59.ap-southeast-2.compute.internal_kube-system_38fc258c5dfc41c8869fb79336bc0b94_0
k8s_kube-proxy_kube-proxy-ip-172-22-0-59.ap-southeast-2.compute.internal_kube-system_5794afd3ee4f299e58fc24fc1615bb90_0
k8s_kube-scheduler_kube-scheduler-ip-172-22-0-59.ap-southeast-2.compute.internal_kube-system_ba834ba417d30622312c9e44ef38a6fa_0
protokube

Only system pods have come up on the bad node; none of the daemonsets (calico, kops-controller, node-local-dns, etc.) are running.

Running journalctl -f on the bad node shows

Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: NAME                                CURRENT                UPDATE
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: networking.projectcalico.org        3.16.3-kops.2        3.17.2-kops.2
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: I0304 21:30:41.196561   31200 channel_version.go:106] Checking existing channel: Version=3.16.3-kops.2 Channel=s3://company-k8s-kops/cluster.tld/addons/bootstrap-channel.yaml Id=k8s-1.16 ManifestHash=51b58d5e1f3bbe3efeb4cb650b59b207c5153fff compared to new channel: Version=3.17.2-kops.2 Channel=s3://company-k8s-kops/cluster.tld/addons/bootstrap-channel.yaml Id=k8s-1.16 ManifestHash=481359cbde9ab10c8b052873284c7b2769e9bcde
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: I0304 21:30:41.196611   31200 channel_version.go:126] New Version is greater then old
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: I0304 21:30:41.196660   31200 addon.go:140] Applying update from "s3://company-k8s-kops/cluster.tld/addons/networking.projectcalico.org/k8s-1.16.yaml"
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: I0304 21:30:41.196730   31200 s3fs.go:290] Reading file "s3://company-k8s-kops/cluster.tld/addons/networking.projectcalico.org/k8s-1.16.yaml"
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: I0304 21:30:41.211742   31200 apply.go:67] Running command: kubectl apply -f /tmp/channel156421442/manifest.yaml
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: I0304 21:30:47.040141   31200 apply.go:70] error running kubectl apply -f /tmp/channel156421442/manifest.yaml
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: I0304 21:30:47.040178   31200 apply.go:71] configmap/calico-config unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: clusterrole.rbac.authorization.k8s.io/calico-kube-controllers unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: clusterrole.rbac.authorization.k8s.io/calico-node unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: clusterrolebinding.rbac.authorization.k8s.io/calico-node unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: daemonset.apps/calico-node configured
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: serviceaccount/calico-node unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: deployment.apps/calico-kube-controllers unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: serviceaccount/calico-kube-controllers unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: clusterrole.rbac.authorization.k8s.io/k8s-ec2-srcdst unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: serviceaccount/k8s-ec2-srcdst unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: clusterrolebinding.rbac.authorization.k8s.io/k8s-ec2-srcdst unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: deployment.apps/k8s-ec2-srcdst unchanged
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: The CustomResourceDefinition "bgppeers.crd.projectcalico.org" is invalid: spec.preserveUnknownFields: Invalid value: true: must be false in order to use defaults in the schema
Mar 04 21:30:47 ip-172-22-0-59 docker[7069]: error updating "networking.projectcalico.org": error applying update from "s3://company-k8s-kops/cluster.tld/addons/networking.projectcalico.org/k8s-1.16.yaml": error running kubectl
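
The failing object is the bgppeers CRD: the apply is rejected because spec.preserveUnknownFields is still true on the existing CRD. Its current value can be inspected with (a sketch):

kubectl get crd bgppeers.crd.projectcalico.org -o jsonpath='{.spec.preserveUnknownFields}'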

Do you have any suggestions from the logs above?

@vkryzh

vkryzh commented Mar 16, 2021

this may be relevant: #11014

@alex88

alex88 commented Apr 11, 2021

I have the same issue upgrading a cluster to 1.19. I did find this, though: projectcalico/calico#4237. However, I'm not sure how I can apply that.

Update: I've added preserveUnknownFields: false here and it worked.
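
For anyone needing the same workaround on a live cluster, flipping the field on the existing CRD looks something like this (a sketch; note that kops may re-apply the old manifest and revert it):

kubectl patch crd bgppeers.crd.projectcalico.org --type merge -p '{"spec":{"preserveUnknownFields":false}}'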

@hakman
Member

hakman commented Apr 11, 2021

This should be fixed in kOps 1.20, which was just released, and will also be fixed in the next kOps 1.19.x release in the near future.
Thanks for pointing me in the right direction @alex88.

@alex88

alex88 commented Apr 11, 2021

Oh, my bad, it seems brew still has 1.19 and I didn't check if there was an updated one. I'll redo the update with the new version so as not to leave any custom files behind.
Thanks @hakman that was quick!

@hakman hakman closed this as completed Apr 11, 2021
@leonsp-ai

@alex88 How did you apply preserveUnknownFields: false on older kops? I'm looking to get it to work on kops 1.18.3. Is it enough to kubectl edit crd bgppeers.crd.projectcalico.org, or does kops overwrite that when deploying?

@alex88

alex88 commented Apr 23, 2021

@leonsp-ai I edited the Calico S3 file, since I saw that the master node was reading the config from there, and then restarted the master node. I think kops replaces that file when you run kops update cluster, but I'm not 100% sure. It's just a temporary fix, but in my case it was better than downtime.
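
Roughly, that workaround looks like this (a sketch; the manifest path is the sanitized one from the logs earlier in this thread, and the edit is the same preserveUnknownFields change):

# fetch the addon manifest that the masters apply
aws s3 cp s3://company-k8s-kops/cluster.tld/addons/networking.projectcalico.org/k8s-1.16.yaml k8s-1.16.yaml
# edit the bgppeers CRD in the file: set spec.preserveUnknownFields: false
# upload it back, then restart/replace the master so the channel re-applies it
aws s3 cp k8s-1.16.yaml s3://company-k8s-kops/cluster.tld/addons/networking.projectcalico.org/k8s-1.16.yaml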
