After node restart linkerd-cni pod has to be restarted sometimes #12490
Comments
Can you clarify how the linkerd-cni is failing to start? (e.g. logs, events, status)
The linkerd-cni is not failing to start. I sometimes have to restart the CNI pod and then everything is OK again. The CNI pod is not logging any errors.
Ok, thanks for the clarification. We've released a new version of linkerd-cni that might be able to better catch when the network CNI config changes. Can you give it a try?
Updating the CNI to this version fixed the issue. Thank you. |
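For readers hitting the same problem, picking up a newer linkerd-cni release via Helm might look like the sketch below. The repo URL, chart name, release name and namespace are assumptions about a typical Helm-based install, not details taken from this thread; adjust them to your setup.

# Assumed repo/chart/release names; adjust to your installation.
helm repo add linkerd https://helm.linkerd.io/stable
helm repo update
helm upgrade linkerd-cni linkerd/linkerd2-cni -n linkerd-cni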
What is the issue?
When a node in the cluster where linkerd-cni is deployed is restarted, the CNI plugin sometimes fails to come up correctly. As a result, the other Linkerd pods on that node fail to come up because the linkerd-network-validator fails.
After restarting the linkerd-cni pod, all the pods come up.
I looked at the config in /etc/cni/net.d/ and I can confirm that when the Linkerd pods fail to start, the linkerd entry is missing from the CNI config. After restarting the linkerd-cni pod, the config is there (see the sketch after this paragraph).
I already found bug report #11699, which stated that this is fixed. I added the repair-controller, but it did not fix the issue for me: it only restarts the failing Linkerd pods, while it should also restart the linkerd-cni pod.
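As a hedged illustration of the check and the workaround described above (the namespace, label selector and file locations are assumptions based on a default linkerd-cni install, not taken from this issue):

# On the affected node: check whether any CNI config references linkerd-cni.
grep -l linkerd-cni /etc/cni/net.d/* || echo "linkerd-cni entry missing"

# From a workstation: restart the linkerd-cni pod running on that node.
kubectl -n linkerd-cni delete pod \
  -l k8s-app=linkerd-cni \
  --field-selector spec.nodeName=<node-name>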
How can it be reproduced?
Logs, error output, etc
ERROR linkerd_network_validator: Unable to connect to validator. Please ensure iptables rules are rewriting traffic as expected error=Connection refused (os error 111)
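The validator error above indicates that the iptables redirect rules normally injected by the CNI plugin are missing in the failing pod's network namespace. One hedged way to confirm this from the node is sketched below; it assumes containerd with crictl and jq available, and the PROXY_INIT chain names are what Linkerd's proxy-init normally creates, not something shown in this report.

# Find the sandbox of a failing pod and inspect its netns iptables rules.
POD_ID=$(crictl pods --name linkerd-identity -q | head -n1)
PID=$(crictl inspectp "$POD_ID" | jq '.info.pid')
nsenter -t "$PID" -n iptables -t nat -S | grep PROXY_INIT \
  || echo "no linkerd redirect rules in pod netns"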
Output of linkerd check -o short:
linkerd-identity
‼ issuer cert is valid for at least 60 days
issuer certificate will expire on 2024-04-24T18:29:51Z
see https://linkerd.io/2.14/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints
linkerd-control-plane-proxy
| container "linkerd-proxy" in pod "linkerd-identity-7b78b4db96-rdzhv" is not ready
Environment
Kubernetes version: 1.27.11
Cluster environment: Self hosted / Rancher / 3 Nodes
Host OS: Debian 12
Linkerd version: 2.14.10
Possible solution
I think the repair-controller restarts the Linkerd pods, but it isn't restarting the linkerd-cni pod. There should be a check in place that verifies the linkerd CNI config is present, as in the sketch below.
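A minimal sketch of such a check, assuming it runs inside the linkerd-cni pod with the host's /etc/cni/net.d mounted at /host/etc/cni/net.d (the mount path, file layout and probe wiring are assumptions, not details from this issue). Wired up as a liveness probe, it would make kubelet restart the linkerd-cni pod whenever the linkerd entry disappears from the node's CNI config:

#!/bin/sh
# check-linkerd-cni.sh: fail when no CNI config on the node references linkerd-cni.
CNI_DIR=${CNI_DIR:-/host/etc/cni/net.d}
if grep -q linkerd-cni "$CNI_DIR"/*.conf "$CNI_DIR"/*.conflist 2>/dev/null; then
  exit 0   # linkerd entry present, pod is healthy
fi
echo "linkerd-cni entry missing from $CNI_DIR" >&2
exit 1     # liveness failure -> kubelet restarts the linkerd-cni pod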
Additional context
No response
Would you like to work on fixing this bug?
None