After node restart, linkerd-cni pod sometimes has to be restarted #12490

Closed
msr-financial-com opened this issue Apr 23, 2024 · 4 comments

@msr-financial-com

What is the issue?

When a node in a cluster with linkerd-cni deployed is restarted, linkerd-cni sometimes fails to come up correctly. As a result, the other Linkerd pods fail to come up, because linkerd-network-validator fails.
After restarting the cni pod, all the pods come up.

I looked at the config in /etc/cni/net.d/ and I can confirm that when the Linkerd pods fail to start, the Linkerd config is missing. After restarting the cni pod, the config is there.
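
The manual check described above could be sketched roughly as follows. This is a minimal sketch, assuming the CNI config is a JSON conflist with a `plugins` array and that the Linkerd entry is identified by `"type": "linkerd-cni"` (an assumption; compare against a node where the pods start correctly). The function name and the sample configs are hypothetical.

```python
import json


def has_linkerd_plugin(conflist_text: str, plugin_type: str = "linkerd-cni") -> bool:
    """Return True if a CNI conflist (JSON text) lists a plugin of the given type.

    The default plugin type is an assumption about how the Linkerd entry is named.
    """
    config = json.loads(conflist_text)
    plugins = config.get("plugins", [])
    return any(plugin.get("type") == plugin_type for plugin in plugins)


# Hypothetical, trimmed-down conflist contents for illustration only.
missing = '{"name": "example-network", "plugins": [{"type": "example"}, {"type": "portmap"}]}'
present = '{"name": "example-network", "plugins": [{"type": "example"}, {"type": "linkerd-cni"}]}'

print(has_linkerd_plugin(missing))  # False: no Linkerd entry in the plugin chain
print(has_linkerd_plugin(present))  # True: Linkerd entry is in place
```

On a node, the text passed in would be the contents of the conflist file found under /etc/cni/net.d/.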

I already found a bug report (#11699) which states that this is fixed. I added the repair-controller, but it did not fix the issue for me. The repair-controller only restarts the failing Linkerd pods, but it should also restart the linkerd-cni pod.

How can it be reproduced?

  1. Restart a Node
  2. Sometimes the linkerd pods will fail to start

Logs, error output, etc

ERROR linkerd_network_validator: Unable to connect to validator. Please ensure iptables rules are rewriting traffic as expected error=Connection refused (os error 111)

output of linkerd check -o short

linkerd check -o short
linkerd-identity

‼ issuer cert is valid for at least 60 days
issuer certificate will expire on 2024-04-24T18:29:51Z
see https://linkerd.io/2.14/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints

linkerd-control-plane-proxy

| container "linkerd-proxy" in pod "linkerd-identity-7b78b4db96-rdzhv" is not ready

Environment

Kubernetes version: 1.27.11
Cluster environment: Self hosted / Rancher / 3 Nodes
Host OS: Debian 12
Linkerd version: 2.14.10

Possible solution

I think the repair-controller restarts the Linkerd pods, but it isn't restarting the linkerd-cni pod. There should be a check that the Linkerd config is correctly in place.

Additional context

No response

Would you like to work on fixing this bug?

None

@alpeb
Member

alpeb commented Apr 25, 2024

Can you clarify how linkerd-cni is failing to start? (e.g. logs, events, status)

@msr-financial-com
Author

msr-financial-com commented Apr 30, 2024

The linkerd-cni pod is not failing to start.
When you restart the node and the pods come back up, the cni pod starts fine and reports that everything is fine. But if you look inside /etc/cni/net.d/10-canal.conflist, you will see that the Linkerd config is sometimes missing. If you restart the cni pod, the config will be there. Pods that use Linkerd will not be able to start and will report this error:
ERROR linkerd_network_validator: Unable to connect to validator. Please ensure iptables rules are rewriting traffic as expected error=Connection refused (os error 111)

I sometimes have to restart the cni pod, and then everything is OK again. The cni pod is not logging any errors.

@alpeb
Member

alpeb commented May 2, 2024

OK, thanks for the clarification. We've released a new version of linkerd-cni that might be better at catching changes to the network CNI config. Can you give it a try?
https://github.com/linkerd/linkerd2-proxy-init/releases/tag/cni-plugin%2Fv1.5.0
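
For illustration only, the general idea of noticing that a network CNI config file has been rewritten can be sketched as a content-hash comparison. This is not a description of how linkerd-cni detects changes; the function names and the file name below are hypothetical, and the example runs against a temporary directory rather than /etc/cni/net.d/.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's current contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def config_changed(path: Path, last_digest: str) -> Tuple[bool, str]:
    """Compare the file against a previously seen digest.

    Returns (changed, current_digest) so the caller can store the new digest.
    """
    current = file_digest(path)
    return current != last_digest, current


with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "10-example.conflist"  # hypothetical file name
    conf.write_text('{"plugins": [{"type": "example"}, {"type": "linkerd-cni"}]}')
    digest = file_digest(conf)

    # Simulate the network CNI rewriting its config, dropping the extra entry.
    conf.write_text('{"plugins": [{"type": "example"}]}')
    changed, digest = config_changed(conf, digest)
    print(changed)  # True: the file contents differ from the last seen digest
```

A caller that sees `changed` set to true would then re-check the file and re-apply its own entry if it is missing.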

@msr-financial-com
Author

Updating the CNI to this version fixed the issue. Thank you.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 5, 2024