Add a better error message to indicate why CNI ipamD is not starting #122
Comments
I think we are also running into this issue and maybe this one #104. But I don't believe it's because of security groups. I suspect it has something to do with service accounts or RBAC but right now it's extremely difficult to tell what the actual failure is since there is nothing in All I'm seeing is the crash loop output you mentioned above and this single line in the container log:
which is probably expected since you are invoking
@kuroneko25, is this an EKS cluster?
Yes.
On your worker node, are you able to reach port 443 on the master? What's the output of on your worker node
Should I try with the real master IP or the in-cluster VIP?
Cluster VIP. The output for
Running this on my worker node (on the host VM, not from any containers):
Seems to just hang. I believe security groups are configured correctly.
What's the output of your
In a default EKS cluster, the Kubernetes VIP is 10.100.0.1
I think you are running into a security group issue. In other words, your worker node cannot reach the API server at 172.20.0.1 on port 443.
@kuroneko25 Check the specific security group used for creating the EKS cluster, which is also returned from:
It needs to have port 443 open in its inbound rules.
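The reachability check discussed above can be sketched in a few lines. This is a minimal illustration, not the command from the thread: the function name `probe_tcp` and the 5-second default timeout are assumptions made for the example. It attempts one TCP connection and reports whether the port answered, refused, or never responded.

```python
import socket


def probe_tcp(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try a single TCP connection and classify the outcome.

    Returns "open", "timeout", "refused", or "error: <detail>".
    (Hypothetical helper; name and return values are illustrative.)
    """
    try:
        # create_connection raises on failure and honours the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. no route to host or name resolution failure.
        return f"error: {exc}"
```

Calling `probe_tcp("172.20.0.1")` from the worker node host would exercise the VIP mentioned in this thread. A `"timeout"` result matches the "seems to just hang" symptom and is consistent with packets being dropped silently, which is typically how a missing security group rule behaves, while `"refused"` suggests the address is reachable but nothing is listening there.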
I think we figured out the issue in our SG configuration. Thanks for the help!
@kuroneko25 Do you mind sharing the solution? I am seeing a similar problem. @liwenwu-amazon How is this supposed to work exactly if my nodes/pods are on 10.99.x.x and there seems to be no routing to speak of to reach 172.20.x.x?
FWIW:
@liwenwu-amazon One of my nodes can connect to the API server fine, and the other can't. How can I debug this?
Both nodes come from the same kops instancegroup, so they both have the same security group. Thanks for the help.
@cjbottaro If this is the same Kubernetes cluster, they should use the same Kubernetes service IP. Can you check which one is the Kubernetes service IP, 10.3.0.1 or 10.0.0.1?
Gah, I'm sorry... typo. They can both reach the API.
@liwenwu-amazon But one of my pods can't reach the API server...
That pod is running on this node which can connect fine:
@cjbottaro Are you using any HTTP_PROXY? Check its setting.
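One way to check the proxy settings from inside a pod or on a node is to list the conventional proxy environment variables. The sketch below is illustrative only: the helper name `proxy_settings` is an assumption, and it looks at the upper-case and lower-case spellings that most HTTP clients honour.

```python
import os
from typing import Dict, Mapping

# Conventional proxy variables; clients differ on which spelling they read.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY")


def proxy_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the proxy-related variables that are set in `env`.

    (Hypothetical helper for illustration.)
    """
    found: Dict[str, str] = {}
    for name in PROXY_VARS:
        for key in (name, name.lower()):
            if key in env:
                found[key] = env[key]
    return found
```

If `proxy_settings()` returns a proxy URL and the Kubernetes service IP is not covered by `NO_PROXY`, requests to the API server may be sent through the proxy, which is worth ruling out before looking at security groups again.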
It was all related to this for some reason: kubernetes/kops#2189 (comment). Once I switched to the suggested image, all networking problems went away. Thanks!
Today, in the CNI DaemonSet (aws-node), whenever ipamD restarts, it queries the Kubernetes API server for the Pods already running on the node. If it cannot reach the Kubernetes API server, ipamD exits and you will see the following logs in /var/log/aws-routed-cni/ipamd.log.xxx
ipamD needs to print an explicit error stating that it failed because it cannot communicate with the API server.
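The requested behaviour can be sketched as follows. This is not ipamD's code (ipamD is written in Go); it is a small Python illustration in which every name, including `load_local_pods`, `APIServerUnreachableError`, and the message wording, is hypothetical. The point is only that a connection failure during the startup pod query is turned into one explicit, actionable error instead of a bare exit.

```python
import logging
from typing import Callable, List

log = logging.getLogger("ipamd-sketch")


class APIServerUnreachableError(RuntimeError):
    """Raised when the startup pod query cannot reach the API server."""


def load_local_pods(list_pods: Callable[[], List[str]], api_endpoint: str) -> List[str]:
    """Run the startup pod query and name the real cause if it fails.

    `list_pods` stands in for whatever client call fetches the pods on
    this node; `api_endpoint` is only used in the error message.
    """
    try:
        return list_pods()
    except OSError as exc:
        # OSError covers refused connections, timeouts, and unreachable hosts.
        message = (
            f"failed to communicate with the Kubernetes API server at "
            f"{api_endpoint}: {exc}; check that the worker node's security "
            f"groups allow traffic to the API server on port 443"
        )
        log.error(message)
        raise APIServerUnreachableError(message) from exc
```

With a message of this kind in the container log, the crash loop described in the comments would point directly at API server connectivity rather than leaving RBAC, service accounts, and security groups all as candidates.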
To verify that security groups are configured correctly between the worker node and the Kubernetes API server, you can run the following commands: