
[eks] [issue]: Windows Pods not able to resolve internal k8s services #236

Closed
JasonChinsen opened this issue Apr 3, 2019 · 19 comments
Labels: Proposed Community submitted issue

@JasonChinsen

Tell us about your request
I am running a mixed k8s cluster based on eks-windows-preview

I have deployed mongodb on Linux nodes and I am not able to resolve the service from within the Windows Pod.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
service:

$kubectl get services | grep -i mongodb
mongodb                               ClusterIP      10.100.87.180    <none>                                                                    27017/TCP                               2d

Windows:

$kubectl exec -ti windows-server-iis-7dcfc7c79b-zm4dh -- powershell
...
PS C:\ProgramData\chocolatey\lib\curl\tools\curl-7.64.1-win64-mingw\bin> .\curl.exe mongodb:27017
curl: (6) Could not resolve host: mongodb

Linux:

$kubectl exec -it nginx-deployment-7f74dd57b8-9xdkg sh
...
# bash
root@nginx-deployment-7f74dd57b8-9xdkg:/# curl mongodb:27017
It looks like you are trying to access MongoDB over HTTP on the native driver port.

Are you currently working around this issue?
How are you currently solving this problem?

Additional context
Anything else we should know?

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

@JasonChinsen JasonChinsen added the Proposed Community submitted issue label Apr 3, 2019
@CarmenAPuccio

CarmenAPuccio commented Apr 3, 2019

@JasonChinsen can you do me a favor and run the following commands from within one of the Windows Pods?

kubectl exec -ti windows-server-iis-7dcfc7c79b-pqt6q -- powershell

From within the pod (it will probably fail):

nslookup kubernetes.default

Then try this. You should most likely see an empty SuffixSearchList:

Get-DnsClientGlobalSetting 

Then try running the command below to set the SuffixSearchList (this matches what's in /etc/resolv.conf in the pods on my Linux nodes):

Set-DnsClientGlobalSetting -SuffixSearchList @("default.svc.cluster.local", "svc.cluster.local", "cluster.local", "us-west-2.compute.internal")

Does the nslookup now succeed for mongodb and kubernetes.default? This isn't a fix, merely a troubleshooting step.
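As an alternative to setting the suffix list inside a running container, Kubernetes also supports per-pod DNS settings via `dnsConfig` in the pod spec (beta on 1.11-era clusters, so availability may vary). A minimal sketch, assuming the default namespace and the suffixes discussed above; the pod name, image, and node selector label are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: windows-dns-test                 # illustrative name
spec:
  nodeSelector:
    beta.kubernetes.io/os: windows       # label used on 1.11-era clusters
  containers:
  - name: iis
    image: mcr.microsoft.com/windows/servercore/iis  # illustrative image
  dnsConfig:
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
```

This sets the search suffixes declaratively per pod rather than imperatively inside the container, but it does not address the underlying CNI config gap discussed later in this thread.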

@JasonChinsen (Author)

@CarmenAPuccio, thank you for the quick response; it does look like there was an issue with the DnsClientGlobalSetting. Is there a way to set this correctly when launching Windows pods?

Server:  kube-dns.kube-system.svc.cluster.local
Address:  10.100.0.10

*** kube-dns.kube-system.svc.cluster.local can't find kubernetes.default: Non-existent domain
PS C:\> Get-DnsClientGlobalSetting

UseSuffixSearchList : False
SuffixSearchList    : {}
UseDevolution       : True
DevolutionLevel     : 0
PS C:\> Set-DnsClientGlobalSetting -SuffixSearchList @("default.svc.cluster.local", "svc.cluster.local", "cluster.local", "us-west-2.compute.internal")
PS C:\> nslookup
Default Server:  kube-dns.kube-system.svc.cluster.local
Address:  10.100.0.10

>

PS C:\ProgramData\chocolatey\lib\curl\tools\curl-7.64.1-win64-mingw\bin> nslookup mongodb
Server:  kube-dns.kube-system.svc.cluster.local
Address:  10.100.0.10

Name:    mongodb.default.svc.cluster.local
Address:  10.100.87.180
...

PS C:\ProgramData\chocolatey\lib\curl\tools\curl-7.64.1-win64-mingw\bin> .\curl.exe mongodb:27017
curl: (7) Failed to connect to mongodb port 27017: Timed out

@CarmenAPuccio

@JasonChinsen Let me chat with the teams internally and we'll get back to you. Just out of curiosity, what is your cluster version?

@JasonChinsen (Author)

Hello @CarmenAPuccio

kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:35:51Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.8-eks-7c34c0", GitCommit:"7c34c0d2f2d0f11f397d55a46945193a0e22d8f3", GitTreeState:"clean", BuildDate:"2019-03-01T22:49:39Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

Both Windows and Linux nodes are running version v1.11.5.

@somujay

somujay commented Apr 4, 2019

> Hello @CarmenAPuccio
>
> kubectl version
> Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-10T23:35:51Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"linux/amd64"}
> Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.8-eks-7c34c0", GitCommit:"7c34c0d2f2d0f11f397d55a46945193a0e22d8f3", GitTreeState:"clean", BuildDate:"2019-03-01T22:49:39Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
>
> Both Windows and Linux nodes are running version v1.11.5.

@JasonChinsen / @CarmenAPuccio, thanks for pointing this out. When the CNI config is generated, the DNS suffixes are left out. We'll fix this in the next AMI release. For now, you can manually edit the config on your Windows nodes. Here are the steps:

  1. The CNI config is located on the Windows host at "C:\ProgramData\Amazon\EKS\cni\config\vpc-shared-eni.conf".
  2. Add the DNS suffix (see below for an example):
    "domain": "default.svc.cluster.local"
  3. Delete the currently running pod. The new one should pick up this setting. If it still doesn't, restart your Windows node (there are other ways, but a restart is simpler).

These steps will set the suffix automatically when new pods are created.

{
  "cniVersion": "0.3.1",
  "name": "vpc",
  "type": "vpc-shared-eni",
  "eniMACAddress": "xx:xx:xx:xx:xx:xx",
  "eniIPAddress": "xxx.xxx.xxx.xx/xx",
  "gatewayIPAddress": "xxx.xxx.xx.x",
  "vpcCIDRs": ["xxx.xxx.x.x/xx"],
  "serviceCIDR": "xx.xxx.x.x/xx",
  "dns": {
    "nameservers": ["xx.xxx.x.xx"],
    "domain": "default.svc.cluster.local"
  }
}
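The manual edit in step 2 above can also be scripted. A small sketch (not an official tool) in Python that patches the `dns.domain` key into a vpc-shared-eni config; the sample JSON is abbreviated, and on a real node you would read and write C:\ProgramData\Amazon\EKS\cni\config\vpc-shared-eni.conf instead:

```python
import json

def add_dns_domain(cni_conf: str, domain: str) -> str:
    """Return the CNI config JSON with a DNS 'domain' entry added.

    Preserves any existing keys (e.g. the 'nameservers' list) and only
    sets the 'domain' field inside the 'dns' object.
    """
    conf = json.loads(cni_conf)
    conf.setdefault("dns", {})["domain"] = domain
    return json.dumps(conf, indent=2)

# Abbreviated sample config for illustration only.
sample = json.dumps({
    "cniVersion": "0.3.1",
    "name": "vpc",
    "type": "vpc-shared-eni",
    "dns": {"nameservers": ["10.100.0.10"]},
})

patched = add_dns_domain(sample, "default.svc.cluster.local")
print(patched)
```

On a node you would wrap this in a read/modify/write of the config file path from step 1, then recreate the pods as in step 3.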

@JasonChinsen (Author)

Thank you!

@kernelcoredump

Any timeline on the availability of the updated AMIs? We're currently working around this with a startup PowerShell script chained through ENTRYPOINT.
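A workaround along those lines might look like the following: a sketch under stated assumptions, not an official recommendation. The script path, handoff target, and suffix list are all illustrative, with the suffixes taken from earlier comments in this thread:

```powershell
# entrypoint.ps1 -- hypothetical startup script chained in front of the real
# entrypoint via the image's ENTRYPOINT, e.g.:
#   ENTRYPOINT ["powershell", "-File", "C:\\entrypoint.ps1"]
# Suffix list copied from the Linux pods' /etc/resolv.conf, as suggested above.
Set-DnsClientGlobalSetting -SuffixSearchList @(
    "default.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "us-west-2.compute.internal"
)

# Hand off to the container's real entrypoint (illustrative path).
& "C:\app\run.ps1"
```

This only papers over the missing suffixes per container; the CNI config fix above addresses it at the node level.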

@pdefreitas

@somujay any ETA for a new AMI that does not have this issue?

@jg-par

jg-par commented Aug 29, 2019

Hello, I've been trying to add Windows nodes to my existing Linux EKS cluster and it seems that no matter what I do I can't get the pods to resolve DNS, or even find the default DNS server (kubedns/172.20.0.10). I've tried everything listed in this forum to no avail. I've tried changing the DNS to the VPC-default servers, which allows for external connectivity, but this doesn't allow me to access pods in the cluster. Does anyone have any ideas?

@somujay

somujay commented Sep 6, 2019

> Hello, I've been trying to add Windows nodes to my existing Linux EKS cluster and it seems that no matter what I do I can't get the pods to resolve DNS, or even find the default DNS server (kube-dns / 172.20.0.10). I've tried everything listed in this forum to no avail. I've tried changing the DNS to the VPC-default servers, which allows for external connectivity, but this doesn't allow me to access pods in the cluster. Does anyone have any ideas?

Hi Joe,
Sorry for the delayed response. Since you've added Windows worker nodes to an existing Linux cluster, did you make sure that communication between the Linux and Windows worker nodes is enabled in the inbound rules? Please verify that communication between them is working. Also send me your VPC CIDR range and your CNI config contents.

Thanks!
Somu.

@jg-par

jg-par commented Sep 10, 2019

Hey @somujay,
Sorry for the delay. The nodes themselves can indeed talk to each other, and the VPC CIDRs are 10.20.0.0/16 and 10.25.0.0/16, though all EKS resources are located in the 10.25 block. Below is the CNI config from the Windows node. I know the VPC CIDR value doesn't reflect where the resources are actually located, but I've found that changing this value does not help.
{
  "cniVersion": "0.3.1",
  "name": "vpc",
  "type": "vpc-shared-eni",
  "eniMACAddress": "06:d6:ea:c8:40:4a",
  "eniIPAddress": "10.25.5.83/25",
  "gatewayIPAddress": "10.25.5.1",
  "vpcCIDRs": ["10.20.0.0/16"],
  "serviceCIDR": "172.20.0.0/16",
  "dns": { "nameservers": ["172.20.0.10"] }
}

@somujay

somujay commented Sep 16, 2019

> Hey @somujay,
> Sorry for the delay. The nodes themselves can indeed talk to each other, and the VPC CIDRs are 10.20.0.0/16 and 10.25.0.0/16, though all EKS resources are located in the 10.25 block. Below is the CNI config from the Windows node. I know the VPC CIDR value doesn't reflect where the resources are actually located, but I've found that changing this value does not help.
> { "cniVersion": "0.3.1", "name": "vpc", "type": "vpc-shared-eni", "eniMACAddress": "06:d6:ea:c8:40:4a", "eniIPAddress": "10.25.5.83/25", "gatewayIPAddress": "10.25.5.1", "vpcCIDRs": ["10.20.0.0/16"], "serviceCIDR": "172.20.0.0/16", "dns": { "nameservers": ["172.20.0.10"] } }

Synced with Joe. The aws-auth ConfigMap wasn't applied correctly. Closing the issue.

@krishnaputhran

I am having the same issue. Can you explain what the aws-auth ConfigMap issue was?

@jg-par

jg-par commented Oct 22, 2019

@krishnaputhran
I didn't apply the ConfigMap provided in the repository; I didn't think I had to, since the Windows nodes were being added to my cluster without it, but this prevented KubeDNS from working on the Windows nodes.

@krishnaputhran

@jg-par can you please explain which ConfigMap you are referring to? Are there any special ConfigMaps that need to be created as part of adding Windows nodes to an existing Linux cluster? Appreciate your help.

@Hakob

Hakob commented Nov 4, 2020

#606 (comment)

@vkothapallirli

@somujay @JasonChinsen @CarmenAPuccio

I have an issue where Windows pods are not able to perform an nslookup using pod names in the same namespace.

For example, I have nginx-abc and nginx-efg deployments in the namespace demo.

The nginx-abc pod is not able to run nslookup nginx-efg against the other nginx pod.

Here is my C:\ProgramData\Amazon\EKS\cni\config\vpc-shared-eni.conf

{
  "cniVersion": "0.3.1",
  "name": "vpc",
  "type": "vpc-shared-eni",
  "eniMACAddress": "XX:XX:XX:XX:XX:XX",
  "eniIPAddress": "XXX.XX.XX.XXX/XX",
  "gatewayIPAddress": "XXX.XX.XX.XXX",
  "vpcCIDRs": ["XXX.XX.XX.XX/XX"],
  "serviceCIDR": "XXX.XX.XX.XX/XX",
  "dns": {
    "nameservers": ["XXX.XX.XX.XX"],
    "search": [
      "{%namespace%}.svc.cluster.local",
      "svc.cluster.local",
      "cluster.local"
    ]
  }
}

Server Version

Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}
NAME                                        STATUS   VERSION
ip-XX-XX-XX-XX.us-XXXX-X.compute.internal   Ready    v1.21.2-eks-55daa9d
ip-XX-XX-XX-XX.us-XXXX-X.compute.internal   Ready    v1.21.2-eks-55daa9d
ip-XX-X-X-XX.us-XXXX-X.compute.internal     Ready    v1.18.20-eks-c9f1ce
ip-XX-X-X-XX.us-XXXX-X.compute.internal     Ready    v1.21.2-eks-55daa9d
ip-X-X-X-X.us-XXXX-X.compute.internal       Ready    v1.18.20-eks-c9f1ce
ip-X-X-X-X.us-XXXX-X.compute.internal       Ready    v1.21.2-eks-55daa9d
ip-X-X-X-X.us-XXXX-X.compute.internal       Ready    v1.21.2-eks-55daa9d
ip-X-X-X-X.us-XXXX-X.compute.internal       Ready    v1.21.2-eks-55daa9d
ip-X-X-X-X.us-XXXX-X.compute.internal       Ready    v1.18.20-eks-c9f1ce
ip-X-X-X-X.us-XXXX-X.compute.internal       Ready    v1.21.2-eks-55daa9d

@gfrid

gfrid commented Apr 18, 2022

> @krishnaputhran I didn't apply the ConfigMap provided in the repository; I didn't think I had to, since the Windows nodes were being added to my cluster without it, but this prevented KubeDNS from working on the Windows nodes.

In the latest documentation from AWS this is no longer stated, and the issue still exists with EKS 1.21 (with Windows nodes). I have logged a support ticket.

@tonetechnician

tonetechnician commented Jan 26, 2023

@gfrid Did you have any word back on this?

I'm also experiencing this in EKS 1.21 and also the latest 1.24.

I've attempted @CarmenAPuccio's troubleshooting steps (#236 (comment)), but it also doesn't seem to work after setting the suffix list.

The only way I've been able to get DNS resolution is to explicitly set the DNS server in the container to the CoreDNS service address (192.168.x.x) in kube-system, using `Set-DnsClientServerAddress -InterfaceAlias vEthernet* -ServerAddresses ("192.168.x.x","10.100.0.10")`.
