Is hairpin-mode required for kube-dns? #316

Closed
t3hmrman opened this issue Feb 20, 2018 · 9 comments

Comments

@t3hmrman

t3hmrman commented Feb 20, 2018

Hey all, thanks for the work on kube-router!

I apologize if this is a simple mistake, but am I right in assuming that kube-router does not replace kube-dns? I'm trying to get kube-dns to work on top of kube-router (ClusterIP <-> ClusterIP connections work just fine), but I'm having a problem getting names to actually resolve.

As far as why this might be happening, I'm thinking that this log line from sidecar inside the kube-dns pod might be helpful?

W0220 20:24:30.247394       1 server.go:64] Error getting metrics from dnsmasq: read udp 127.0.0.1:58863->127.0.0.1:53: read: connection refused

I also thought it might be related to the alpine bug regarding resolv.conf, but people hitting that bug are at least getting IPs returned, whereas I get:

/ #  nslookup kubernetes.default
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'kubernetes.default': Try again

If I force the ServiceIP of the DNS service:

/ #  nslookup kubernetes.default 10.32.0.2
Server:    10.32.0.2
Address 1: 10.32.0.2

nslookup: can't resolve 'kubernetes.default': Try again

Querying the pod's IP directly doesn't fare any better. I know the service has endpoints, and I know the Service is reachable (I'm also running the kubernetes dashboard and I can curl it by service IP).

What I thought I might be missing was --hairpin-mode=true, since the error message from the sidecar container says that it's trying to access 127.0.0.1:53, but that hasn't worked either... Does anyone know what I might be missing?

If I try to dig the pod IP of the dns pod from inside an alpine container:

/ # dig +add +trace  @10.244.1.53 kubernetes-dashboard

; <<>> DiG 9.10.2 <<>> +add +trace @10.244.1.53 kubernetes-dashboard
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
/ # dig  @10.244.1.53 kubernetes-dashboard

; <<>> DiG 9.10.2 <<>> @10.244.1.53 kubernetes-dashboard
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 39233
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;kubernetes-dashboard.          IN      A

;; Query time: 0 msec
;; SERVER: 10.244.1.53#53(10.244.1.53)
;; WHEN: Tue Feb 20 20:45:59 UTC 2018
;; MSG SIZE  rcvd: 49

NOTE: After a machine restart I no longer get the probe error from the sidecar, so I think that is no longer an issue. DNS still doesn't work, though, so maybe this question is more general now.
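A note on the NXDOMAIN result for the bare name `kubernetes-dashboard` above: `dig` does not apply the resolv.conf search list unless `+search` is passed, so the query goes out as `kubernetes-dashboard.` and NXDOMAIN is the expected answer even from a healthy DNS pod. The sketch below illustrates how a stub resolver would expand a short name. The search list and `ndots` value are assumptions based on the usual Kubernetes defaults for a pod in the `default` namespace, and `candidate_names` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
def candidate_names(name, search, ndots=5):
    """Return the fully qualified names a stub resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute; no search list is applied.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    # Names with fewer than `ndots` dots go through the search list first.
    if name.count(".") < ndots:
        return expanded + [absolute]
    return [absolute] + expanded


# Assumed search list for a pod in the "default" namespace (not from the thread).
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for fqdn in candidate_names("kubernetes.default", search):
    print(fqdn)
```

Under these assumptions the second candidate is `kubernetes.default.svc.cluster.local.`, which is the name a working cluster DNS would answer for. A bare `kubernetes-dashboard` looked up from the `default` namespace would not match a service living in `kube-system`, with or without the search list.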

@t3hmrman changed the title from "Is hairpin-mode required for kube-dns" to "Is hairpin-mode required for kube-dns?" on Feb 20, 2018
@murali-reddy
Member

@t3hmrman So you are not able to get a functioning kube-dns service? There is no need for hairpin mode. Any pod should be able to access the cluster IP of the DNS service.

Even if you delete the pod, do you still have the issue with DNS resolution?

@t3hmrman
Author

t3hmrman commented Feb 22, 2018

Hey @murali-reddy yeah that's exactly what's happening. Good to know hairpin mode is not required -- it must be something wrong with the resource definition I'm using for kube-dns.

I will try deleting the pod again -- it's especially weird that I can access the other kube-dashboard resource by its service IP but not the dns...

[EDIT] - I'm starting to think this is an issue with containers inside the pod communicating with each other. kube-dns has the usual 3 internal containers (dnsmasq, sidecar and kubedns), and I think the base configuration (https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/kube-dns.yaml.base) expects them to reach each other at 127.0.0.1. Maybe that doesn't hold in a cluster set up with kube-router? Is pod-internal traffic under the purview of kube-router?

Starting to think this isn't a kube-router problem at all, but just the way these containers were wired?

@t3hmrman
Author

t3hmrman commented Feb 22, 2018

An update -- this is very likely not the fault of kube-router. After lots more investigation, pod-internal traffic is working just fine; it's kubedns that is actually broken. I think my specific setup (arch + systemd-orchestrated api/controller-manager/scheduler) is what's causing this interop issue.

Here's a snippet of the logs (kubedns container inside the kube-dns service's pod, after setting v=3 on the kubedns container's args) that I'm currently investigating:

I0222 08:12:55.749225       1 dns.go:584] Query for "kubernetes.default.svc.cluster.local.", exact: false <--- every once in a while
I0222 08:12:55.749262       1 dns.go:704] Found 1 records for [local cluster svc default kubernetes] in the cache <---- every once in a while
I0222 08:12:57.616885       1 dns.go:436] No service for endpoint "kube-scheduler" in namespace "kube-system" <--- constantly
I0222 08:12:57.616919       1 dns.go:436] No service for endpoint "kube-scheduler" in namespace "kube-system" 
I0222 08:12:57.632853       1 dns.go:436] No service for endpoint "kube-controller-manager" in namespace "kube-system" <--- constantly
I0222 08:12:57.632865       1 dns.go:436] No service for endpoint "kube-controller-manager" in namespace "kube-system"

It may very well be the case that kube-dns just expects to access the kube-scheduler and kube-controller-manager from services?

@murali-reddy
Member

murali-reddy commented Feb 22, 2018

@t3hmrman did you get a chance to try CoreDNS instead of kube-dns?

Is pod-internal traffic under the purview of kube-router?

No. Kube-router only deals with pod-to-pod connectivity and services.

@t3hmrman
Author

Hey @murali-reddy, thanks for taking the time to help -- my next step is to try CoreDNS, absolutely.

I didn't want to give up, since the solution seems so close :) I'm taking notes as I go so I can make a blog post about my floundering, and at least file a ticket with kube-dns so they know about it.

What I've done lately is use ExternalName services to expose the scheduler and controller manager:

kind: Service
apiVersion: v1
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  type: ExternalName
  externalName: <node ip>
---
kind: Service
apiVersion: v1
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  type: ExternalName
  externalName: <node ip>

@t3hmrman
Author

t3hmrman commented Feb 22, 2018

Hey @murali-reddy, I managed to narrow it down some more -- it looks like the traffic going to Service IPs is actually not being routed properly.

If I try to look up the kubernetes dashboard by its fully qualified name:

/ # dig @10.32.0.2 kubernetes-dashboard.kube-system.svc.cluster.local
<hangs>
^C

Now, if I try the same exact request, but with the pod IP for kube-dns:

/ # dig @10.244.1.65 -p53 kubernetes-dashboard.kube-system.svc.cluster.local

; <<>> DiG 9.11.2-P1 <<>> @10.244.1.65 kubernetes-dashboard.kube-system.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48325
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;kubernetes-dashboard.kube-system.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes-dashboard.kube-system.svc.cluster.local. 30 IN A 10.32.0.40

;; Query time: 0 msec
;; SERVER: 10.244.1.65#53(10.244.1.65)
;; WHEN: Thu Feb 22 10:41:42 UTC 2018
;; MSG SIZE  rcvd: 84

I doubt the external name service stuff I did was necessary, but I'll file an issue with kube-dns to find out anyway.

This issue does seem to be related to kube-router, because it's a pod <-> service endpoint issue. I'm going to start re-reading the docs on kube-router service configuration; I must have set something up wrong.

[EDIT] - Current theory is that I don't have IPVS installed properly, since that is what's supposed to help kube-router function -- looking into it...

[EDIT2] - Nope, IPVS is installed; I see routes when I check ipvsadm.
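The comparison in this comment (the same query hangs against the Service IP but is answered by the pod IP) can be scripted so it is easy to rerun after each configuration change. What follows is a minimal sketch using only the Python standard library; `build_query` and `probe` are hypothetical helper names, the packet is a bare A-record query without EDNS, and a `None` result only means "no reply within the timeout", not a diagnosis of why.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length byte.
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Root label terminator, then QTYPE=A (1) and QCLASS=IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def probe(server, name, timeout=2.0):
    """Return the DNS RCODE sent back by `server`, or None if nothing arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable-network errors.
            return None
    if len(reply) < 4:
        return None
    flags = struct.unpack("!H", reply[2:4])[0]
    return flags & 0x000F  # low four bits of the flags word hold the RCODE
```

Run from inside a pod, `probe("10.32.0.2", "kubernetes-dashboard.kube-system.svc.cluster.local")` and the same call with `"10.244.1.65"` would mirror the two `dig` commands above: an RCODE of 0 corresponds to NOERROR, 3 to NXDOMAIN, and `None` to the hang seen against the Service IP.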

@t3hmrman
Author

Found it! The issue was ufw. I use ufw on the node and it was blocking DNS traffic from moving around internally on the node!

I apologize for the wasted cycles. I just need to update my rules to allow traffic between the locally created networks!

@murali-reddy
Member

@t3hmrman Great that you could figure it out.

@t3hmrman
Author

t3hmrman commented Feb 22, 2018

For posterity the fix was:

$ sudo ufw allow to 10.32.0.0/16 from 10.244.0.0/16
$ sudo ufw allow from 10.32.0.0/16 to 10.244.0.0/16

This lets my service cluster IP range and the regular pods interact.
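As a quick sanity check on which addresses from this thread fall into which of the two ranges in the ufw rules, here is a small sketch using Python's `ipaddress` module. The CIDRs are taken from the rules above; the "service" and "pod" labels and the `classify` helper are an interpretation added for illustration.

```python
import ipaddress

# CIDRs from the ufw rules above; the labels are an interpretation.
SERVICE_CIDR = ipaddress.ip_network("10.32.0.0/16")
POD_CIDR = ipaddress.ip_network("10.244.0.0/16")


def classify(address):
    """Label an address as belonging to the service range, pod range, or neither."""
    ip = ipaddress.ip_address(address)
    if ip in SERVICE_CIDR:
        return "service"
    if ip in POD_CIDR:
        return "pod"
    return "other"


# Addresses that appear earlier in the thread.
for address in ["10.32.0.2", "10.32.0.40", "10.244.1.65"]:
    print(address, classify(address))
```

One reading of the symptoms, offered as a guess rather than a confirmed explanation: the query sent to 10.244.1.65 stayed within the pod range, while the query sent to 10.32.0.2 crossed from the pod range into the service range, which is the path the two rules open up.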
