net: NATs client can't connect to server when client built with go1.7: "dial tcp: no suitable address found" #16739

Closed
tomwilkie opened this issue Aug 16, 2016 · 13 comments

Comments

@tomwilkie
Contributor

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?

go1.7rc6 from docker hub:

$ docker run -ti golang:1.7 /bin/sh
# go version
go version go1.7rc6 linux/amd64

  2. What operating system and processor architecture are you using (go env)?

Minikube - x86 VirtualBox VM on a Mac

# go env
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build828064041=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

  3. What did you do?

Compiled Weave Scope with go1.7 (see weaveworks/scope#1797), ran it on a minikube instance and it couldn't connect to a NATs instance:

$ kubectl logs --namespace=scope query-1106217792-nv4ar 
<app> INFO: 2016/08/16 14:04:32.999799 app starting, version 919c3be, ID 357f1fafb9b6ed6f
<app> INFO: 2016/08/16 14:04:33.000030 command line args: --app.collector=dynamodb://abc:[email protected]:8000/reports --app.collector.s3=s3://abc:[email protected]:4569/s3 --app.http.address=:80 --app.log.http=true --app.memcached.hostname=memcached.scope.svc.cluster.local --app.memcached.service=memcached --app.memcached.timeout=100ms --app.nats=nats://nats.scope.svc.cluster.local:4222 --app.userid.header=X-Scope-OrgID --logtostderr=true --mode=app --weave=false
<app> INFO: 2016/08/16 14:04:33.000514 Connecting nats to nats://nats.scope.svc.cluster.local:4222
2016/08/16 14:04:33 nats err: dial tcp: no suitable address found
<app> FATA: 2016/08/16 14:04:33.000747 Error creating collector: nats: no servers available for connection

The "nats: no servers available for connection" message comes from the NATS client, which hides the original error. The "dial tcp: no suitable address found" line comes from logging I added to the client to show the underlying error.
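To illustrate how a client can keep the underlying dial error visible instead of hiding it, here is a minimal sketch. It is not the NATS client's code: `connectAny`, `dialFunc`, `failingDialer`, and `errNoServers` are hypothetical names invented for this example, and the dialer is faked so nothing touches the network.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// dialFunc has the same shape as net.Dial so a fake can be swapped in.
type dialFunc func(network, addr string) (net.Conn, error)

// errNoServers is a hypothetical stand-in for a client's generic error.
var errNoServers = errors.New("no servers available for connection")

// connectAny tries each address in turn. If every dial fails, it returns
// the generic error wrapped around the last dial error, so the cause
// stays visible to the caller.
func connectAny(addrs []string, dial dialFunc) (net.Conn, error) {
	var lastErr error
	for _, addr := range addrs {
		conn, err := dial("tcp", addr)
		if err == nil {
			return conn, nil
		}
		lastErr = err
	}
	if lastErr == nil {
		// No addresses were supplied at all.
		return nil, errNoServers
	}
	return nil, fmt.Errorf("%w: %v", errNoServers, lastErr)
}

// failingDialer returns a fake dialer that always fails with msg.
func failingDialer(msg string) dialFunc {
	return func(network, addr string) (net.Conn, error) {
		return nil, errors.New(msg)
	}
}

func main() {
	dial := failingDialer("dial tcp: no suitable address found")
	_, err := connectAny([]string{"nats.scope.svc.cluster.local:4222"}, dial)
	fmt.Println(err)
	// The generic error is still detectable by callers.
	fmt.Println(errors.Is(err, errNoServers))
}
```

With wrapping like this, a single log line carries both the generic message and the dial failure, and callers can still match the generic error with `errors.Is`.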

  4. What did you expect to see?

Kubectl exec'ing into a pod on the machine shows NATs is indeed accessible:

$ kubectl exec -ti --namespace=scope pipe-4267260430-ci7f0 /bin/sh
/home/weave # nslookup nats.scope.svc.cluster.local

Name:      nats.scope.svc.cluster.local
Address 1: 10.0.0.132

# telnet nats.scope.svc.cluster.local 4222
INFO {"server_id":"T452ED9wLfSbSr3lPntysO","version":"0.8.0","go":"go1.6.2","host":"0.0.0.0","port":4222,"auth_required":false,"ssl_required":false,"tls_required":false,"tls_verify":false,"max_payload":1048576} 

-ERR 'Unknown Protocol Operation'
-ERR 'Parser Error'
Connection closed by foreign host

#  apk add drill
(1/2) Installing ldns (1.6.17-r3)
(2/2) Installing drill (1.6.17-r3)
Executing busybox-1.24.2-r0.trigger
OK: 25 MiB in 45 packages
/home/weave # drill  nats.scope.svc.cluster.local
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 53508
;; flags: qr aa rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0 
;; QUESTION SECTION:
;; nats.scope.svc.cluster.local.    IN  A

;; ANSWER SECTION:
nats.scope.svc.cluster.local.   30  IN  A   10.0.0.132

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 1 msec
;; SERVER: 10.0.0.10
;; WHEN: Tue Aug 16 14:14:29 2016
;; MSG SIZE  rcvd: 62
/home/weave # exit

  5. What did you see instead?

I expect it to connect and work, as it does with go1.6

@bradfitz
Contributor

What does running with environment variable GODEBUG=netdns=1 say?

I suspect your Go 1.6 vs Go 1.7 differ in how they were built (cgo/netgo/etc), because I don't think anything changed in this area during Go 1.7.

/cc @mdempsky

@bradfitz bradfitz added the WaitingForInfo label Aug 16, 2016
@tomwilkie
Contributor Author

> I suspect your Go 1.6 vs Go 1.7 differ in how they were built (cgo/netgo/etc), because I don't think anything changed in this area during Go 1.7.

Possibly; although the only change I made was to use a different image (1.7 vs 1.6.2).

> What does running with environment variable GODEBUG=netdns=1 say?

Will have a go now. If this doesn't help, I can try and provide a minimal reproduction tomorrow.

@tomwilkie
Contributor Author

> What does running with environment variable GODEBUG=netdns=1 say?

With go1.7rc6:

$ kubectl logs --namespace=scope query-846243711-ptwwy 
<app> INFO: 2016/08/16 17:06:25.596993 app starting, version 919c3be, ID 336f06fd3b5d2bf3
<app> INFO: 2016/08/16 17:06:25.597260 command line args: --app.collector=dynamodb://abc:[email protected]:8000/reports --app.collector.s3=s3://abc:[email protected]:4569/s3 --app.http.address=:80 --app.log.http=true --app.memcached.hostname=memcached.scope.svc.cluster.local --app.memcached.service=memcached --app.memcached.timeout=100ms --app.nats=nats://nats.scope.svc.cluster.local:4222 --app.userid.header=X-Scope-OrgID --logtostderr=true --mode=app --weave=false
go package net: built with netgo build tag; using Go's DNS resolver
<app> INFO: 2016/08/16 17:06:25.597506 Connecting nats to nats://nats.scope.svc.cluster.local:4222
2016/08/16 17:06:25 nats: nats://nats.scope.svc.cluster.local:4222
2016/08/16 17:06:25 foo:  nats.scope.svc.cluster.local:4222
2016/08/16 17:06:25 nats err: dial tcp: no suitable address found
<app> FATA: 2016/08/16 17:06:25.597841 Error creating collector: nats: no servers available for connection

@bradfitz
Contributor

Can you compare that with Go 1.6?

For your Go 1.7 output, I see:

go package net: built with netgo build tag; using Go's DNS resolver

I think the relevant change in Go 1.7 was 72c1180 (net: don't do DNS for onion and local addresses) for #13705.

When @mikioh mentioned RFC 6762, which says:

   This document specifies that the DNS top-level domain ".local." is a
   special domain with special semantics, namely that any fully
   qualified name ending in ".local." is link-local, and names within
   this domain are meaningful only on the link where they originate.
   This is analogous to IPv4 addresses in the 169.254/16 prefix or IPv6
   addresses in the FE80::/10 prefix, which are link-local and
   meaningful only on the link where they originate.

   Any DNS query for a name ending with ".local." MUST be sent to the
   mDNS IPv4 link-local multicast address 224.0.0.251 (or its IPv6
   equivalent FF02::FB).  The design rationale for using a fixed
   multicast address instead of selecting from a range of multicast
   addresses using a hash function is discussed in Appendix B.
   Implementers MAY choose to look up such names concurrently via other
   mechanisms (e.g., Unicast DNS) and coalesce the results in some
   fashion.  Implementers choosing to do this should be aware of the
   potential for user confusion when a given name can produce different
   results depending on external network conditions (such as, but not
   limited to, which name lookup mechanism responds faster).

I missed this part:

Implementers MAY choose to look up such names concurrently via other
mechanisms (e.g., Unicast DNS) and coalesce the results in some
fashion.

And currently we're just always skipping DNS for *.local addresses:

// avoidDNS reports whether this is a hostname for which we should not                                      
// use DNS. Currently this includes only .onion and .local names,                                           
// per RFC 7686 and RFC 6762, respectively. See golang.org/issue/13705.                                     
func avoidDNS(name string) bool {  
        if name == "" {  
                return true  
        }  
        if name[len(name)-1] == '.' {  
                name = name[:len(name)-1]  
        }  
        return stringsHasSuffixFold(name, ".onion") || stringsHasSuffixFold(name, ".local")  
}  

We should probably relax the *.local case, at least for the netgo case, but maybe in all cases.
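As a rough illustration of the difference, the sketch below puts the strict check quoted above next to a relaxed variant that only skips DNS for .onion names. It is a standalone approximation, not the code in package net: `hasSuffixFold` stands in for the unexported `stringsHasSuffixFold` helper, and `avoidDNSStrict` and `avoidDNSRelaxed` are names made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// hasSuffixFold reports whether s ends with suffix, ignoring case.
// It stands in for the unexported helper used in package net.
func hasSuffixFold(s, suffix string) bool {
	return len(s) >= len(suffix) &&
		strings.EqualFold(s[len(s)-len(suffix):], suffix)
}

// avoidDNSStrict mirrors the check quoted above: DNS is skipped for
// empty names and for names ending in .onion or .local.
func avoidDNSStrict(name string) bool {
	if name == "" {
		return true
	}
	name = strings.TrimSuffix(name, ".") // drop one trailing root dot
	return hasSuffixFold(name, ".onion") || hasSuffixFold(name, ".local")
}

// avoidDNSRelaxed is a hypothetical relaxed variant: only empty names
// and .onion names skip DNS, so .local names are resolved normally.
func avoidDNSRelaxed(name string) bool {
	if name == "" {
		return true
	}
	name = strings.TrimSuffix(name, ".")
	return hasSuffixFold(name, ".onion")
}

func main() {
	names := []string{
		"nats.scope.svc.cluster.local",
		"example.onion.",
		"golang.org",
	}
	for _, n := range names {
		fmt.Printf("%-30s strict=%v relaxed=%v\n",
			n, avoidDNSStrict(n), avoidDNSRelaxed(n))
	}
}
```

For nats.scope.svc.cluster.local the strict check reports true (DNS skipped), while the relaxed one reports false, which is the behaviour cluster DNS setups rely on.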

I doubt it's relevant, but what is your /etc/resolv.conf and /etc/nsswitch.conf?

/cc @mikioh @mdempsky @ianlancetaylor for any opinions and whether this is Go 1.7.1 worthy.

@zquestz

zquestz commented Aug 16, 2016

I am definitely for relaxing the *.local case. We use DNS for resolving .local domains and this is a blocker for us to go to Go 1.7. We actually have a local DNS server setup specifically for this case, which is local to the machine, and not propagated to upstream DNS servers.

@tomwilkie
Contributor Author

Thanks for looking into this!

Relaxing the .local sounds like a plan, as things like kubedns and weavedns depend heavily on it.

@tomwilkie
Contributor Author

@bradfitz I put together a quick CL to relax the .local case: https://go-review.googlesource.com/#/c/27250/

@tomwilkie
Contributor Author

Sorry just realised there were some more outstanding questions:

> Can you compare that with Go 1.6?

$ kubectl logs --namespace=scope query-846243711-d3dhl 
<app> INFO: 2016/08/17 09:49:14.140623 app starting, version b3b160c, ID 24bcbcba11b7f7d3
<app> INFO: 2016/08/17 09:49:14.141131 command line args: --app.collector=dynamodb://abc:[email protected]:8000/reports --app.collector.s3=s3://abc:[email protected]:4569/s3 --app.http.address=:80 --app.log.http=true --app.memcached.hostname=memcached.scope.svc.cluster.local --app.memcached.service=memcached --app.memcached.timeout=100ms --app.nats=nats://nats.scope.svc.cluster.local:4222 --app.userid.header=X-Scope-OrgID --logtostderr=true --mode=app --weave=false
go package net: built with netgo build tag; using Go's DNS resolver
<app> INFO: 2016/08/17 09:49:14.146056 Connecting nats to nats://nats.scope.svc.cluster.local:4222
2016/08/17 09:49:14 nats: nats://nats.scope.svc.cluster.local:4222
2016/08/17 09:49:14 foo:  nats.scope.svc.cluster.local:4222
<app> INFO: 2016/08/17 09:49:14.152539 listening on :80

> I doubt it's relevant, but what is your /etc/resolv.conf and /etc/nsswitch.conf?

These jobs are running inside an Alpine container:

/home/weave # cat /etc/resolv.conf 
search scope.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.0.0.10
options ndots:5
/home/weave # cat /etc/nsswitch.conf
cat: can't open '/etc/nsswitch.conf': No such file or directory
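For readers unfamiliar with the search and ndots options, here is a small sketch of how a stub resolver conventionally expands a name into the list of fully qualified names it tries. The `resolvConf` type and `candidateNames` function are invented for this example, and real resolvers (glibc, musl, Go's own) may differ in details, so treat the ordering as an approximation.

```go
package main

import (
	"fmt"
	"strings"
)

// resolvConf holds the two resolv.conf settings this sketch uses.
type resolvConf struct {
	search []string // domains from the "search" line
	ndots  int      // value of "options ndots:N"
}

// candidateNames lists the fully qualified names to try, in order.
// Names with fewer than ndots dots go through the search list first;
// names with at least ndots dots are tried as-is first.
func candidateNames(name string, conf resolvConf) []string {
	// A trailing dot marks the name as already fully qualified.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	hasNdots := strings.Count(name, ".") >= conf.ndots
	var names []string
	if hasNdots {
		names = append(names, name+".")
	}
	for _, suffix := range conf.search {
		names = append(names, name+"."+suffix+".")
	}
	if !hasNdots {
		names = append(names, name+".")
	}
	return names
}

func main() {
	// Values copied from the resolv.conf shown above.
	conf := resolvConf{
		search: []string{"scope.svc.cluster.local", "svc.cluster.local", "cluster.local"},
		ndots:  5,
	}
	for _, n := range candidateNames("nats.scope.svc.cluster.local", conf) {
		fmt.Println(n)
	}
}
```

The name nats.scope.svc.cluster.local contains four dots, fewer than ndots:5, so the three search-domain variants come first and the name itself last. Every candidate ends in .local, so a resolver that refuses to do DNS for .local has nothing left to query.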

@gopherbot
Contributor

CL https://golang.org/cl/27250 mentions this issue.

@bradfitz bradfitz removed the WaitingForInfo label Aug 17, 2016
@bradfitz bradfitz added this to the Go1.7.1 milestone Aug 17, 2016
@tomwilkie
Contributor Author

tomwilkie commented Aug 17, 2016

Can we consider this for 1.7.1? I see the milestone has been set. Thanks!

@mikioh mikioh changed the title Aug 21, 2016
from: NATs client can't connect to server when client built with go1.7: "dial tcp: no suitable address found"
to: net: NATs client can't connect to server when client built with go1.7: "dial tcp: no suitable address found"
@mikioh
Contributor

mikioh commented Aug 21, 2016

Alas, but as mentioned in https://tools.ietf.org/html/draft-adpkja-dnsop-special-names-problem, it's better to stay away from the disturbance of TLD usage.

gopherbot pushed a commit that referenced this issue Sep 7, 2016
.local addresses are used by things like Kubernetes and Weave DNS; Go
should not avoid resolving them.

This is a partial revert of https://golang.org/cl/21328 which was too
strict of an interpretation of RFC 6762.

Fixes #16739

Change-Id: I349415b4eab5d61240dd18217bd95dc7d2105cd5
Reviewed-on: https://go-review.googlesource.com/27250
Reviewed-by: Brad Fitzpatrick <[email protected]>
Run-TryBot: Brad Fitzpatrick <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-on: https://go-review.googlesource.com/28632
caarlos0 added a commit to ContaAzul/hystrix-to-librato that referenced this issue Sep 22, 2016
@mcandre

mcandre commented Feb 23, 2017

cgo DNS resolution of .local hostnames is still broken in Go 1.7.5 in Kubernetes scenarios.

Working around this by configuring environment variable GODEBUG with value netdns=go.
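In later Go releases the same preference can also be expressed in code rather than through the environment. The sketch below assumes a Go version that has `net.Resolver` with its `PreferGo` field (the type does not exist in Go 1.7); `newGoResolver` and `lookup` are hypothetical helper names. The demo resolves an IP literal, which is returned as-is without any DNS traffic, so it runs without a cluster.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// newGoResolver returns a resolver that prefers Go's built-in DNS
// client over the cgo (libc) resolver on platforms where it is available.
func newGoResolver() *net.Resolver {
	return &net.Resolver{PreferGo: true}
}

// lookup resolves host with a short timeout so a broken DNS setup
// fails quickly instead of hanging.
func lookup(r *net.Resolver, host string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return r.LookupHost(ctx, host)
}

func main() {
	// An IP literal needs no DNS query, so this works offline.
	// Replace it with a service name to exercise real resolution.
	addrs, err := lookup(newGoResolver(), "10.0.0.132")
	fmt.Println(addrs, err)
}
```

`PreferGo` is only a preference, so the environment variable remains the simpler option when the binary cannot be changed.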

@bradfitz
Contributor

@mcandre, if you have a bug report, please file a new bug. We don't track closed issues. I'm not sure what you're saying, though. I don't know what you mean by "broken", or what a Kubernetes scenario means.

wrouesnel pushed a commit to prometheus-community/postgres_exporter that referenced this issue Apr 12, 2017
I am interested in the postgres_exporter releases built with any Go > 1.7.0 since that should fix the issue golang/go#16739 in Go.
wrouesnel pushed a commit to prometheus-community/postgres_exporter that referenced this issue Apr 13, 2017
@golang golang locked and limited conversation to collaborators Feb 24, 2018