Docker in k8s #160

Closed
danjenkins opened this issue Mar 10, 2020 · 56 comments

Comments

@danjenkins

The container created from the repo's image (on dockerhub) complains that DNS entries which are available cannot be found.

2020/03/09 13:42:52.328303 hep.go:37: ERR dial udp: lookup heplify-server on 172.20.0.2:53: no such host

Critical: cannot establish a connection
2020/03/09 13:54:33.203911 sniffer.go:123: INFO ostype: linux, osarch: amd64

Critical: cannot establish a connection

2020/03/09 13:54:33.222102 hep.go:37: ERR dial udp: lookup heplify-server.staging.svc.cluster.local: no such host

I've brought up the k8s dnsutils pod (kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml), and that proves that the DNS entries are available, yet heplify still complains. I can only surmise that, because of how Go applications are compiled, it's not using the global DNS available in this case.

Not really sure where to go from here in debugging this - any help would be greatly appreciated.

startup log output is

2020/03/09 13:54:33.203327 sniffer.go:114: INFO config.Config{Iface:(*config.InterfacesConfig)(0xc000152c00), Logging:(*logp.Logging)(0xc0000aa0f0), Mode:"SIPRTCP", Dedup:false, Filter:"", Discard:"", DiscardMethod:"", Zip:false, HepServer:"heplify-server.staging.svc.cluster.local:9060", HepNodePW:"", HepNodeID:0x7d2, HepNodeName:"", Network:"udp", Protobuf:false, Reassembly:false, Version:false}
2020/03/09 13:54:33.203442 sniffer.go:115: INFO &config.InterfacesConfig{Device:"any", Type:"pcap", ReadFile:"", WriteFile:"", RotationTime:60, PortRange:"5060-5090", WithVlan:false, WithErspan:false, Snaplen:8192, BufferSizeMb:32, ReadSpeed:false, OneAtATime:false, Loop:1}
2020/03/09 13:54:33.203706 sniffer.go:116: INFO bpf: (tcp or sctp) and greater 42 and portrange 5060-5090 or (udp and greater 128 and portrange 5060-5090 or ip[6:2] & 0x1fff != 0 or ip6[6]=44) or (ip and ip[6] & 0x2 = 0 and ip[6:2] & 0x1fff = 0 and udp and udp[8] & 0xc0 = 0x80 and udp[9] >= 0xc8 && udp[9] <= 0xcc)
2020/03/09 13:54:33.203911 sniffer.go:123: INFO ostype: linux, osarch: amd64
@negbie
Member

negbie commented Mar 10, 2020

Hi @danjenkins, I had something like this in the past on a customer cluster with a different application written in Go and this solved it for me:
golang/go#29594
golang/go#29661

But it's quite possible that your issue is related to how that Docker image is built (alpine + static).

@danjenkins
Author

Thanks @negbie! I'll go and see if I can change resolv.conf to add the use-vc and single-request settings... Do you think that should just be a default in the image?

@danjenkins
Author

Oh, it can't be a default in the image because of how it's all inherited...

@negbie
Member

negbie commented Mar 10, 2020

@danjenkins exactly we need to find a better way.

@negbie
Member

negbie commented Mar 10, 2020

@danjenkins something else: does the behaviour change when you use TCP via heplify's -nt flag, like -nt tcp or -nt tls? Make sure to configure HEPTCPAddr or HEPTLSAddr in the heplify-server container. I would suggest using TLS anyway.

@danjenkins
Author

danjenkins commented Mar 10, 2020

That won't change the host resolution though, will it? I guess it's using a different part of Go etc... I didn't want to add the overhead of TLS because it's all contained within a k8s cluster.

@negbie
Member

negbie commented Mar 10, 2020

@danjenkins it shouldn't but Go's netstack has a lot of black magic behind the scenes so who knows ;)

@danjenkins
Author

So i added

template:
    spec:
      dnsConfig:
        options:
        - name: use-vc  # tells the local DNS resolver to use TCP instead of UDP; UDP is flaky in containers
        - name: single-request-reopen
        - name: single-request

to the manifest for the deployment and that's had no effect.

Going to try changing resolv.conf from a postStart hook.

@negbie
Member

negbie commented Mar 10, 2020

@danjenkins too bad! Ok let me loop in @lmangani since he controls the sipcapture repo on dockerhub and the automatic builders. I would suggest to build an alpine static image which is tiny and a bigger one with the standard Go image as builder.

@danjenkins
Author

Just tried

          lifecycle:
            postStart:
              exec:
                command:
                - /bin/sh
                - -c
                - "/bin/echo 'options single-request-reopen' >> /etc/resolv.conf" 

and that also appears to fail. Annoyingly, because the run command fails, I can't exec into the container.

@danjenkins
Author

Just trying out sending via TCP instead. I doubt that'll have an effect, but it's worth a go.

@danjenkins
Author

yup - 2020/03/10 10:50:55.181243 hep.go:37: ERR dial tcp: lookup heplify-server.staging.svc.cluster.local: no such host

@negbie
Member

negbie commented Mar 10, 2020

@danjenkins ok I think we need more image options for the user to choose from.

@danjenkins
Author

yeah, at this point i'd be fine with an ubuntu full blown base if it worked :D

@lmangani
Member

@negbie afaik all the images are Alpine based nowadays already. For heplify we use the included Dockerfile but I'm happy to create and push any OS variant to facilitate this testing

@lmangani
Member

@danjenkins we can make that happen - let's elect the next OS container to build and I'll take care of it

@danjenkins
Author

@lmangani is it easy to build one with buster or stretch https://hub.docker.com/layers/golang/library/golang/buster/images/sha256-944405641f9fb0f322be1dfc4685b916df2de3df54525cf80822f8a0529f636f?context=explore and just push it to dockerhub with a test tag, then i can test it fast

@negbie
Member

negbie commented Mar 10, 2020

Ubuntu images are quite small too.

@danjenkins
Author

Yeah @negbie, if we go with buster or stretch then it's just a change of tag from the golang repo, so theoretically no real change to your Dockerfile etc.

@negbie
Member

negbie commented Mar 10, 2020

@danjenkins makes sense let's go the frictionless way.

@danjenkins
Author

So I just made a Debian buster version (of course it requires more changes because you're not using Debian).

But now i have

2020/03/10 12:58:08.837177 hep.go:37: ERR dial tcp: lookup heplify-server.staging.svc.cluster.local: device or resource busy

@danjenkins
Author

Seems like Go deals with the full .local DNS name differently... trying the short, non-fully-qualified k8s name instead.

segmentio/kafka-go#285

@danjenkins
Author

If I go back to the shortened DNS name, it still errors using buster.

2020/03/10 13:02:50.697378 hep.go:37: ERR dial tcp: lookup heplify-server on 172.20.0.2:53: no such host

@negbie
Member

negbie commented Mar 10, 2020

Thanks for helping out @danjenkins. Did you try to remove the build flags here?

RUN CGO_ENABLED=1 GOOS=linux go build -a --ldflags '-linkmode external -extldflags "-static -s -w"' -o heplify .

and just use something like

RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .

@danjenkins
Author

That was going to be my next question... I'm not knowledgeable about Go, so let me give that a go.

@danjenkins
Author

Error from the container now

standard_init_linux.go:190: exec user process caused "no such file or directory"

@danjenkins
Author

The Dockerfile I used was

FROM golang:buster as builder
RUN apt-get update
RUN apt-get install apt-utils musl-dev gcc libpcap-dev ca-certificates git build-essential -y
COPY . /heplify
WORKDIR /heplify
RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /heplify/heplify .
CMD ["./heplify", "-h"]

@negbie
Member

negbie commented Mar 10, 2020

@danjenkins I'm not sure if @lmangani uses this as the build script, but to me this Dockerfile seems wrong. What about this:

FROM golang:buster as builder
RUN apt-get update
RUN apt-get install apt-utils gcc libpcap-dev ca-certificates git build-essential -y
RUN go get -d -v -u github.com/negbie/heplify
WORKDIR /root/go/src/github.com/negbie/heplify/
RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder root/go/src/github.com/negbie/heplify/heplify .
CMD ["./heplify", "-h"]

@negbie
Member

negbie commented Mar 10, 2020

Can't test it currently so take it with care.

@danjenkins
Author

@negbie being very lazy here... it errored

Step 6/10 : RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .
 ---> Running in e35a5ef1828b
can't load package: package .: no Go files in /root/go/src/github.com/negbie/heplify
The command '/bin/sh -c CGO_ENABLED=1 GOOS=linux go build -o heplify .' returned a non-zero code: 1

@danjenkins
Author

ah! missing a /

@danjenkins
Author

oh no, that wasn't it...

@lmangani
Member

lmangani commented Mar 10, 2020

This builds

RUN apt-get install apt-utils gcc libpcap-dev ca-certificates git build-essential -y
RUN go get -d -v -u github.com/negbie/heplify
WORKDIR /go/src/github.com/negbie/heplify/
RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /go/src/github.com/negbie/heplify/heplify .
CMD ["./heplify", "-h"]

Untested, and pushed for you to try at your leisure: sipcapture/heplify:buster

@danjenkins
Author

thanks @lmangani !!

@danjenkins
Author

@lmangani that gives me the same error?

standard_init_linux.go:190: exec user process caused "no such file or directory"

@negbie
Member

negbie commented Mar 10, 2020

@danjenkins Currently hopping from meeting to meeting, so I will come back to you this evening when I'm at home and can verify the stuff I paste here ;)

@danjenkins
Author

No problem :)

@lmangani
Member

lmangani commented Mar 10, 2020

@danjenkins how can i replicate this? (nevermind, I can, testing a fix)

@lmangani
Member

lmangani commented Mar 10, 2020

@danjenkins try again please

FROM golang:buster as builder
RUN apt-get update
RUN apt-get install apt-utils gcc libpcap-dev ca-certificates git build-essential -y
RUN go get -d -v -u github.com/negbie/heplify
WORKDIR /go/src/github.com/negbie/heplify/
RUN CGO_ENABLED=1 GOOS=linux go build -a --ldflags '-linkmode external -extldflags "-static -s -w"' -o heplify .
RUN chmod +x /go/src/github.com/negbie/heplify/heplify


FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=builder /go/src/github.com/negbie/heplify/heplify /heplify
CMD ["/heplify", "-h"]

@danjenkins
Author

thanks! trying it out now

@danjenkins
Author

@lmangani :(

Now I have less useful logs....

Critical: cannot establish a connection

@negbie
Member

negbie commented Mar 10, 2020

@danjenkins in @lmangani's build script I still see

RUN CGO_ENABLED=1 GOOS=linux go build -a --ldflags '-linkmode external -extldflags "-static -s -w"' -o heplify .

maybe this should be changed first.

@danjenkins
Author

Sorry I'm not following what you're suggesting @negbie - been looking at this too long :D

@lmangani
Member

@negbie without the flags, the error above appears; with the full options it works (apparently)
@danjenkins what config are you passing it?

@danjenkins
Author

        - name: heplify
          image: danjenkins/heplify:latest
          command:
            - "./heplify"
            - "-nt"
            - "tcp"
            - "-hs"
            - "heplify-server:9060"

@lmangani
Member

By default, heplify-server listens on 9060/UDP unless you have configured the HEPTCPAddr setting. Have you?

@negbie
Member

negbie commented Mar 10, 2020

> @negbie without the flags, the error above appears; with the full options it works (apparently)
> @danjenkins what config are you passing it?

@lmangani since some DNS issues with Go apps on Kubernetes are due to how they are compiled, I want to make sure that no further flags are provided. So instead of

RUN CGO_ENABLED=1 GOOS=linux go build -a --ldflags '-linkmode external -extldflags "-static -s -w"' -o heplify .

use just

RUN CGO_ENABLED=1 GOOS=linux go build -o heplify .

@danjenkins
Author

Moving back to UDP didn't solve the issue... how do I get the rest of the logging back? :S

@danjenkins
Author

@negbie I tried with your RUN command and I got the standard_init_linux.go:190: exec user process caused "no such file or directory" error from the container.

I'm kind of stuck now and this is holding up a deployment. This is for a kamailio k8s deployment, so I'm thinking about just enabling kamailio to send the data instead... but I really didn't want to do that. Any ideas?

@negbie
Member

negbie commented Mar 11, 2020

Hi @danjenkins, I'm sure we can fix this, but I need at least 30 minutes of spare time to look into it. I will try to find that time today.

@danjenkins
Author

@negbie I'm not sure if the issue is with you now. I'd hold off for now!

@negbie
Member

negbie commented Mar 11, 2020

@danjenkins I used this Dockerfile and tried to do as few clever things as possible:

FROM ubuntu:18.04 as builder
RUN apt-get update
RUN apt-get install -y software-properties-common
RUN add-apt-repository ppa:longsleep/golang-backports
RUN apt-get update
RUN apt-get install -y golang-go
RUN apt install -y gcc libpcap-dev linux-headers-generic git
RUN go get -d -v -u github.com/negbie/heplify
WORKDIR /root/go/src/github.com/negbie/heplify/
RUN go build -o heplify .
RUN chmod +x /root/go/src/github.com/negbie/heplify/heplify

FROM ubuntu:18.04
ARG IPV6_SUPPORT=true
RUN apt-get update
RUN apt install -y libpcap-dev
COPY --from=builder root/go/src/github.com/negbie/heplify/heplify .
RUN ls -al heplify
CMD ["./heplify", "-h"]

Of course this means that the image isn't the smallest one but I hope it has no DNS issues. I pushed this to dockerhub under negbie/heplify:latest so if you trust me you can test this or build it quickly yourself.

@danjenkins
Author

Thanks @negbie! I think this is a specific pod issue on k8s and wasn't an issue of yours at all :( Sorry for the noise! I'll leave this issue open while I continue debugging :)

@negbie
Member

negbie commented Mar 11, 2020

@danjenkins ok, no problem. If you find the root cause, it would be great if you could post the pod issue here as well. I'm always interested in all kinds of kamailio k8s deployments.

@danjenkins
Author

Yup, it was k8s. In this case the pod was set to hostNetwork: true, so it doesn't get access to the cluster DNS by default. You have to set dnsPolicy: ClusterFirstWithHostNet for it to be able to resolve cluster service names.
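
For reference, a minimal sketch of what a pod template with this fix might look like. The Deployment name, labels, and image tag are illustrative assumptions and are not taken from the actual manifest used in this thread; only hostNetwork, dnsPolicy, the staging namespace, and the heplify-server address come from the discussion above. When a pod runs with hostNetwork: true and no explicit dnsPolicy, Kubernetes falls back to the node's DNS configuration, which is why the service name could not be resolved.

```yaml
# Sketch only: name, labels, and image tag below are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: heplify              # hypothetical name
  namespace: staging         # namespace seen in the logs above
spec:
  replicas: 1
  selector:
    matchLabels:
      app: heplify           # hypothetical label
  template:
    metadata:
      labels:
        app: heplify
    spec:
      hostNetwork: true                      # pod shares the node's network namespace
      dnsPolicy: ClusterFirstWithHostNet     # keep using cluster DNS despite hostNetwork
      containers:
        - name: heplify
          image: sipcapture/heplify:latest   # assumed tag
          command:
            - "./heplify"
            - "-hs"
            - "heplify-server.staging.svc.cluster.local:9060"
```

With this policy in place, the dnsConfig options and the postStart edit of /etc/resolv.conf tried earlier in the thread should not be needed for the lookup to succeed.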

@negbie
Member

negbie commented Mar 12, 2020

@danjenkins it's always k8s :D Thanks for sharing the solution as well!
