Can not resolve hostname in Kubernetes #6099

Closed
LeonLiuY opened this issue May 15, 2018 · 28 comments

Comments

@LeonLiuY

The TCP client can not connect to services in Kubernetes using the service hostname.

For example http://some-service or mysql://root@some-service:3306/db

The error is No address found for some-service:80 over TCP

Replacing the hostname some-service with the service's cluster IP address in Kubernetes makes the connections succeed.

This happens both to HTTP clients and DB clients.

@LeonLiuY
Author

Inside the container, nslookup some-service successfully resolves the IP address.

@j8r
Contributor

j8r commented May 15, 2018

Related to #2660

@jhass
Member

jhass commented May 15, 2018

Duplicate of #2426 and in scope of #2660, so closing.

@jhass jhass closed this as completed May 15, 2018
@RX14
Contributor

RX14 commented May 15, 2018

But we merged #2745?

@jhass
Member

jhass commented May 15, 2018

Ah, mh I guess that should have fixed this point, we didn't revert any of that by chance?

@liuyang1204 Are Ruby and Python, for example, able to resolve these hostnames?

@ysbaddaden
Contributor

No, nothing changed. We use blocking getaddrinfo(3) calls.

@lbguilherme
Contributor

Also happens on Docker Swarm, with the overlay network

@RX14
Contributor

RX14 commented May 15, 2018

An strace of the getaddrinfo calls would be useful here, especially compared to an strace of tools that do resolve correctly. If getaddrinfo itself is broken, there is not much Crystal can do, though.

@RX14 RX14 reopened this May 15, 2018
@LeonLiuY
Author

@jhass
Yes, I'm running a microservice system, and other applications written in Ruby, Python, or Node.js have no problem resolving the hostname.

I'm using Crystal 0.24.1 BTW.

@LeonLiuY
Author

LeonLiuY commented May 16, 2018

Additional information:

I'm running the statically linked Crystal executable in an Alpine Docker image. I don't know whether this matters, though.

@ysbaddaden
Contributor

ysbaddaden commented May 16, 2018

Can you try out Socket::Addrinfo directly, with different values for service, family, type and protocol, and then report whether any of them resolves?

require "socket"
p Socket::Addrinfo.resolve("some.service", 80, type: Socket::Type::STREAM)
p Socket::Addrinfo.resolve("some.service", "http", family: Socket::Family::INET6)

Notes:

  • we enabled punycode recently, but I don't see why this would affect this case;
  • statically compiled crystal uses musl-libc, but the binaries it builds use the system libc, i.e. crystal itself may not resolve everything, but the built program should.

Maybe the service.to_s is the culprit in https://github.com/crystal-lang/crystal/blob/master/src/socket/addrinfo.cr#L106 —it should be the binary representation of the number, not the string representation?

@LeonLiuY
Author

LeonLiuY commented May 16, 2018

The first resolve throws an exception:

No address found for some-service:80 over IP (Socket::Error)
  from usr/share/crystal/src/socket/addrinfo.cr:0:33 in '???'
  from usr/share/crystal/src/http/server/request_processor.cr:255:3 in 'process'
  from usr/share/crystal/src/fiber.cr:255:3 in '???'
  from ???

The second resolve doesn't compile:

Error in app.cr:9: instantiating 'Socket::Addrinfo:Class#resolve(String, String)'

  p Socket::Addrinfo.resolve("some.service", "http", family: Socket::Family::INET6)
                     ^~~~~~~

in /usr/local/Cellar/crystal-lang/0.24.2_1/src/socket/addrinfo.cr:28: can't restrict Nil to Type

    def self.resolve(domain, service, family : Family? = nil, type : Type = nil, protocol : Protocol = Protocol::IP, timeout = nil) : Array(Addrinfo)

@LeonLiuY
Author

Dockerfile

FROM crystallang/crystal:latest

ADD . /src
WORKDIR /src
RUN crystal build app.cr --release --static

FROM alpine

COPY --from=0 /src/app /app

EXPOSE 8080

ENTRYPOINT ["/app"]

app.cr

require "socket"
require "http/server"

server = HTTP::Server.new("0.0.0.0", 8080) do |context|
  context.response.content_type = "text/plain"
  p Socket::Addrinfo.resolve("some-service", 80, type: Socket::Type::STREAM)
end

puts "Listening on http://0.0.0.0:8080"
server.listen

@LeonLiuY
Author

LeonLiuY commented May 16, 2018

Confirmed that the app can resolve the hostname when it is not statically compiled:

FROM crystallang/crystal:latest

ADD . /src
WORKDIR /src
# No --static flag
RUN crystal build app.cr --release

# Based on the crystal image instead of alpine
FROM crystallang/crystal:latest

COPY --from=0 /src/app /app

EXPOSE 8080

ENTRYPOINT ["/app"]

But then I miss the small image.

@RX14
Contributor

RX14 commented May 16, 2018

I don't understand how a static build is working on glibc at all to be honest. I would guess it's issues with the loadable nsswitch stuff. Not really a bug on crystal's side either way.

You'll have to just forgo the static build, or make a static build on alpine (we should really provide good official tooling for doing this in docker)

@ysbaddaden
Contributor

So the problem is linking glibc statically and then running the binary in a non-glibc environment, where it probably can't find some expected dynamic libraries (see the link warnings) or some configuration. Sadly, there is nothing we can do here.

A solution could be to build a dynamic executable under alpine itself (we have issues with musl+static), pulling the crystal tarball (it's statically linked) and installing packaged dependencies (pcre, libgc) on both build/run containers.

@LeonLiuY
Author

LeonLiuY commented May 16, 2018

I didn't see any warning, and the application runs perfectly except the hostname problem.
I'd expect it to fail fast if the environment doesn't match.

Alpine is not what I want, I just want to use a tiny base image to run my Crystal app.

Any guides would be appreciated, for example:

If I make a static build using the crystallang/crystal image, what is the minimum requirement to run it? Will busybox:glibc work fine? Or are there any other recommended tiny Docker images that work?

Anyway really appreciate the fast responses and kind explanation.

@ysbaddaden
Contributor

glibc is known to still load libraries dynamically at runtime, even when it is linked statically. It's usually fine, but sometimes it's not. I suspect getaddrinfo is one of those cases: it may need to load some nsswitch libs, but they're nowhere to be found in a musl-libc distribution.

I'm afraid I can't help much otherwise. Building and running under the same base image seems like sound advice, though. Crystal is now officially packaged on alpine, for example.
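
To make the nsswitch point a bit more concrete: glibc reads the hosts: line of /etc/nsswitch.conf and loads each listed source at runtime as a shared object conventionally named libnss_<source>.so.2. The sketch below only performs that name mapping on text input. The function name nss_modules_for_hosts and the sample line are assumptions for illustration, not something taken from Crystal or from this thread.

```shell
#!/bin/sh
# Hypothetical helper: read nsswitch.conf-style text on stdin and print the
# NSS module file names implied by its "hosts:" line.
nss_modules_for_hosts() {
  grep '^hosts:' |
    sed -e 's/^hosts://' -e 's/\[[^]]*]//g' |  # drop the key and any [STATUS=action] items
    tr -s '[:blank:]' '\n' |                    # one source per line
    grep -v '^$' |                              # skip the empty leading line
    sed 's/.*/libnss_&.so.2/'
}

# Sample input (an assumption); on a real system this would be /etc/nsswitch.conf.
printf 'passwd: files\nhosts: files dns\n' | nss_modules_for_hosts
```

If the modules printed for the build image are missing from the runtime image, that would be consistent with a hostname lookup that fails even though the rest of the program runs.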

@ysbaddaden
Contributor

Closing, since the issue is related to glibc being statically compiled.

@faustinoaq
Contributor

> Alpine is not what I want, I just want to use a tiny base image to run my Crystal app.

@liuyang1204 Did you try this: https://manas.tech/blog/2017/04/03/shipping-crystal-apps-in-a-small-docker-image.html ?

@LeonLiuY
Author

@faustinoaq Thanks, yes, I've checked it, but I thought the solution was a little heavy (the dependency-listing script and the extra copy steps), so I tried the static compile approach.

I have some updates:

The build with static flag actually outputs some warnings:

 /usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libcrypto.a(dso_dlfcn.o): In function `dlfcn_globallookup':
(.text+0x11): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
_main.o: In function `__crystal_main':
/usr/share/crystal/src/socket/tcp_server.cr:98: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libevent.a(evutil.o): In function `evutil_getaddrinfo_common':
(.text+0x1749): warning: Using 'getprotobynumber' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

And I tried to use busybox:glibc as the base image and it works!
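
The warnings above name the libc functions that will still need glibc's shared libraries at runtime. A minimal sketch for pulling those names out of a saved copy of the linker output follows. The function name runtime_glibc_functions is an assumption for illustration, and the sample input is two of the warning lines quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read linker output on stdin and list each function
# flagged with "Using '<name>' in statically linked applications".
runtime_glibc_functions() {
  sed -n "s/.*warning: Using '\([^']*\)' in statically linked applications.*/\1/p" |
    sort -u
}

# Sample input: two of the warning lines from the build output above.
runtime_glibc_functions <<'EOF'
(.text+0x11): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
/usr/share/crystal/src/socket/tcp_server.cr:98: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
EOF
```

Any function listed this way is a hint that the runtime image needs a matching glibc, which fits the observation that busybox:glibc works.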

@LeonLiuY
Author

LeonLiuY commented May 17, 2018

Using the solution in https://manas.tech/blog/2017/04/03/shipping-crystal-apps-in-a-small-docker-image.html

RUN ldd bin/miniserver | tr -s '[:blank:]' '\n' | grep '^/' | \
    xargs -I % sh -c 'mkdir -p $(dirname deps%); cp % deps%;'

doesn't solve the issue.

I guess I hit the same problem as described in this comment and need to copy those 2 files.

But I'm quite happy with busybox now.

@kalinon
Contributor

kalinon commented Jul 6, 2020

Just adding this for history's sake. I was able to get an alpine image to work:

FROM crystallang/crystal:0.35.1

ADD . /src
WORKDIR /src
RUN shards build --release

RUN ldd ./bin/myapp | tr -s '[:blank:]' '\n' | grep '^/' | \
  xargs -I % sh -c 'mkdir -p $(dirname deps%); cp % deps%;'

FROM alpine
COPY --from=0 /src/deps /
COPY --from=0 /src/bin /bin

COPY --from=0 /lib/x86_64-linux-gnu/libnss_dns.so.2 /lib/x86_64-linux-gnu/libnss_dns.so.2
COPY --from=0 /lib/x86_64-linux-gnu/libresolv.so.2 /lib/x86_64-linux-gnu/libresolv.so.2

WORKDIR /app
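
A note on why the two extra COPY lines are probably needed: the ldd filter in the RUN step only sees libraries recorded as link-time dependencies of the binary, while NSS modules such as libnss_dns.so.2 are loaded by name at runtime and so do not normally show up. The sketch below runs the same tr/grep filter on ldd-style text. The function name ldd_paths is hypothetical, and the sample lines (library names and addresses) are made up for illustration rather than captured from a real build.

```shell
#!/bin/sh
# Same filter as the RUN step above: split ldd output on blanks and keep
# only the tokens that are absolute paths.
ldd_paths() {
  tr -s '[:blank:]' '\n' | grep '^/'
}

# Illustrative ldd-style output (made up for this example).
ldd_paths <<'EOF'
    linux-vdso.so.1 (0x00007ffd00000000)
    libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f0000000000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000200000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f0000400000)
EOF
```

Only the three absolute paths survive the filter. Anything the program opens later by name has to be copied into the final image explicitly.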

@Blacksmoke16
Member

@kalinon There are official Alpine docker images now: https://crystal-lang.org/2020/02/02/alpine-based-docker-images.html.

@benbonnet

@Blacksmoke16 Should the official alpine images resolve this issue?

@Blacksmoke16
Member

@benbonnet Dunno, would have to try it out.

@benbonnet

benbonnet commented Jan 1, 2021

@Blacksmoke16 I currently have the following:

FROM crystallang/crystal:latest  as compiler
RUN mkdir /app
COPY . /app

RUN mkdir /app/bin

WORKDIR /app

RUN shards --production
RUN crystal build /app/src/svc.cr --static --release --no-debug -o /app/bin/svc

FROM scratch
COPY --from=compiler /app/bin/svc .
EXPOSE 3000

ENV KEMAL_ENV=production

CMD ["./svc"]

and I am still getting the warnings described by @LeonLiuY here (Using 'dlopen' in statically linked applications...). It does not work.

Using FROM busybox:glibc instead of FROM scratch does work; the image is around 6 MB, so we more or less reached our 'goal'. That said, I'm not experienced enough to be sure whether busybox:glibc is the most robust solution.

@Blacksmoke16
Member

Blacksmoke16 commented Jan 1, 2021

@benbonnet Can you not just use the alpine image mentioned earlier? Something like:

FROM crystallang/crystal:0.35.1-alpine as build

COPY shard.lock /app/shard.lock
COPY shard.yml /app/shard.yml

WORKDIR /app

RUN shards install --production

COPY src/ /app/src/

RUN shards build --release --production --no-debug --static
RUN strip ./bin/svc

FROM alpine:3.12

COPY --from=build /app/bin /

EXPOSE 3000

ENV KEMAL_ENV=production

CMD ["./svc"]

The problem is that you can't statically link on Ubuntu, which is what crystallang/crystal:latest is based on. You must use Alpine Linux to build the static binary, which you could then move into a scratch image, or just keep alpine, as it's quite small itself.
