Build server produces dynamic linked executable #175

Closed
joshix opened this issue Jul 2, 2015 · 33 comments
Labels
CI/CD 🔩 Automated tests, releases

Comments

@joshix

joshix commented Jul 2, 2015

This URI:
https://caddyserver.com/download/build?os=linux&arch=amd64&features=
That is, caddy core, no git, linux x64 -- delivers a dynamically linked executable file.

linux x86 - static
linux x64 - dynamic
freebsd x64 - static
linux arm - 50x err

and those are the archs I can speak with any knowledge about and/or inspect.

@mholt indicates in conversation this departure from the usual & delicious go static binaries is not intentional.

@joshix
Author

joshix commented Jul 2, 2015

A go1.4.2 toolchain on debian 8 x64 also produces a dynamically linked executable. This change in output is recent.

j@bca297cfcf52:/go/src/github.com/mholt/caddy# git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working directory clean
j@bca297cfcf52:/go/src/github.com/mholt/caddy# go build
j@bca297cfcf52:/go/src/github.com/mholt/caddy# file caddy
caddy: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped
j@bca297cfcf52:/go/src/github.com/mholt/caddy# ldd caddy
linux-vdso.so.1 (0x00007ffd24d86000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f98eb9f3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f98eb64a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f98ebc10000)

@mholt
Member

mholt commented Jul 2, 2015

It's a bug in the build process.

Been doing some digging since our conversation on Twitter. Apparently this has been kind of a mess between Go 1.3 and Go 1.5 (not yet released). See golang/go#9369 and golang/go#9344 especially.

Turns out this command, with Go 1.4, builds static binaries:

go build -tags netgo -installsuffix netgo

Looks like this will be changing again in Go 1.5 later this year... but I'm still not sure exactly how.

Anyway, I was able to build a static binary this way but I haven't tested it for correctness. Apparently some important things can break, like the x509 package or the net package, if it doesn't have the shared libraries (??). I need to look into this but the more who do, the better - please feel free to try if you're up to running that build command. Try it with the tls directive and basically see if you can get it to break due to being statically compiled.

More info (some is outdated): https://groups.google.com/forum/#!topic/golang-nuts/Rw89bnhPBUI and https://groups.google.com/forum/#!topic/golang-nuts/dXyhD-XiW50 and https://groups.google.com/forum/#!msg/golang-dev/ajODMUyOhPI/l3Kh-hW3x5AJ

@joshix
Author

joshix commented Jul 2, 2015

Ah, yes. I remember this now -- 1.4 broke what you'd do with net importers under 1.3 (and static linking on linux has always been kind of a mess. It was the source of many wasted hours of kvetching on 9fans back in another lifetime). Anyway, I could have saved us some indirection if my memory were better -- I formerly had env vars set up for the netgo external build tagging and forgot all about them. However, I always needed to combine the -tags dance with --ldflags options. I'm stepping through an attempt at building statically and testing this somewhat carefully now. We'll see if I learn anything.

It is probably worth noting here (in hopes there will be none) that whatever bugs we find in a statically-linked linux x64 caddy will affect all of the caddy docker images I've seen, namely mine and @abiosoft's. Our binaries are those from github/releases, which are statically linked.
(Why that linux x64 binary is statically linked is at least as interesting a question as why the caddyserver.com/download executable is not, if, as @mholt reported, both are just running "... go build nothing fancy".)

And also yes, the gomobile stuff for go1.5 has to dynamically link things because of the build process enforced on android especially, so this is set to become even less simple.

@abiosoft

abiosoft commented Jul 2, 2015

one of the reasons I based my docker image on alpine is to bypass the dynamic linking issue. I once had that issue with a scratch image.

@joshix
Author

joshix commented Jul 2, 2015

@abiosoft I understand your choice. However, in or outside of a container, due to both my product and my prejudices, I do not want any go program to be dynamically linked. Truly I don't want dynamic linking of any program whatsoever, but this broader indictment isn't intended to be flame bait -- dynamic linking is such a deeply held assumption that I don't think developers are often exposed to the arguments or, especially before golang, that there even was an argument to be made.

Statically-linked programs still share code in the VM system with page mapping (provided you have an MMU), make distribution control much easier across architectures and levels of hardware-, OS-, and/or API- emulation/virtualization, cost essentially nothing at today's prices for RAM and disk, and completely elide a huge amount of complexity and security failures in the (always-privileged) linker layer of the operating system.

Think of how bad this problem is on win32 -- all the versioning ends up sharing very little library code between 3rd-party applications using the "same" dll, but you do get a lot of complexity and subtle breakages for free.

So maybe that is a very long way of saying it, but I don't want to effectively curate a minor linux distribution just to run an httpd. Currently, you don't need that distro, either, because you are shipping a statically-linked caddy in your docker image (as of your push today, v0.7.2 of caddy). Whichever way, static or dynamic, is really right for Caddy, I think the outcome of this bug should be knowing for sure how we're linking the binary, and linking it that way at any and all of the project's distribution points.

@mholt
Member

mholt commented Jul 2, 2015

I just noticed that you reported a 500 error when downloading for ARM Linux. I've found the cause:

go tool: no such tool "5g"

Will fix that today.

@joshix
Author

joshix commented Jul 2, 2015

Yeah that was a second bug to this but I had already used so much bandwidth with my commentary on dynamic linking. I hope that didn't come off as a scold, but there is a lot of history leading to golang that, in short, relates very directly to why that compiler is called '5g'.

-J

@mholt
Member

mholt commented Jul 2, 2015

I fixed the 5g issue - builds should be working again. Next I'll look into compiling without cgo, see if anything breaks. If not, we're good to go and I can update the build server. Anyone else who wants to put it through the wringer is more than welcome... in fact it would be good to have many people testing it.

@mholt mholt added the CI/CD 🔩 Automated tests, releases label Jul 2, 2015
@joshix
Author

joshix commented Jul 2, 2015

I have not yet produced any problems that happen only in a static exe, fwiw.

Catch me up -- where did with or without cgo come in?

@mholt
Member

mholt commented Jul 2, 2015

Sorry, to explain better: The reason some builds are not static is because there is some use of cgo in the standard library, which I wasn't aware of. So the net package, x509 package, and os/user package (although Caddy doesn't use that one, at least directly) cause a dynamic exe.

Good to hear you haven't seen any problems yet. I haven't either.

@joshix
Author

joshix commented Jul 2, 2015

Ok right, roger, I get it. I.e., that thing we've been talking about this whole time.

Thanks for all your time and discussion on this overnight and this morning, man. Beyond caddy, I learned/remembered things about the build chain I half-knew but hadn't fully percolated up out of my lizard brain until we arrived here.

I actually think this set of unexpectednesses is going to get simpler for go users in 1.5 even tho it becomes more complex in the tools, if I read rsc right on several CRs.

-J

@mholt
Member

mholt commented Jul 2, 2015

This package README says:

Cross compiled Go binaries are not suitable for production applications because code in the standard library relies on Cgo for DNS resolution with the native resolver, access to system certificate roots, and parts of os/user.

However, I haven't yet seen any issues with DNS resolution or certificate access problems. Have you been able to get those to come out of hiding? (For example, I'm binding to a hostname "matt.dev" which is in my hosts file, so it should have to do a DNS lookup, right?)

By the way, thanks very much for helping with this.

@joshix
Author

joshix commented Jul 2, 2015

It wouldn't do a DNS lookup, actually, because the resolver hits /etc/hosts in precedence first. I haven't seen a problem from resolution, tho, and I've tested against labs_nn_.joshix.com which (was) a DNS hostname and not in a hosts file. So I think that's ok. System roots -- I have to think about that.

(If I were a better developer (person? ;) ), I'd be pushing some kind of tests at you instead of just pulling the old "oh yeah, I played with it" routine.) Being thorough here is actually a deeper set of issues than I knew. I'm going to see if I can find others talking about specific bugs encountered that are attributable to this "not ready for production" business.

@joshix
Author

joshix commented Jul 4, 2015

On the arm/5g build: Executable now exists. I fetched from caddyserver/downloads. Core only. It's a static executable. Stage one of that secondary distribution bug resolved. Tested (more on how below) and it LGTM.

Testing:
I haven't quite thought my way to formalizing the "method" I've been using to user-test this stuff (which you've correctly labeled a deployment issue) into test code we might check in. So here FWIW is the skeleton of what I do:

  1. download the project-dist binary from caddyserver/downloads
  2. compare to my own build on given arch (I don't cross compile, so far -- I build natively on the platforms I'm claiming to have tested)
  3. in the linux x64 case, I diff the headers and response body for char-for-char differences in output returned to identical requests, dynamic (downloaded) vs. static (build -tags netgo [...]) binary.
  4. watch the certificate exchange at the browser side, examine page info/cert info, and check cert access at caddy side with log stderr. If I knew as much about TLS as @coolaj86, who filed #143 ("default HTTPS / TLS for localhost development and local sharing"), I'd trust my results on this point more. I'm just making openssl fake certs like I always have, so I guess I could be failing to test the system roots access part.
  5. contrive 3 URIs and beat hell out of them with github/rakyll/boom for good measure.
  6. make up DNS names both existent and non-existent for the testing host to bump the resolver.
  7. I still wish I knew more exactly what I was looking for -- I ack the comment in the inconshreveable/gonative readme, but wish there was a specific example, e.g., "name resolution fails in this manner if you cross-compile...".

That said, all these archs LGTM with static binaries, whether I build them or your build server does. I don't see bugs that exist only in the static builds.
linux x64
linux arm v7
freebsd x64 (fewer repetitions)
darwin x64 (much less thoroughly, TBH)

What does your test procedure for a user-level test of these potential issues (as opposed to a function-level unit test) look like, @abiosoft & @mholt?

@abiosoft

abiosoft commented Jul 5, 2015

I just realised my base docker image alpine:3.1 cannot execute dynamic binaries :(.

So far, I've also had 0 issues with static binary.

@joshix
Author

joshix commented Jul 5, 2015

@abiosoft Out of curiosity, do you know why the dynamic executable fails there? Is the alpine base missing a library(s) entirely, or does it have the lib(s) but in a version that doesn't resolve all symbols?

-J

@abiosoft

abiosoft commented Jul 5, 2015

@joshix I think it's missing them. I did some searching and installed glibc. No success so far, and I don't know what the last line means. Any ideas?

$ ldd caddy
    /lib64/ld-linux-x86-64.so.2 (0x7ffbf4714000)
    libpthread.so.0 => /lib64/ld-linux-x86-64.so.2 (0x7ffbf4714000)
    libc.so.6 => /lib64/ld-linux-x86-64.so.2 (0x7ffbf4714000)
Error relocating caddy: __vfprintf_chk: symbol not found

Attempting to run it always gives

$ ./caddy
sh: ./caddy: not found

@joshix
Author

joshix commented Jul 5, 2015

@abiosoft That last line means that the system on which you executed ldd caddy has a libc that is missing the __vfprintf_chk symbol that the dynamic linker expects to find when it attempts to resolve it at runtime for the caddy executable.

The system that built that caddy executable has some version of libc. Call that system the "build environment". At compile time in the build environment, the linker wrote a list of libc symbols into the executable, but did not write the library's actual text segments (the machine code). The entries are like pointers, sort of.
Call the place where you want to run that caddy executable the "deployment environment". When you invoke caddy in the deployment environment, the dynamic linker reads whatever nearest-match libc it can find, and attempts to resolve the symbols from the executable into the actual code in the library. So this deployment environment has a libc, but that libc doesn't have the same symbols and/or at the same addresses as the libc found in the build environment. You end up with an incomplete executable when the dynamic linker tries to relocate it and its library components into a runnable image in memory. ldd and other tools tell you the symbol(s) that aren't right. Sometimes this means you can't run the executable at all, as is happening to you. Other times the problems are more subtle and fun and the executable will run, but then fail on some less immediate branch in the code later. Those failures are the worst, because you may not notice them until a customer tries to run your dynamically-linked program on a system with an even weirder or older or newer libc.

I'm sure it's pretty obvious that my idea for this for our specific situation is to take advantage of one of golang's most simplifying abstractions, and always ship a statically-linked binary. This allows you the almost-magical power of actually shipping the actual code you actually built and actually tested.

Your other option is to discover the exact libc against which a dynamically-linked caddy has been built, and then somehow assure that your deployment environment(s) always have an absolutely compatible libc available, and that that libc and only that libc is the one the dynamic linker matches in its search paths and uses for symbol resolution to run your program. This option is more difficult to implement. Imagine how it scales when your program links dozens or hundreds of libraries dynamically.

One big motivation many devs have for using container systems like Docker is trying to solve this very problem: They no longer ship just their dynamically-linked binaries and hope customers have the correct library in the correct version. Instead they ship a container that includes the right library in the right version. I call this "poor-man's static linking". There are many good reasons for containers, but I think this particular reason is lousy -- for go developers, because we can link statically -- in fact in the simple case our tools always static link and don't even ask us about it. Static linking, by simply including the library code directly in the executable, does this job better and more simply than a container, wherein someone is curating a miniature linux distro just to get all the libs right.
Guys shipping big C++ programs may not even have a practical option to statically-link their binaries, and so it is good that containers offer this way of trying to control library balkanization. We as go developers are luckier than them. We can build statically, not worry about deployment environment libraries at all, and use containers as isolation primitives, or to support multiple tenants, or any of the other good reasons.

Sorry my comments are always so long. I would make them shorter if I had one or both of more intelligence or more time. :)

@abiosoft

abiosoft commented Jul 6, 2015

@joshix Thanks for the explanation. Who says your comments are long? They are very helpful. 💯
@mholt Let's configure the server generating the build to produce static binaries. I think that's our best bet now.

@mholt
Member

mholt commented Jul 6, 2015

@joshix Thanks for the education - I've been learning a lot about static compilation/linking lately.

Today I will be trialling locally with a build server that produces only static binaries.

@mholt mholt added the in progress 🏃‍♂️ Being actively worked on label Jul 6, 2015
@mholt
Member

mholt commented Jul 13, 2015

I got static binaries building with the build server in a local VM, but I'm also experimenting with gb to make builds more stable, pending a bug fix. Stay tuned... hopefully will do a release this week either way.

@mholt mholt removed the in progress 🏃‍♂️ Being actively worked on label Jul 15, 2015
@mholt
Member

mholt commented Jul 15, 2015

@joshix Would you give it a try now on the Caddy website? I deployed the changes and recompiled the standard library using CGO_ENABLED=0. Using that same env variable during builds produces static binaries (at least, with the current version of Go and from my own testing) - would you confirm and then feel free to close if fixed?

@joshix
Author

joshix commented Jul 15, 2015

@mholt Linux x64 is still a dynamically linked executable at caddyserver.com/download/.

I grabbed caddy, no git selected, via browser download from caddyserver/download just now.

j@idealx:~$ file caddy_*/caddy
caddy_linux_amd64_custom/caddy: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped

The linux arm and x86 binaries are statically linked as expected. I haven't checked the github download for 0.7.3 yet. The github/mholt/caddy/releases/0.7.3 linux x64 binary is statically-linked.

@mholt
Member

mholt commented Jul 15, 2015

Argh, I tried that same thing and it was static.... looking into it.

@mholt
Member

mholt commented Jul 15, 2015

@joshix Okay, I don't know why, but I had to run CGO_ENABLED=0 ./make.bash --no-clean (which I did already, minus the --no-clean, before running make.bash for all the other platforms). I can't explain why but that seemed to fix it.

Or I'm going crazy. Please confirm. (That it's static now, not that I'm crazy.)

@joshix
Author

joshix commented Jul 15, 2015

@mholt LGTM now for Linux x64 binary from caddyserver/download:

j@idealx:~$ file caddy_*/caddy
caddy_linux_amd64_custom/caddy: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
j@idealx:~$ ./caddy_linux_amd64_custom/caddy -version
Caddy 0.7.3

@mholt
Member

mholt commented Jul 15, 2015

👍 I've updated my setup script. Consider this case closed.

Thanks for the report, patience, and efforts!

@mholt mholt closed this as completed Jul 15, 2015
@joshix
Author

joshix commented Aug 2, 2015

https://caddyserver.com/download:

j@maxton:~/caddy-dist$ date -u
Sun Aug  2 23:29:44 UTC 2015
j@maxton:~/caddy-dist$ curl -o caddy_linux_amd64_custom.tar.gz 'https://caddyserver.com/download/build?os=linux&arch=amd64&features='
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
  [...]
j@maxton:~/caddy-dist$ tar xzf caddy_linux_amd64_custom.tar.gz
j@maxton:~/caddy-dist$ ls
caddy  caddy_linux_amd64_custom.tar.gz  CHANGES.txt  LICENSES.txt  README.txt
j@maxton:~/caddy-dist$ sha1sum caddy
a746fc6a1761df8a7d9548932c7efae6e8884a80  caddy
j@maxton:~/caddy-dist$ file caddy
caddy: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, not stripped
j@maxton:~/caddy-dist$ objdump -p ./caddy |grep NEEDED
  NEEDED               libpthread.so.0
  NEEDED               libc.so.6

This should perhaps be filed against caddyserver/buildsrv, but this issue already exists, so reopening here.

@mholt mholt reopened this Aug 3, 2015
@mholt
Member

mholt commented Aug 3, 2015

Ugh. How did this happen. I didn't change anything with the build process or the Go installation.

Looking into it...

@mholt
Member

mholt commented Aug 3, 2015

Re-built the Go standard library. I see static executables now. Please try again and let me know.

@joshix
Author

joshix commented Aug 3, 2015

I still get a dynamic linux amd64 binary from https://caddyserver.com/download/build?os=linux&arch=amd64&features=

In fact the exe has the same sha1 (a746fc6a1761df8a7d9548932c7efae6e8884a80) as the one I last reported.

Cached somewhere?

When you “see static executables,” where are you looking that’s different from where I’m fetching binaries? I’ve let the buildsrv stuff remain pretty opaque to me so I’m not sure what the process looks like, nor how much trouble the effort to force-produce static binaries adds to it.

@mholt
Member

mholt commented Aug 3, 2015

Oops, you're right. I forgot to clear the cache. * facepalm *

Sorry for all the trouble about this. I'm considering programming a system test that will verify that the output is a static binary and have it run on a regular basis. Hopefully that will help pinpoint what causes them to turn into dynamic executables.
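
One possible shape for such a check, sketched in Go with the standard `debug/elf` package. This is an assumption about how the test could work, not the build server's actual code, and the names `linkInfo` and `inspect` are invented for the example. It treats an ELF file as static when it has no PT_INTERP program header and no DT_NEEDED entries, which are the same two things `file` and `objdump -p` surfaced earlier in this thread.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// linkInfo is what the check extracts from an ELF file.
type linkInfo struct {
	Interp bool     // a PT_INTERP program header is present
	Needed []string // DT_NEEDED entries, e.g. libc.so.6
}

// static reports whether the file needs neither a dynamic loader nor
// any shared library.
func (li linkInfo) static() bool {
	return !li.Interp && len(li.Needed) == 0
}

// inspect reads the program headers and dynamic section of the ELF
// file at path.
func inspect(path string) (linkInfo, error) {
	f, err := elf.Open(path)
	if err != nil {
		return linkInfo{}, err
	}
	defer f.Close()
	var li linkInfo
	for _, p := range f.Progs {
		if p.Type == elf.PT_INTERP {
			li.Interp = true
		}
	}
	// ImportedLibraries returns no entries when there is no dynamic section.
	li.Needed, err = f.ImportedLibraries()
	return li, err
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: staticcheck ELF_FILE")
		os.Exit(2)
	}
	li, err := inspect(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !li.static() {
		fmt.Printf("dynamically linked: interpreter=%v needed=%v\n", li.Interp, li.Needed)
		os.Exit(1)
	}
	fmt.Println("statically linked")
}
```

A scheduled job could download each Linux build, run this against the extracted binary, and alert on a non-zero exit status. It only understands ELF, so darwin builds would need a different check.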

@mholt mholt closed this as completed Aug 3, 2015
@joshix
Author

joshix commented Aug 3, 2015

Got it that time.

$ file caddy_linux_amd64_custom/caddy 
caddy_linux_amd64_custom/caddy: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
