Should we use gcc from the default channel for Linux (and maybe OS X)? #29

Closed
ocefpaf opened this issue Feb 17, 2016 · 81 comments

Comments

@ocefpaf
Member

ocefpaf commented Feb 17, 2016

Most of the time we are OK using the compilers installed in the CIs because we all have similar build tools pre-installed on our machines. However, every now and then someone tries to use the packages in a docker image without those tools. (For example ioos/conda-recipes#723 and ioos/conda-recipes#700.) A few questions:

  • Will that be fixed using the gcc from the default channel?
  • What would be the downside of that?
  • How about OS X? Are we relying on clang or homebrew gcc? Or does it not matter?
@ChrisBarker-NOAA
Contributor

I have no idea about most of this, but:

We should build with the same toolchain as Anaconda is built with as much as possible. Which is clang on OS-X, I think.

And the "manylinux" folks have been working on a Docker image for building manylinux wheels, which is derived from the Anaconda experience -- so that might be a good place to go for Linux:

https://github.com/pypa/manylinux

@pelson
Member

pelson commented Feb 18, 2016

I'm pretty happy with the reach of our existing binaries. @ocefpaf - I know there is no time like the present to get this right, but I don't really have any experience of it going wrong. My hunch therefore would be to stick with what we have until we find a problem with it. 👍 / 👎?

@PythonCHB
Contributor

👍

@ocefpaf
Member Author

ocefpaf commented Feb 18, 2016

I'm pretty happy with the reach of our existing binaries. @ocefpaf - I
know there is no time like the present to get this right, but I don't
really have any experience of it going wrong.

Well, it is not a matter of right and wrong. I am pretty happy too. The
issue arises when people use the conda package in minimalistic docker
images.

My hunch therefore would be to stick with what we have until we find a
problem with it. 👍 / 👎?

+1 let's just document that people should install build_essentials, etc.

@pelson
Member

pelson commented Feb 18, 2016

let's just document that people should install build_essentials and

The issue arises when people use the conda package in minimalistic docker images.

Ah OK. I've not seen these. I'm happy to tighten that requirement down somewhat - it sounds like quite a big ask to install build_essentials...

@ocefpaf
Member Author

ocefpaf commented Feb 18, 2016

My bad, I am on the phone and the # refs above should point to the ioos conda recipe repo.

@ocefpaf
Member Author

ocefpaf commented Feb 18, 2016

Ah OK. I've not seen these. I'm happy to tighten that requirement down
somewhat - it sounds like quite a big ask to install build_essentials...

build_essentials was a lazy solution on my part. Some cases need only
libgomp, others libgfortran.

@jakirkham
Member

In the few cases where I have had issues elsewhere, I find I can use install_name_tool or patchelf to link things to something like libgcc from conda to resolve these sorts of issues. A little inelegant I suppose, but I do like using the system compilers if I can.

@jakirkham
Member

So, this ( conda-forge/staged-recipes#164 ) might be such a case where we would want to use conda's gcc.

@msarahan
Member

Here's what I understand:

If you ship libgcc (more importantly, libstdc++, which comes with it) and shadow the system libstdc++, and the system libstdc++ is newer than the one you ship, you'll run into unresolved symbol errors at runtime and crash or fail to run.

This has been a huge motivator for me to get GCC 5.2 running in our docker build image.

I have argued very strongly internally against using the gcc that is in defaults. My main argument against even having this package is that people will use it on unknown platforms - and this means their packages will have an unknown version dependency on GLibC.

IMHO, Continuum should just ship all the runtimes, the same way we do with Windows. They are much more nicely backwards/forwards compatible on Linux, but I don't see harm in keeping them controlled on Linux.

@jjhelmus
Contributor

I think an argument can also be made that you should ship no gcc, libstdc++, and similar runtimes and instead always depend on the system provided ones. This seems to be what the manylinux folks are doing with wheel files. I'm not sure which option is better but I think both should be on the table.

@jakirkham
Member

One of the other ideas I was playing with in that PR is bundling only a few essential components, like libgfortran or libgomp, from the VMs we are building in. These are things that may not already be included on the system, but that we are (or will be) linking against. I am just worried they will get crushed when someone installs defaults' libgcc package, and I am unclear on if (when) that leads to bad behavior. Also, I know a little less about how these fringe components interact with libgcc, which they are all linked to.

@jakirkham
Member

Alternatively, static linkage remains a valid option here.

@jjhelmus
Contributor

I have run into issues where a Fortran compiled extension linked against symbols in my system provided libgfortran that were not in the Anaconda provided one which caused the extension to fail to import. Using conda uninstall -f libgfortran fixed the issue but it is not ideal.

If runtimes are shipped on Linux it seems they must be the most up-to-date versions. Keeping these up to date may require significant maintenance.

@jakirkham
Member

Yeah, I am liking the static option more and more.

@msarahan
Member

I'm not clear on how the Manylinux stuff works to depend on libstdc++ on the system. I'm sure they have something figured out, but I just don't understand it.

This is the article that convinced me to pursue the approach I'm behind: http://www.crankuptheamps.com/blog/posts/2014/03/04/Break-The-Chains-of-Version-Dependency/

Note that this is the same approach taken by the Julia team.

@msarahan
Member

Found it. They place tight restrictions on ABI version:

Therefore, as a consequence of requirement (b), any wheel that depends on versioned symbols from the above shared libraries may depend only on symbols with the following versions:

GLIBC <= 2.5
CXXABI <= 3.4.8
GLIBCXX <= 3.4.9
GCC <= 4.2.0
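
As a rough illustration of how limits like these could be checked, here is a minimal Python sketch. It assumes you have already collected the versioned symbol tags a binary depends on (for example from `objdump -T` output; that collection step is not shown). The names `MAX_VERSIONS`, `parse_symbol_version` and `violations` are made up for this example, and tags without a numeric version (such as private ones) are not handled.

```python
# Upper bounds copied from the limits quoted above.
MAX_VERSIONS = {
    "GLIBC": (2, 5),
    "CXXABI": (3, 4, 8),
    "GLIBCXX": (3, 4, 9),
    "GCC": (4, 2, 0),
}


def parse_symbol_version(tag):
    """Split a tag like 'GLIBCXX_3.4.9' into ('GLIBCXX', (3, 4, 9))."""
    name, _, version = tag.partition("_")
    return name, tuple(int(part) for part in version.split("."))


def violations(tags):
    """Return the tags whose version exceeds the limit for their family."""
    bad = []
    for tag in tags:
        name, version = parse_symbol_version(tag)
        limit = MAX_VERSIONS.get(name)
        # Families that are not in the table are ignored in this sketch.
        if limit is not None and version > limit:
            bad.append(tag)
    return bad


if __name__ == "__main__":
    print(violations(["GLIBC_2.2.5", "GLIBC_2.14", "GLIBCXX_3.4.21"]))
```

Versions are compared as integer tuples rather than strings, so `2.14` correctly counts as newer than `2.5`.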

@jjhelmus
Contributor

Yup, from my understanding they are defining a base linux system that has a set of "core" libraries which they expect to 1) exist and 2) match a minimum version. But pip does not have an effective method for providing more up-to-date runtimes like conda does.

@jjhelmus
Contributor

I'm warming more to the idea of providing the latest runtimes. Would this allow us to compile packages with the GCC 5 libstdc++ ABI and run them on systems using the GCC 4 ABI?

@msarahan
Member

I feel like Conda has a better approach here, making the assumption that we should provide it. People can conceivably pip install something without having libstdc++ installed, and end up confused. My wife had that happen with Steam on her Linux computer, for example. Good times. I never thought work would be so useful at home.

FWIW, I'm pretty sure Continuum is taking this route, and you can be certain that it will be maintained as long as we're pushing it, because we'll have customers screaming otherwise.

@jjhelmus yes. Here's my understanding with GCC5:

Compiled with GCC5, CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0" => GCC4 compatible, runs fine with libstdc++ from gcc 5 (it is dual-abi). Does not link with libs compiled with GCC5 (abi 5)

Compiled with GCC5, CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=1" => GCC5 compatible, runs fine with libstdc++ from gcc 5 (it is dual-abi). Does not link with libs compiled with GCC4.

Continuum is planning on the former setting for now, with a planned switch at some point in the future, along with an associated rebuild of (maybe) everything.

I have tried to make that ABI info readily accessible with startup scripts in the build docker image: https://github.com/ContinuumIO/docker-images/pull/20/files#diff-8320ce46adf2819c0900060bd6c14c43R16

(also see the start_c++??.sh scripts, which are meant to be simple front-ends)
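
To make the two settings above concrete, here is a small Python sketch of a helper that produces a `CXXFLAGS` value for one ABI or the other. Only the `_GLIBCXX_USE_CXX11_ABI` macro comes from the discussion; the function name `cxxflags_with_abi` is hypothetical, and this is not how the scripts in the docker image are written.

```python
ABI_MACRO = "-D_GLIBCXX_USE_CXX11_ABI="


def cxxflags_with_abi(existing, use_cxx11_abi):
    """Return CXXFLAGS with exactly one _GLIBCXX_USE_CXX11_ABI definition.

    use_cxx11_abi=False selects the GCC4-compatible ABI (macro set to 0),
    use_cxx11_abi=True selects the GCC5 ABI (macro set to 1).
    """
    # Drop any earlier definition so the two settings are never mixed.
    kept = [flag for flag in existing.split() if not flag.startswith(ABI_MACRO)]
    kept.append(ABI_MACRO + ("1" if use_cxx11_abi else "0"))
    return " ".join(kept)


if __name__ == "__main__":
    print(cxxflags_with_abi("-O2", use_cxx11_abi=False))
```

Since libraries built with different settings do not link together, the same value would need to be used for every C++ package in a given stack.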

@jakirkham
Member

Continuum is planning on the former setting for now...

Alright, this clarifies the Linux stuff for me.

...with a planned switch at some point in the future, along with an associated rebuild of (maybe) everything.

That's going to be fun. Hopefully, conda-forge has everything and is super fast then. 😄

I have tried to make that ABI info readily accessible with startup scripts in the build docker image...

Thanks. This is really useful.

@jjhelmus
Contributor

I'm on board too. Thanks for the great explanation @msarahan. It took a bit but I'm seeing the light. Of course now I'm going to have to build GCC 5 tonight.

Sorry for the long tangent on this PR @jakirkham, did this answer your original concern?

@jakirkham
Member

This is what I am still unclear about, are we shipping gcc on Mac too?

@msarahan
Member

My current opinion is yes. I'd like to avoid it if possible, but I see the need for OpenMP and Fortran. I'll keep you all in the loop on any discussions we have here.

@jakirkham
Member

Ok. With OpenMP, maybe we can get around it by doing something similar to the Linux strategy, namely building the newest clang on our oldest Mac (10.7). Fortran remains a different problem, though.

@msarahan
Member

Thanks for being receptive, both of you! Now let's go rule the world! (or maybe just build great software)

@jakirkham
Member

Thanks for keeping us in the loop.

Now let's go rule the world! (or maybe just build great software)

Are they mutually exclusive? 😈

@jakirkham
Member

I'm working on a gcc-6-on-centos-6 docker image at the moment, which I'll put up once I can get it into a shape where it can build all of Julia successfully (which depends on openblas).

-- quote from @tkelman


That sounds really cool, @tkelman. Thanks for sharing.

Interesting. Yeah, I think we are staying on CentOS 6 for the present, but if we find it pressing enough, it is possible that we would go back to CentOS 5. The current thought is that without more pressing reasons (people clamoring for that level of GLIBC compatibility) we will stay on CentOS 6.

There was a docker container that used CentOS 5 and gcc 5.2 that @msarahan had proposed. Though there are some concerns like having to rebuild everything on the old CentOS. Also there is an issue due to CentOS 5 being less than a year from EOL. There were some other concerns about dependencies, which are kind of up in the air (CUDA, cuDNN, etc.).

I know devtoolsets do interesting things when linking libraries so that things remain portable without needing to package libgcc. Though I kind of like this feature. It seems like you were running into issues with it though. Could you please explain? Do you have any thoughts on setting this up in your docker container?

Having a newer gcc sounds nice. However, I'm not sure what breaks there are in gcc 6 and am only aware of the breaks present in gcc 5. Do you know anything about this?

How have you been building this image? Is this (and I know this is a long shot) being built on Docker Hub, Quay, or similar? One challenging aspect here has been having a shared infrastructure to do a build on an image like this. We want to avoid a developer bandwidth problem.

Personally, I would be really interested in being able to share a common framework with Julia (maybe even packages 😉). So would really love to discuss this more with you.

@tkelman
Member

tkelman commented Apr 29, 2016

The devtoolset does things in a funny way where it is set up to statically link newer pieces of libstdc++ and libgfortran that might not exist on the default centos system compiler versions. We initially tried to use the devtoolset for Julia, but found when building openblas with the devtoolset the openblas shared library doesn't actually end up statically linked to libgfortran. So there's still a dependency on libgfortran which we have to bundle in our binaries, but we don't want to use the system centos libgfortran version as that's too old. So we transitioned to doing something very similar to what the Conda folks are now doing, building our own GCC 5.x from source on CentOS 5. It was ansible based and hooked up to buildbot and I'm now updating/re-doing that in Docker form with 6.x versions.

GCC 6 does break a fair amount of code. I'm mainly looking at it as slight future-proofing since Arch and unstable versions of Fedora and openSUSE are likely to upgrade to GCC 6 soon. Due to the glibc issue described in detail by @njsmith here https://sourceware.org/bugzilla/show_bug.cgi?id=19884, "generic linux binaries" need to be built on the oldest glibc version of any system a user wants to use (so old centos/rhel drives this), with as new or newer gcc version as any user has installed as their default system compiler version (so arch/fedora/non-LTS-ubuntu drives this).
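
The rule in the previous paragraph (oldest glibc, newest gcc) can be written down as a short Python sketch. The platform names and version numbers in the usage example are placeholders for illustration only, not a statement about any real distribution, and `build_baseline` is a hypothetical helper.

```python
def parse_version(text):
    """Turn '2.12' into (2, 12) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def build_baseline(targets):
    """Pick the toolchain baseline for 'generic linux binaries'.

    targets maps a platform name to (glibc_version, default_gcc_version).
    Returns (oldest glibc, newest gcc) across all targets.
    """
    glibc_versions = [parse_version(glibc) for glibc, _ in targets.values()]
    gcc_versions = [parse_version(gcc) for _, gcc in targets.values()]
    return min(glibc_versions), max(gcc_versions)


if __name__ == "__main__":
    # Placeholder data: one old enterprise-style target, one rolling release.
    example = {
        "old-enterprise": ("2.5", "4.1.2"),
        "rolling-release": ("2.23", "6.1.1"),
    }
    print(build_baseline(example))
```

The two halves pull in opposite directions, which is why the build ends up being a new GCC compiled from source on an old distribution.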

I'll see whether docker hub's time limit is capable of handling this. I've only used quay a handful of times and haven't hooked it up to github hooks yet (which is really convenient when working with docker hub auto builds) but in manual quay builds it did seem way faster than docker hub.

@jakirkham
Member

...there's still a dependency on libgfortran which we have to bundle in our binaries

Correct. We are aware of this. What do you do to solve this?

Wasn't sure if there were other weird things you noticed.

...but we don't want to use the system centos libgfortran version as that's too old.

If it has all been built with a new gfortran, why does one care about this?

GCC 6 does break a fair amount of code.

Do you have any examples?

the oldest glibc version of any system a user wants to use (so old centos/rhel drives this), with as new or newer gcc version as any user has installed as their default system compiler version

I see, so it is just a mad race to stay newer while supplying old GLIBC support. That makes sense.

I'll see whether docker hub's time limit is capable of handling this. I've only used quay a handful of times and haven't hooked it up to github hooks yet (which is really convenient when working with docker hub auto builds) but in manual quay builds it did seem way faster than docker hub.

Would be interesting to see what you discover.

Yeah, I've had so many issues with Docker Hub that I might just want to use quay if for no other reason than it is a little bit more stable.

@jakirkham
Member

Also, there is a similar story with OpenMP as with Fortran when using devtoolsets, if you haven't encountered that yet.

@tkelman
Member

tkelman commented Apr 29, 2016

Some of the linking to system libgfortran with devtoolset might be resolvable with openblas makefile patches to remove things like hardcoded -lgfortran, but we didn't look too far into that.

If it has all been built with a new gfortran, why does one care about this?

For the standalone Julia binaries to work on systems that might not have libgfortran installed, we need to bundle a libgfortran. The devtoolset doesn't include its own separate modern shared-library version of libgfortran. But when you build gcc from source in the normal way, you will get a shared libgfortran that you can use just fine. So that's what we do. If any users try building C/C++/Fortran libraries with newer compiler versions than what we used to build Julia, they'll need to delete or rename the runtime libraries that we bundle in the Julia binaries in order to call into them from Julia.

Do you have any examples?

JuliaLang/julia#14829 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69550

Here's my WIP so far: https://github.com/tkelman/c6g6/blob/master/Dockerfile

@njsmith

njsmith commented Apr 29, 2016

@tkelman: I'm curious whether you've considered handling the system-vs-shipped gfortran issue the same way we are for manylinux builds, by renaming it to avoid triggering that glibc bug.

@tkelman
Member

tkelman commented Apr 29, 2016

Considering we've had to deal with blas symbol name collisions for some time even from differently named libraries, I'm not sure changing the library file name alone without also renaming all the symbols would fix matters.

@njsmith

njsmith commented Apr 29, 2016

@tkelman: ah, right, you'd need to change the name and also clean up RTLD_GLOBAL usage. But those two things together should work, I think...

@njsmith

njsmith commented Apr 29, 2016

AFAIK the stuff to make Linux work without RTLD_GLOBAL should be equivalent to the stuff needed to make Windows and osx work at all, since they don't support elf's weird symbol collision semantics in the first place.

@tkelman
Member

tkelman commented Apr 29, 2016

I'm still not entirely sure what visibility the automatic dlopen when you use Julia's C FFI uses by default on Linux. It might not be global at all unless you specifically call dlopen asking for it.

I don't know quite the right patchelf invocations to rename all the shared libraries that we ship with Julia and keep them interlinked properly. We already use patchelf at build time for rpath modifications, so I wouldn't be opposed to testing it out. Might not even need a source build of Julia, you could try downloading our binaries and calling patchelf on them directly as a proof of concept?

@jakirkham
Member

jakirkham commented Apr 30, 2016

Just as an FYI. We now use devtoolset-2 (gcc 4.8.2) in our image.

I have gone through and audited all existing feedstocks to make sure that they only use the gcc package if they build OpenMP or Fortran support. In the future, we will want to address these as well, but a formal plan has not been made at this time.

In all cases where the gcc package was used to build C++0x or C++11 (normally on Linux), an issue was raised to note that they should drop it, as the default compiler in the Docker container now supports C++11. PRs are being added to remove gcc in these cases. This is in progress, but not yet complete.

For all new recipes, please only use the gcc package for OpenMP or Fortran code. All other cases should not need this.

@tkelman
Member

tkelman commented Apr 30, 2016

How have you been building this image? Is this (and I know this is a long shot) being built on Docker Hub, Quay, or similar? One challenging aspect here has been having a shared infrastructure to do a build on an image like this. We want to avoid a developer bandwidth problem.

I've now tried Docker Hub, Quay, Travis, Circle CI, and Shippable all building the same GCC source-build Dockerfile. I might be spoiled by having ssh access to a pretty nice server where the image takes about half an hour to build. Everywhere else I've tried takes long enough that it's hitting ~1hr timeouts on Hub and Travis, and still going for multiple hours on the others. Building and pushing locally isn't the end of the world as this shouldn't need updating too often, but it would be nicer if one of the hosted automated services were fast enough to handle this without a much longer turnaround time.

edit: quay did eventually finish, it just took a really long time

@njsmith

njsmith commented Apr 30, 2016

I don't know quite the right patchelf invocations to rename all the shared libraries that we ship with Julia and keep them interlinked properly. We already use patchelf at build time for rpath modifications, so I wouldn't be opposed to testing it out.

You can look at the auditwheel source code to see a fully automated script for this, but basically:

  1. download and build an up-to-date git snapshot of patchelf (you need something with this fix and this one, neither of which is released yet)
  2. rename your .so: mv libgfortran.so.3 libgfortran-${UNIQUE}.so.3
  3. tell your .so that it's been renamed: patchelf --set-soname libgfortran-${UNIQUE}.so.3 libgfortran-${UNIQUE}.so.3
  4. find all your executables and shared libraries (basically the same ones that you're currently setting the rpath on), and tell the ones that are currently looking for libgfortran.so.3 that they should look for your renamed version instead: patchelf --replace-needed libgfortran.so.3 libgfortran-${UNIQUE}.so.3 some-file.so
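
Steps 2 to 4 above can be scripted. The Python sketch below only builds the command lines and does not run anything; the `patchelf` options are the ones quoted in the steps, while the function name `rename_commands`, the `unique` value and the list of dependent files are assumptions for illustration. It is not the auditwheel implementation.

```python
def rename_commands(old_name, unique, dependents):
    """Build the commands for renaming a bundled shared library.

    old_name:   current file name and soname, e.g. 'libgfortran.so.3'
    unique:     tag inserted before '.so' to make the name unique
    dependents: files that currently list old_name as a needed library
    Returns (new_name, list of argv lists).
    """
    stem, sep, suffix = old_name.partition(".so")
    new_name = "{}-{}{}{}".format(stem, unique, sep, suffix)
    commands = [
        # Step 2: rename the file itself.
        ["mv", old_name, new_name],
        # Step 3: update the soname recorded inside the library.
        ["patchelf", "--set-soname", new_name, new_name],
    ]
    # Step 4: point every dependent at the renamed library.
    for path in dependents:
        commands.append(
            ["patchelf", "--replace-needed", old_name, new_name, path]
        )
    return new_name, commands


if __name__ == "__main__":
    name, commands = rename_commands("libgfortran.so.3", "abc123", ["some-file.so"])
    for argv in commands:
        print(" ".join(argv))
```

Each argv list could then be passed to `subprocess.run(argv, check=True)`, provided a patchelf build with the fixes mentioned in step 1 is on the `PATH`.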

@tkelman
Member

tkelman commented Apr 30, 2016

Thanks @njsmith. We're actually getting a little off topic here, maybe we should move this to an issue on JuliaLang/julia or one of the gcc-from-source dockerfile repos? In Julia's case there's a really easy workaround for running old Julia binaries on distros with newer gcc, of deleting the bundled runtime libraries so that the system versions get used instead. I'd need to be convinced renaming is worth it and won't break things, since some packages do need to be able to find Julia's libgfortran or libstdc++ for ffi purposes, linking and loading libraries that don't have rpath set right on their own, etc.

I distrust the devtoolset partial static linking approach since I've seen it not work correctly in complicated examples like openblas and other Julia dependencies. The C++ partial static linking had also caused issues, if I remember correctly. On a normal build of gcc, -static-libgfortran rarely works correctly (especially if gcc was built with libquadmath support), and if what you want to build is a shared library, the static copies of libstdc++ and libgfortran have to be carefully built with -fPIC. We couldn't get the devtoolset to do the job for Julia.

@jakirkham
Member

In all cases where the gcc package was used to build C++0x or C++11 (normally on Linux), an issue was raised to note that they should drop it, as the default compiler in the Docker container now supports C++11. PRs are being added to remove gcc in these cases. This is in progress, but not yet complete.

Just as an FYI, this is complete.

@jakirkham
Member

As this came up at the compiler meeting the other day, I figured I would share it (also posted on gitter). This is an ancient mailing list thread (had to get from archive) on the conditions under which libstdc++ and libc++ can be mixed. Also, there is this info from FreeBSD. Also, an SO answer. The take-home message is that STL objects cannot be shared between a library built with libstdc++ and a library built with libc++. The only exception to this is exceptions, which can be thrown and caught in libraries of either type.

@tkelman
Member

tkelman commented Jun 11, 2016

I wouldn't trust anything written prior to gcc 5 to still be relevant on this subject. ABI tags threw an additional wrench into this issue, and have still not been entirely implemented in LLVM. There are various patches floating around that I think Arch and a few others have been using, but nothing merged and released yet AFAIK.

@jakirkham
Member

Sure, gcc 5 is different. Unfortunately, when it comes to Mac, we have been using gcc 4.8.5. So, it remains relevant here until we get a newer compiler.

@ocefpaf
Member Author

ocefpaf commented Jul 25, 2016

Let's close this and re-discuss once we have a gcc package.


10 participants