Should we use gcc from the default channel for Linux (and maybe OS X)? #29

ocefpaf · 2016-02-17T20:16:44Z

Most of the time we are OK using the compilers installed in the CIs because we all have similar build tools pre-installed in our machines. However, every now and then someone tries to use the packages in a docker image without those tools. (For example ioos/conda-recipes#723 and ioos/conda-recipes#700.) A few questions:

Will that be fixed using the gcc from the default channel?
What would be the downside of that?
How about OS X? Are we relaying on clang or homebrew gcc? Or it does not matter?

ChrisBarker-NOAA · 2016-02-17T23:32:08Z

I have no idea about most of this, but:

We should build with the same toolchain as Anaconda is built with as much as possible. Which is clang on OS-X, I think.

And the "manylnux" folks have been working on a Docker image for building manylinux wheels, which is derived from the Anaconda experience -- so that might be a good place to go for Linux:

https://github.com/pypa/manylinux

pelson · 2016-02-18T16:10:34Z

I'm pretty happy with the reach of our existing binaries. @ocefpaf - I know there is no time like the present to get this right, but I don't really have any experience of it going wrong. My hunch therefore would be to stick with what we have until we find a problem with it. 👍 / 👎?

PythonCHB · 2016-02-18T16:12:43Z

👍

ocefpaf · 2016-02-18T16:23:49Z

I'm pretty happy with the reach of our existing binaries. @ocefpaf - I
know there is no time like the present to get this right, but I don't
really have any experience of it going wrong.

Well, it is not a matter of right and wrong. I am pretty happy too. The
issue arises when people use the conda package in minimalistic docker
images.

My hunch therefore would be to stick with what we have until we find a
problem with it. / ?

+1 let's just document that people should install build_essentials and
etc.

pelson · 2016-02-18T16:39:59Z

let's just document that people should install build_essentials and

The issue arises when people use the conda package in minimalistic docker images.

Ah OK. I've not seen these. I'm happy to tighten that requirement down somewhat - it sounds like quite a big ask to install build_essentials...

ocefpaf · 2016-02-18T16:54:53Z

My bad, I am on the phone and the # refs above should point to the ioos conda recipe repo.

ocefpaf · 2016-02-18T17:48:34Z

Ah OK. I've not seen these. I'm happy to tighten that requirement down
somewhat - it sounds like quite a big ask to install build_essentials...

build_essentials was a lazy solution from my part. Some cases need only
libgomp, others libgfortran.

jakirkham · 2016-03-16T06:55:28Z

In the few cases where I have had issues elsewhere, I find I can use install_name_tool or patchelf to link things to something like libgcc from conda to resolve these sorts of issue. A little inelegant I suppose, but I do like using the system compilers if I can.

jakirkham · 2016-03-22T20:26:35Z

So, this ( conda-forge/staged-recipes#164 ) might be such a case where we would want to use conda's gcc.

msarahan · 2016-03-22T20:39:38Z

Here's what I understand:

If you ship libgcc (more importantly, libstdc++, which comes with it) and shadow the system libstdc++, and the system libstdc++ is newer than the one you ship, you'll run into unresolved symbol errors at runtime and crash or fail to run.

This has been a huge motivator for me to get GCC 5.2 running in our docker build image.

I have argued very strongly internally against using the gcc that is in defaults. My main argument against even having this package is that people will use it on unknown platforms - and this means their packages will have an unknown version dependency on GLibC.

IMHO, Continuum should just ship all the runtimes, the same way we do with Windows. They are much more nicely backwards/forwards compatible on Linux, but I don't see harm in keeping them controlled on Linux.

jjhelmus · 2016-03-22T20:47:56Z

I think an argument can also be made that you should ship no gcc, libstdc++, and similar runtimes and instead always depend on the system provided ones. This seems to be what the manylinux folks are doing with wheel files. I'm not sure which option is better but I think both should be on the table.

jakirkham · 2016-03-22T20:50:00Z

One of the other ideas, I was playing with in that PR is bundling only a few essential components like libgfortran or libgomp from the VMs we building in. Things that may not already be included on the system, but we are (or will be) linking against. I am just worried they will get crushed when someone installs defaults' libgcc package and am unclear on if (when) that leads to bad behavior. Also, I know a little less about how these fringe components interact with libgcc, which they are all linked to.

jakirkham · 2016-03-22T20:54:17Z

Alternatively static linkage remains a valid option here.

jjhelmus · 2016-03-22T20:57:00Z

I have run into issues where a Fortran compiled extension linked against symbols in my system provided libgfortran that were not in the Anaconda provided one which caused the extension to fail to import. Using conda uninstall -f libgfortran fixed the issue but it is not ideal.

If runtimes are shipped on Linux it seems they must be the most up-to-date versions. Keeping these up to date may require significant maintenance.

jakirkham · 2016-03-22T20:57:55Z

Yeah, I am liking the static option more and more.

msarahan · 2016-03-22T21:01:15Z

I'm not clear on how the Manylinux stuff works to depend on libstdc++ on the system. I'm sure they have something figured out, but I just don't understand it.

This is the article that convinced me to pursue the approach I'm behind: http://www.crankuptheamps.com/blog/posts/2014/03/04/Break-The-Chains-of-Version-Dependency/

Note that this is the same approach taken by the Julia team.

msarahan · 2016-03-22T21:10:29Z

Found it. They place tight restrictions on ABI version:

Therefore, as a consequence of requirement (b), any wheel that depends on versioned symbols from the above shared libraries may depend only on symbols with the following versions:

GLIBC <= 2.5
CXXABI <= 3.4.8
GLIBCXX <= 3.4.9
GCC <= 4.2.0

jjhelmus · 2016-03-22T21:16:40Z

Yup, from my understanding they are defining a base linux system that has a set of "core" libraries which they expect to 1) exist and 2) match a minimum version. But pip does not have a effective method for providing more up-to-date runtimes like conda does.

jjhelmus · 2016-03-22T21:32:44Z

I'm warming more to the idea of providing the latest runtimes. Would this allow us to compile package with the GCC 5 libstdc++ ABI and run them on systems using the GCC 4 API?

msarahan · 2016-03-22T21:38:10Z

I feel like Conda has a better approach here, making the assumption that we should provide it. People can conceivably pip install something without having libstdc++ installed, and end up confused. My wife had that happen with Steam on her Linux computer, for example. Good times. I never thought work would be so useful at home.

FWIW, I'm pretty sure Continuum is taking this route, and you can be certain that it will be maintained as long as we're pushing it, because we'll have customers screaming otherwise.

@jjhelmus yes. Here's my understanding with GCC5:

Compiled with GCC5, CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=0" => GCC4 compatible, runs fine with libstdc++ from gcc 5 (it is dual-abi). Does not link with libs compiled with GCC5 (abi 5)

Compiled with GCC5, CXXFLAGS="${CXXFLAGS} -D_GLIBCXX_USE_CXX11_ABI=1" => GCC5 compatible, runs fine with libstdc++ from gcc 5 (it is dual-abi). Does not link with libs compiled with GCC4.

Continuum is planning on the former setting for now, with a planned switch at some point in the future, along with an associated rebuild of (maybe) everything.

I have tried to make that ABI info readily accessible with startup scripts in the build docker image: https://github.com/ContinuumIO/docker-images/pull/20/files#diff-8320ce46adf2819c0900060bd6c14c43R16

(also see the start_c++??.sh scripts, which are meant to be simple front-ends)

jakirkham · 2016-03-22T21:44:17Z

Continuum is planning on the former setting for now...

Alright, this clarifies the Linux stuff for me.

...with a planned switch at some point in the future, along with an associated rebuild of (maybe) everything.

That's going to be fun. Hopefully, conda-forge has everything and is super fast then. 😄

I have tried to make that ABI info readily accessible with startup scripts in the build docker image...

Thanks. This is really useful.

jjhelmus · 2016-03-22T21:48:25Z

I'm on board too. Thanks for the great explanation @msarahan. It took a bit but I'm seeing the light. Of course now I'm going to have to build GCC 5 tonight.

Sorry for the long tangent on this PR @jakirkham, did this answer your original concern?

jakirkham · 2016-03-22T21:49:57Z

This is what I am still unclear about, are we shipping gcc on Mac too?

msarahan · 2016-03-22T21:57:55Z

My current opinion is yes. I'd like to avoid it if possible, but I see the need for OpenMP and Fortran. I'll keep you all in the loop on any discussions we have here.

jakirkham · 2016-03-22T22:02:49Z

Ok. With OpenMP, maybe we can get around it by doing something similar to the Linux strategy namely building the newest clang on our oldest Mac (10.7). Though Fortran remains a different problem.

msarahan · 2016-03-22T22:07:28Z

Thanks for being receptive, both of you! Now let's go rule the world! (or maybe just build great software)

jakirkham · 2016-03-22T22:22:51Z

Thanks for keeping us in the loop.

Now let's go rule the world! (or maybe just build great software)

Are they mutually exclusive? 😈

jakirkham · 2016-04-29T12:54:34Z

I'm working on a gcc-6-on-centos-6 docker image at the moment, which I'll put up once I can get it into a shape where it can build all of Julia successfully (which depends on openblas).

-- quote from @tkelman

That sounds really cool, @tkelman. Thanks for sharing.

Interesting. Yeah, I think we are staying on CentOS 6 for present, but it is possible if we find it pressing enough that we would go back to CentOS 5. The current thought is that without more pressing reasons (people clamoring for that level of GLIBC compatibility) we will stay on CentOS 6.

There was a docker container that used CentOS 5 and gcc 5.2 that @msarahan had proposed. Though there are some concerns like having to rebuild everything on the old CentOS. Also there is an issue due to CentOS 5 being less than a year from EOL. There were some other concerns about dependencies, which are kind of up in the air (CUDA, cuDNN, etc.).

I know devtoolsets do interesting things when linking libraries so that things remain portable without needing to package libgcc. Though I kind of like this feature. It seems like you were running into issues with it though. Could you please explain? Do you have any thoughts on setting this up in your docker container?

Having a newer gcc sounds nice. However, I'm not sure what breaks there are in gcc 6 and am only aware of the breaks present in gcc 5. Do you know anything about this?

How have you been building this image? Is this (and I know this is a long shot) being built on Docker Hub, Quay, or similar? One challenging aspect here has been having a shared infrastructure to do a build on an image like this. We want to avoid a developer bandwidth problem.

Personally, I would be really interested in being able to share a common framework with Julia (maybe even packages 😉). So would really love to discuss this more with you.

tkelman · 2016-04-29T13:29:39Z

The devtoolset does things in a funny way where it is set up to statically link newer pieces of libstdc++ and libgfortran that might not exist on the default centos system compiler versions. We initially tried to use the devtoolset for Julia, but found when building openblas with the devtoolset the openblas shared library doesn't actually end up statically linked to libgfortran. So there's still a dependency on libgfortran which we have to bundle in our binaries, but we don't want to use the system centos libgfortran version as that's too old. So we transitioned to doing something very similar to what the Conda folks are now doing, building our own GCC 5.x from source on CentOS 5. It was ansible based and hooked up to buildbot and I'm now updating/re-doing that in Docker form with 6.x versions.

GCC 6 does break a fair amount of code. I'm mainly looking at it as slight future-proofing since Arch and unstable versions of Fedora and openSUSE are likely to upgrade to GCC 6 soon. Due to the glibc issue described in detail by @njsmith here https://sourceware.org/bugzilla/show_bug.cgi?id=19884, "generic linux binaries" need to be built on the oldest glibc version of any system a user wants to use (so old centos/rhel drives this), with as new or newer gcc version as any user has installed as their default system compiler version (so arch/fedora/non-LTS-ubuntu drives this).

I'll see whether docker hub's time limit is capable of handling this. I've only used quay a handful of times and haven't hooked it up to github hooks yet (which is really convenient when working with docker hub auto builds) but in manual quay builds it did seem way faster than docker hub.

jakirkham · 2016-04-29T13:44:41Z

...there's still a dependency on libgfortran which we have to bundle in our binaries

Correct. We are aware of this. What do you do to solve this?

Wasn't sure if there were other weird things you noticed.

...but we don't want to use the system centos libgfortran version as that's too old.

If it has all been built with a new gfortran, why does one care about this?

GCC 6 does break a fair amount of code.

Do you have any examples.

the oldest glibc version of any system a user wants to use (so old centos/rhel drives this), with as new or newer gcc version as any user has installed as their default system compiler version

I see so it is just the mad race to stay newer while supplying old GLIBC support. That makes sense.

I'll see whether docker hub's time limit is capable of handling this. I've only used quay a handful of times and haven't hooked it up to github hooks yet (which is really convenient when working with docker hub auto builds) but in manual quay builds it did seem way faster than docker hub.

Would be interesting to see what you discover.

Yeah, I've had so many issues with Docker Hub that I might just want to use quay if for no other reason than it is a little bit more stable.

jakirkham · 2016-04-29T13:46:59Z

Also, there is a similar story with OpenMP as with Fortran when using devtoolsets, if you haven't encountered that yet.

tkelman · 2016-04-29T14:05:24Z

Some of the linking to system libgfortran with devtoolset might be resolvable with openblas makefile patches to remove things like hardcoded -lgfortran, but we didn't look too far into that.

If it has all been built with a new gfortran, why does one care about this?

For the standalone Julia binaries to work on systems that might not have libgfortran installed, we need to bundle a libgfortran. The devtoolset doesn't include its own separate modern shared-library version of libgfortran. But when you build gcc from source in the normal way, you will get a shared libgfortran that you can use just fine. So that's what we do. If any users try building C/C++/Fortran libraries with newer compiler versions than what we used to build Julia, they'll need to delete or rename the runtime libraries that we bundle in the Julia binaries in order to call into them from Julia.

Do you have any examples.

JuliaLang/julia#14829 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69550

Here's my WIP so far: https://github.com/tkelman/c6g6/blob/master/Dockerfile

njsmith · 2016-04-29T19:16:48Z

@tkelman: I'm curious whetherv you've considered handling the system-vs-shipped gfortran issue the same way we are for manylinux builds, by renaming it to avoid triggering that glibc bug.

tkelman · 2016-04-29T19:43:06Z

Considering we've had to deal with blas symbol name collisions for some time even from differently named libraries, I'm not sure changing the library file name alone without also renaming all the symbols would fix matters.

njsmith · 2016-04-29T20:07:56Z

@tkelman: ah, right, you'd need to change the name and also clean up RTLD_GLOBAL usage. But those two things together should work, I think...

njsmith · 2016-04-29T20:10:41Z

AFAIK the stuff to make Linux work without RTLD_GLOBAL should be equivalent to the stuff needed to make Windows and osx work at all, since they don't support elf's weird symbol collision semantics in the first place.

tkelman · 2016-04-29T23:06:54Z

I'm still not entirely sure what visibility the automatic dlopen when you use Julia's C FFI uses by default on Linux. It might not be global at all unless you specifically call dlopen asking for it.

I don't know quite the right patchelf invocations to rename all the shared libraries that we ship with Julia and keep them interlinked properly. We already use patchelf at build time for rpath modifications, so I wouldn't be opposed to testing it out. Might not even need a source build of Julia, you could try downloading our binaries and calling patchelf on them directly as a proof of concept?

jakirkham · 2016-04-30T00:11:23Z

Just as an FYI. We now use devtoolset-2 (gcc 4.8.2) in our image.

I have gone through and audited all existing feedstocks to make sure that they only use the gcc package if they build OpenMP or Fortran support. In the future, we will want to address these as well, but a formal plan has not been made at this time.

In all cases, where the gcc package was used to build C++0x or C++11 (normally on Linux), an issue was raised to note that they should drop that as the default compiler in the Docker container now supports C++11. PRs are being added to address to remove gcc in these cases. This is in progress, but not yet complete.

For all new recipes, please only use the gcc package for OpenMP or Fortran code. All other cases should not need this.

tkelman · 2016-04-30T03:13:17Z

How have you been building this image? Is this (and I know this is a long shot) being built on Docker Hub, Quay, or similar? One challenging aspect here has been having a shared infrastructure to do a build on an image like this. We want to avoid a developer bandwidth problem.

I've now tried Docker Hub, Quay, Travis, Circle CI, and Shippable all building the same GCC source-build Dockerfile. I might be spoiled by having ssh access to a pretty nice server where the image takes about half an hour to build. Everywhere else I've tried takes long enough that it's hitting ~1hr timeouts on Hub and Travis, and still going for multiple hours on the others. Building and pushing locally isn't the end of the world as this shouldn't need updating too often, but it would be nicer if one of the hosted automated services were fast enough to handle this without a much longer turnaround time.

edit: quay did eventually finish, it just took a really long time

njsmith · 2016-04-30T07:58:43Z

I don't know quite the right patchelf invocations to rename all the shared libraries that we ship with Julia and keep them interlinked properly. We already use patchelf at build time for rpath modifications, so I wouldn't be opposed to testing it out.

You can look at the auditwheel source code to see a fully automated script for this, but basically:

download and build an up-to-date git snapshot of patchelf (you need something with this fix and this one, neither of which is released yet)
rename your .so: mv libgfortran.so.3 libgfortran-${UNIQUE}.so.3
tell your .so that it's been renamed: patchelf --set-soname libgfortran-${UNIQUE}.so.3 libgfortran-${UNIQUE}.so.3
find all your executables and shared libraries (basically the same ones that you're currently setting the rpath on), and tell the ones that are currently looking for libgfortran.so.3 that they should look for your renamed version instead: patchelf --replace-needed libgfortran.so.3 libgfortran-${UNIQUE}.so.3 some-file.so

tkelman · 2016-04-30T08:42:31Z

Thanks @njsmith. We're actually getting a little off topic here, maybe we should move this to an issue on JuliaLang/julia or one of the gcc-from-source dockerfile repos? In Julia's case there's a really easy workaround for running old Julia binaries on distros with newer gcc, of deleting the bundled runtime libraries so that the system versions get used instead. I'd need to be convinced renaming is worth it and won't break things, since some packages do need to be able to find Julia's libgfortran or libstdc++ for ffi purposes, linking and loading libraries that don't have rpath set right on their own, etc.

I distrust the devtoolset partial static linking approach since I've seen it not work correctly in complicated examples like openblas and other Julia dependencies. The C++ partial static linking had also caused issues, if I remember correctly. On a normal build of gcc -static-libgfortran rarely works correctly (especially if gcc was built with libquadmath support) and if what you want to build is a shared library, the static copies of libstdc++ and libgfortran have to be carefully built with -fPIC. We couldn't get the devtoolset to do the job for Julia.

jakirkham · 2016-05-01T05:15:04Z

In all cases, where the gcc package was used to build C++0x or C++11 (normally on Linux), an issue was raised to note that they should drop that as the default compiler in the Docker container now supports C++11. PRs are being added to address to remove gcc in these cases. This is in progress, but not yet complete.

Just as FYI, this is complete.

jakirkham · 2016-06-11T03:57:43Z

As this came up at the compiler meeting the other day, I figured I would share it (also posted on gitter). This is an ancient mailing list thread (had to get from archive) on the conditions under which libstc++ and libc++ can be mixed. Also, there is this info from FreeBSD. Also, an SO answer. The take home message is STL objects cannot be shared between a library built with libstdc++ and a library built with libc++. The only exception to this is exceptions, which can be thrown and caught in libraries of either type.

tkelman · 2016-06-11T04:26:08Z

I wouldn't trust anything written prior to gcc 5 to still be relevant on this subject. ABI tags threw an additional wrench into this issue, and have still not been entirely implemented in LLVM. There are various patches floating around that I think Arch and a few others have been using, but nothing merged and released yet AFAIK.

jakirkham · 2016-06-11T04:33:58Z

Sure gcc 5 is different. Unfortunately, when it comes to Mac, we have been using gcc 4.8.5. So, it remains relevant here until we get a newer compiler.

ocefpaf · 2016-07-25T01:32:55Z

Let's close this and re-discuss once we have a gcc package.

ocefpaf mentioned this issue Mar 16, 2016

GDAL dependencies conda-forge/staged-recipes#108

Closed

jakirkham mentioned this issue Mar 22, 2016

gridstuff conda-forge/staged-recipes#164

Merged

jakirkham mentioned this issue Mar 23, 2016

add libgcc for c++/fortran/libgcc/libgomp runtimes conda-forge/staged-recipes#135

Closed

jakirkham mentioned this issue Apr 29, 2016

Try to eliminate binutils on Linux conda-forge/openblas-feedstock#1

Closed

msarahan mentioned this issue May 10, 2016

Add recipe for obspy conda-forge/staged-recipes#582

Merged

jakirkham mentioned this issue Jun 11, 2016

Our numpy uses the default channel gcc via openblas conda-forge/numpy-feedstock#15

Closed

frol mentioned this issue Jun 17, 2016

update megaman to version 0.2 release conda-forge/megaman-feedstock#3

Merged

jakirkham mentioned this issue Jun 27, 2016

build zeromq with older machine? conda-forge/pyzmq-feedstock#7

Closed

ocefpaf closed this as completed Jul 25, 2016

jschueller mentioned this issue Feb 13, 2017

Update to 0.10.4 conda-forge/pyagrum-feedstock#14

Closed

nsoranzo mentioned this issue Feb 22, 2017

Change perl-threaded to perl bioconda/bioconda-recipes#3839

Merged

4 tasks

fredRos mentioned this issue Sep 29, 2017

Add galario to conda mtazzari/galario#93

Closed

yxqd mentioned this issue Sep 11, 2018

Migrate to use anaconda compilers? mantidproject/conda-recipes#26

Closed

juarezr mentioned this issue Feb 9, 2022

Azure pipelines failing with "no hosted parallelism purchased or granted" #1604

Closed

Should we use gcc from the default channel for Linux (and maybe OS X)? #29

Should we use gcc from the default channel for Linux (and maybe OS X)? #29

Comments

ocefpaf commented Feb 17, 2016

ChrisBarker-NOAA commented Feb 17, 2016

pelson commented Feb 18, 2016

PythonCHB commented Feb 18, 2016

ocefpaf commented Feb 18, 2016

pelson commented Feb 18, 2016

ocefpaf commented Feb 18, 2016

ocefpaf commented Feb 18, 2016

jakirkham commented Mar 16, 2016

jakirkham commented Mar 22, 2016

msarahan commented Mar 22, 2016

jjhelmus commented Mar 22, 2016

jakirkham commented Mar 22, 2016

jakirkham commented Mar 22, 2016

jjhelmus commented Mar 22, 2016

jakirkham commented Mar 22, 2016

msarahan commented Mar 22, 2016

msarahan commented Mar 22, 2016

jjhelmus commented Mar 22, 2016

jjhelmus commented Mar 22, 2016

msarahan commented Mar 22, 2016

jakirkham commented Mar 22, 2016

jjhelmus commented Mar 22, 2016

jakirkham commented Mar 22, 2016

msarahan commented Mar 22, 2016

jakirkham commented Mar 22, 2016

msarahan commented Mar 22, 2016

jakirkham commented Mar 22, 2016

jakirkham commented Apr 29, 2016

tkelman commented Apr 29, 2016 • edited Loading

jakirkham commented Apr 29, 2016

jakirkham commented Apr 29, 2016

tkelman commented Apr 29, 2016

njsmith commented Apr 29, 2016

tkelman commented Apr 29, 2016

njsmith commented Apr 29, 2016

njsmith commented Apr 29, 2016

tkelman commented Apr 29, 2016

jakirkham commented Apr 30, 2016 • edited Loading

tkelman commented Apr 30, 2016 • edited Loading

njsmith commented Apr 30, 2016

tkelman commented Apr 30, 2016

jakirkham commented May 1, 2016

jakirkham commented Jun 11, 2016

tkelman commented Jun 11, 2016

jakirkham commented Jun 11, 2016

ocefpaf commented Jul 25, 2016

tkelman commented Apr 29, 2016 •

edited

Loading

jakirkham commented Apr 30, 2016 •

edited

Loading

tkelman commented Apr 30, 2016 •

edited

Loading