Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-3972: [C++] Migrate to LLVM 7. Add option to disable using ld.gold #3499

Closed
wants to merge 14 commits into from

Conversation

wesm
Copy link
Member

@wesm wesm commented Jan 27, 2019

All in all this wasn't too painful, and Gandiva seems to run fine (I wouldn't have expected otherwise).

I tested this locally on Ubuntu 18.10, and found that the LLVM libraries from apt fail to link with ld.gold (binutils 1.16), so I added an option to toggle ld.gold ON and OFF (it's now OFF by default; using ld by default should yield strictly fewer bugs / build failures). Not sure why that is

I will need some help testing and debugging the Crossbow and packaging tasks to make sure this doesn't break anything else before we merge it

@wesm
Copy link
Member Author

wesm commented Jan 27, 2019

@kszucs can you help with running / checking the crossbow tasks?

@xhochy what's involved in upgrading manylinux1 beyond changing the build_llvm.sh script?

@wesm
Copy link
Member Author

wesm commented Jan 27, 2019

@pravindra I wasn't sure whether this would be controversial but we need to move to LLVM 7 to be able to package Gandiva with the Python wheels in the next major release (0.13.0)

@xhochy
Copy link
Member

xhochy commented Jan 27, 2019

@xhochy what's involved in upgrading manylinux1 beyond changing the build_llvm.sh script?

Updating the scripts/build_clang.sh script. Otherwise that should be it.

@xhochy
Copy link
Member

xhochy commented Jan 27, 2019

@wesm ld.gold problems are normally good indicators of real linking problems. What did you face?

@wesm
Copy link
Member Author

wesm commented Jan 27, 2019

I'll try building again later and post the errors here

@pravindra
Copy link
Contributor

I'm pleasantly surprised that there are no gandiva issues :-).

Once the CI build stabilises, I will try your change with the crossbow task used by dremio. Can you please hold off on merging this change till then ?

@wesm
Copy link
Member Author

wesm commented Jan 28, 2019

Yes, of course

@@ -25,7 +25,7 @@ if [ ! -e $CPP_TOOLCHAIN ]; then
CONDA_PACKAGES=""

if [ $ARROW_TRAVIS_GANDIVA == "1" ] && [ $TRAVIS_OS_NAME == "osx" ]; then
CONDA_PACKAGES="$CONDA_PACKAGES llvmdev=6.0.1"
CONDA_PACKAGES="$CONDA_PACKAGES llvmdev=$CONDA_LLVM_VERSION"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use the syntax llvmdev="7.0.*" instead of hardcoding the bugfix level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@wesm
Copy link
Member Author

wesm commented Jan 30, 2019

Here's what the ld.gold errors look like:

https://gist.github.com/wesm/c11c4ebd90be74f9e536f86d11f540c5

This was on Ubuntu Cosmic / 18.10

I'm going to try this on Ubuntu 14.04 also to see if the issue is somehow related to LLVM or whether it is particular to Ubuntu 18.10

@wesm
Copy link
Member Author

wesm commented Jan 30, 2019

The build worked on Ubuntu 14.04 using both ld and gold. Not sure where to go for here, this is not my expertise

I think the prudent thing is to disable ld.gold by default -- if there is a build configuration where it works well and reduces build times, then we can definitely turn it on

@wesm
Copy link
Member Author

wesm commented Feb 1, 2019

@xhochy do you want to take care of the LLVM 7 upgrade in this patch or in a different one?

Anything else you all want to take care of in this patch?

@xhochy
Copy link
Member

xhochy commented Feb 1, 2019

@xhochy do you want to take care of the LLVM 7 upgrade in this patch or in a different one?

Anything else you all want to take care of in this patch?

You mean in the manylinux1 container? Or in total? I would only do the former.

@pravindra
Copy link
Contributor

@wesm I tried the crossbow builds used by dremio with your patch. I found two issues

  1. conda requires full version (llvmdev-7.0), whereas apt-get requires only major number (clang-7)
  2. on mac, the script couldn't find clang-7.
    • I couldn't find clang-7 on conda. so, for now, i just modified the script to switch to default clang if clang-7 is not found.

My changes are here : pravindra@3da660b

@xhochy
Copy link
Member

xhochy commented Feb 4, 2019

@pravindra The clang package called clangdev on conda.

@wesm wesm changed the title WIP ARROW-3972: [C++] Migrate to LLVM 7. Add option to disable using ld.gold ARROW-3972: [C++] Migrate to LLVM 7. Add option to disable using ld.gold Feb 4, 2019
@wesm
Copy link
Member Author

wesm commented Feb 4, 2019

thanks @pravindra! I rebased and cherry picked your changes. I'll merge this once the build is once the build is passing

@wesm
Copy link
Member Author

wesm commented Feb 5, 2019

@pravindra @kou @xhochy I need your assistance to complete this work

I tried upgrading LLVM to 7 in manylinux1 build but

FAILED: lib/Basic/CMakeFiles/clangBasic.dir/Targets/AMDGPU.cpp.o 
/opt/rh/devtoolset-2/root/usr/bin/c++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Basic -I../lib/Basic -I../include -Iinclude -I//include -fPIC -fvisibility-inlines-hidden -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -pedantic -Wno-long-long -O3    -UNDEBUG  -fno-exceptions -fno-rtti -MD -MT lib/Basic/CMakeFiles/clangBasic.dir/Targets/AMDGPU.cpp.o -MF lib/Basic/CMakeFiles/clangBasic.dir/Targets/AMDGPU.cpp.o.d -o lib/Basic/CMakeFiles/clangBasic.dir/Targets/AMDGPU.cpp.o -c ../lib/Basic/Targets/AMDGPU.cpp
../lib/Basic/Targets/AMDGPU.cpp: In constructor ‘clang::targets::AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple&, const clang::TargetOptions&)’:
../lib/Basic/Targets/AMDGPU.cpp:333:49: error: ‘OpenCL’ is not a member of ‘llvm::Triple’
                      Triple.getEnvironment() == llvm::Triple::OpenCL ||
                                                 ^
...
ninja: build stopped: subcommand failed.
The command '/bin/sh -c /build_clang.sh' returned a non-zero code: 1

Some initial Googling didn't produce an obvious answer

I also have a build failure in normal Xenial with

: && /home/travis/build/apache/arrow/cpp-toolchain/bin/ccache /usr/lib/ccache/g++  -ggdb -O0  -Wall -Wconversion -Wno-sign-conversion -Werror -msse4.2  -B/home/travis/build/apache/arrow/cpp-toolchain/bin -g  -rdynamic src/gandiva/CMakeFiles/gandiva-decimal_type_util_test.dir/decimal_type_util_test.cc.o  -o debug/gandiva-decimal_type_util_test  -Wl,-rpath,/home/travis/build/apache/arrow/cpp-build/debug:/home/travis/build/apache/arrow/cpp-toolchain/lib debug/libgandiva.so.13.0.0 /home/travis/build/apache/arrow/cpp-toolchain/lib/libre2.a debug/libarrow_testing.so.13.0.0 debug/libarrow.so.13.0.0 -ldl /home/travis/build/apache/arrow/cpp-toolchain/lib/libdouble-conversion.a /home/travis/build/apache/arrow/cpp-toolchain/lib/libboost_system.so /home/travis/build/apache/arrow/cpp-toolchain/lib/libboost_filesystem.so /home/travis/build/apache/arrow/cpp-toolchain/lib/libboost_regex.so /home/travis/build/apache/arrow/cpp-toolchain/lib/libgtest.a /home/travis/build/apache/arrow/cpp-toolchain/lib/libgmock_main.a /home/travis/build/apache/arrow/cpp-toolchain/lib/libgmock.a -ldl jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a -pthread -lrt -Wl,-rpath-link,/home/travis/build/apache/arrow/cpp-toolchain/lib && :
/home/travis/build/apache/arrow/cpp-toolchain/bin/ld: debug/libgandiva.so.13.0.0: undefined reference to `typeinfo for llvm::ErrorInfoBase'

I added the missing link library libLLVMSupport.a and it didn't help. I'm going to try the "all" option

@pravindra
Copy link
Contributor

@wesm, I can work on fixing the xenial build.

@wesm
Copy link
Member Author

wesm commented Feb 5, 2019

Linking to all the static libraries didn't make that error go away...

@wesm
Copy link
Member Author

wesm commented Feb 5, 2019

Thanks

@wesm wesm mentioned this pull request Feb 5, 2019
@xhochy
Copy link
Member

xhochy commented Feb 5, 2019

@pravindra @kou @xhochy I need your assistance to complete this work

I tried upgrading LLVM to 7 in manylinux1 build but

FAILED: lib/Basic/CMakeFiles/clangBasic.dir/Targets/AMDGPU.cpp.o 
/opt/rh/devtoolset-2/root/usr/bin/c++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Basic -I../lib/Basic -I../include -Iinclude -I//include -fPIC -fvisibility-inlines-hidden -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -pedantic -Wno-long-long -O3    -UNDEBUG  -fno-exceptions -fno-rtti -MD -MT lib/Basic/CMakeFiles/clangBasic.dir/Targets/AMDGPU.cpp.o -MF lib/Basic/CMakeFiles/clangBasic.dir/Targets/AMDGPU.cpp.o.d -o lib/Basic/CMakeFiles/clangBasic.dir/Targets/AMDGPU.cpp.o -c ../lib/Basic/Targets/AMDGPU.cpp
../lib/Basic/Targets/AMDGPU.cpp: In constructor ‘clang::targets::AMDGPUTargetInfo::AMDGPUTargetInfo(const llvm::Triple&, const clang::TargetOptions&)’:
../lib/Basic/Targets/AMDGPU.cpp:333:49: error: ‘OpenCL’ is not a member of ‘llvm::Triple’
                      Triple.getEnvironment() == llvm::Triple::OpenCL ||
                                                 ^
...
ninja: build stopped: subcommand failed.
The command '/bin/sh -c /build_clang.sh' returned a non-zero code: 1

Some initial Googling didn't produce an obvious answer

I also have a build failure in normal Xenial with

: && /home/travis/build/apache/arrow/cpp-toolchain/bin/ccache /usr/lib/ccache/g++  -ggdb -O0  -Wall -Wconversion -Wno-sign-conversion -Werror -msse4.2  -B/home/travis/build/apache/arrow/cpp-toolchain/bin -g  -rdynamic src/gandiva/CMakeFiles/gandiva-decimal_type_util_test.dir/decimal_type_util_test.cc.o  -o debug/gandiva-decimal_type_util_test  -Wl,-rpath,/home/travis/build/apache/arrow/cpp-build/debug:/home/travis/build/apache/arrow/cpp-toolchain/lib debug/libgandiva.so.13.0.0 /home/travis/build/apache/arrow/cpp-toolchain/lib/libre2.a debug/libarrow_testing.so.13.0.0 debug/libarrow.so.13.0.0 -ldl /home/travis/build/apache/arrow/cpp-toolchain/lib/libdouble-conversion.a /home/travis/build/apache/arrow/cpp-toolchain/lib/libboost_system.so /home/travis/build/apache/arrow/cpp-toolchain/lib/libboost_filesystem.so /home/travis/build/apache/arrow/cpp-toolchain/lib/libboost_regex.so /home/travis/build/apache/arrow/cpp-toolchain/lib/libgtest.a /home/travis/build/apache/arrow/cpp-toolchain/lib/libgmock_main.a /home/travis/build/apache/arrow/cpp-toolchain/lib/libgmock.a -ldl jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a -pthread -lrt -Wl,-rpath-link,/home/travis/build/apache/arrow/cpp-toolchain/lib && :
/home/travis/build/apache/arrow/cpp-toolchain/bin/ld: debug/libgandiva.so.13.0.0: undefined reference to `typeinfo for llvm::ErrorInfoBase'

I added the missing link library libLLVMSupport.a and it didn't help. I'm going to try the "all" option

The error here is not in LLVM but in clang (separate build step). My PR to your repo fixes that.

@pravindra
Copy link
Contributor

pravindra commented Feb 6, 2019

@wesm - the xenial error was happening because it was picking up clang/llvm from travis (pre-installed) rather than the one from apt-get. Using the one from apt-get fixed the issue. I also reverted the change to link all llvm libs (i.e we are back to linking only the required llvm libs).

wesm#7

@wesm
Copy link
Member Author

wesm commented Feb 6, 2019

That's great, thank you.

wesm and others added 8 commits February 6, 2019 10:36
Change-Id: Ieaf5302599a83f35ceaf3c4c8ffbe63c03619d98
Change-Id: If6b93a6ef467a9afb62fe91783b0bff8a1e78438
Change-Id: I73985c52be2837496590443f7c990466712ec037
Change-Id: I7e6d154d21b673e6c070bed8c046ca4dab0559f2
Change-Id: I6d13457a689406ed9a98840463a157bc2be7430d
Change-Id: I07da2e2e663150cbc9c4fa51a48e4e0aa4bd93cc
Change-Id: I54d32ad72a87713ce82e17769a589dde00965102
@wesm
Copy link
Member Author

wesm commented Feb 6, 2019

Travis CI looking OK now. I just rebased to make sure we get a clean build. I'll then merge this and notify the mailing list about upgrading clang toolchains

This is good also because Visual Studio 2017 requires at least clang 7 -- I hit that when doing Windows testing

@wesm
Copy link
Member Author

wesm commented Feb 6, 2019

@wesm wesm closed this in c81fbaa Feb 6, 2019
@wesm wesm deleted the llvm-7 branch February 6, 2019 21:53
xhochy pushed a commit that referenced this pull request Feb 8, 2019
All in all this wasn't too painful, and Gandiva seems to run fine (I wouldn't have expected otherwise).

I tested this locally on Ubuntu 18.10, and found that the LLVM libraries from apt fail to link with ld.gold (binutils 1.16), so I added an option to toggle ld.gold ON and OFF (it's now OFF by default; using `ld` by default should yield strictly fewer bugs / build failures). Not sure why that is

I will need some help testing and debugging the Crossbow and packaging tasks to make sure this doesn't break anything else before we merge it

Author: Wes McKinney <[email protected]>
Author: Pindikura Ravindra <[email protected]>
Author: Korn, Uwe <[email protected]>
Author: Uwe L. Korn <[email protected]>

Closes #3499 from wesm/llvm-7 and squashes the following commits:

5c48eed <Korn, Uwe> Switch to docker image on branch
93aadd7 <Uwe L. Korn> Update clang
e155fad <Pindikura Ravindra> Use the llvm from apt-get (instead of travis)
987fc5f <Wes McKinney> Enable all LLVM libraries
ea1f013 <Wes McKinney> Link libLLVMSupport.a later
034b17d <Wes McKinney> Add missing libLLVMSupport.a dependency
4448243 <Wes McKinney> Install clang before calling travis_install_linux.sh
59d170a <Wes McKinney> Fix more usages of LLVM version
ce831e2 <Wes McKinney> Fix clang executable name in .travis.yml
69fd486 <Pindikura Ravindra> ARROW-3972: misc fixes for LLVM7
c858441 <Wes McKinney> Some CI fixes
87434da <Wes McKinney> Code comments
d198eb8 <Wes McKinney> Decruft. Turn off ld.gold by default
2fc3a26 <Wes McKinney> Build project with LLVM 7. Add option to disable using ld.gold
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants