-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Contrib doesnt build on Linux gcc (ICU) #31807
Comments
related/testing PR #31806 |
I think this is fixed by #31814 |
unfortunately not - its still broken on ICU build |
This comment was marked as off-topic.
This comment was marked as off-topic.
@lukidzi to clarify this ticket has nothing to do with macos or CEL there are various other reports elsewhere - and as i have suggested there you should open a ticket relevant to your issue i dont use or have any access to macos so its not something im likely to follow up on |
seems its failing on all release branches - altho not sure if different error - this is 1.26 gcc ... /b/f/w/external/com_github_unicode_org_icu/icu4c/source/tools/makeconv/ucnvstat.c
/usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -std=c++0x -fno-canonical-system-headers -Wno-builtin-macro-redefined -D__DATE__=redacted -D__TIMESTAMP__=redacted -D__TIME__=redacted -DABSL_MIN_LOG_LEVEL=4 -fPIC -Wno-deprecated-declarations -std=c++17 -fPIC -DU_CHARSET_IS_UTF8=1 -DU_USING_ICU_NAMESPACE=0 -DUCONFIG_ONLY_HTML_CONVERSION=1 -DUCONFIG_NO_LEGACY_CONVERSION=1 -DUCONFIG_NO_BREAK_ITERATION=1 -DUCONFIG_NO_COLLATION=1 -DUCONFIG_NO_FORMATTING=1 -DUCONFIG_NO_TRANSLITERATION=1 -DUCONFIG_NO_REGULAR_EXPRESSIONS=1 -W -Wall -pedantic -Wpointer-arith -Wwrite-strings -Wno-long-long -fuse-ld=lld -Wl,-no-as-needed -Wl,-z,relro,-z,now -B/usr/bin -pass-exit-codes -lm -fuse-ld=gold -l:libstdc++.a -Wl,--gc-sections -o ../../bin/makeconv gencnvex.o genmbcs.o makeconv.o ucnvstat.o -L../../lib -licutu -L../../lib -licui18n -L../../lib -licuuc -L../../stubdata -licudata -lpthread -lm
makeconv.o:makeconv.cpp:DW.ref.__gxx_personality_v0: error: undefined reference to '__gxx_personality_v0'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function std::mutex::lock(): error: undefined reference to 'std::__throw_system_error(int)'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function umtx_cleanup: error: undefined reference to 'std::condition_variable::~condition_variable()'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function umtx_init::{lambda()#2}::operator()() const: error: undefined reference to 'std::condition_variable::condition_variable()'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function icu_72::umtx_initImplPreInit(icu_72::UInitOnce&): error: undefined reference to 'std::condition_variable::wait(std::unique_lock<std::mutex>&)'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function icu_72::umtx_initImplPostInit(icu_72::UInitOnce&): error: undefined reference to 'std::condition_variable::notify_all()'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function std::call_once<void (&)()>(std::once_flag&, void (&)())::{lambda()#2}::operator()() const: error: undefined reference to 'std::__once_callable'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function void std::call_once<void (&)()>(std::once_flag&, void (&)()): error: undefined reference to 'std::__once_callable'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function void std::call_once<void (&)()>(std::once_flag&, void (&)()): error: undefined reference to 'std::__once_call'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function void std::call_once<void (&)()>(std::once_flag&, void (&)()): error: undefined reference to '__once_proxy'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function void std::call_once<void (&)()>(std::once_flag&, void (&)()): error: undefined reference to 'std::__throw_system_error(int)'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function std::unique_lock<std::mutex>::lock(): error: undefined reference to 'std::__throw_system_error(int)'
../../lib/libicuuc.a(umutex.ao):umutex.cpp:function std::unique_lock<std::mutex>::lock(): error: undefined reference to 'std::__throw_system_error(int)'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:81: ../../bin/makeconv] Error 1
make[2]: Leaving directory '/b/f/w/bazel-out/k8-fastbuild/bin/bazel/foreign_cc/unicode_icu_build.build_tmpdir/tools/makeconv'
make[1]: *** [Makefile:47: all-recursive] Error 2
make[1]: Leaving directory '/b/f/w/bazel-out/k8-fastbuild/bin/bazel/foreign_cc/unicode_icu_build.build_tmpdir/tools'
make: *** [Makefile:153: all-recursive] Error 2
make: Leaving directory '/b/f/w/bazel-out/k8-fastbuild/bin/bazel/foreign_cc/unicode_icu_build.build_tmpdir'
_____ END BUILD LOGS _____
rules_foreign_cc: Build wrapper script location: bazel-out/k8-fastbuild/bin/bazel/foreign_cc/unicode_icu_build_foreign_cc/wrapper_build_script.sh
rules_foreign_cc: Build script location: bazel-out/k8-fastbuild/bin/bazel/foreign_cc/unicode_icu_build_foreign_cc/build_script.sh
rules_foreign_cc: Build log location: bazel-out/k8-fastbuild/bin/bazel/foreign_cc/unicode_icu_build_foreign_cc/Configure.log |
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions. |
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions. |
@phlax I think I know what the issue here is. I was able to reproduce the issue with --config=docker-gcc and was able to move past this error (though to dicover another issue) when I added At least --config=docker-gcc is fixable by changing .bazelrc, I'm not entirely sure how to fix it for builds witout explicit config provided (e.g. when toolchain is automatically detected and only g++/gcc is available), but I may try to look into it, if you're ok with assigning the issue to me? For the context, I waas made aware of this issue when some folks internally in Microsoft hit it, so that's why I'm looking at this. |
/assign @krinkinmu |
I think I was correct that /usr/bin/gcc is ultimately used as CXX in the configure script, but I was not correct that providing host_action_env fixes the issue. It was a silly mistake on my part - the build order in Bazel is not deterministic, so when I discovered a new issue with GCC build I incorrectly assumed that the previous issue is addressed, alas it wasn't. That being said, after digging through the Bazel support for CC tolchains and how can we separate CC and CXX, I realized that the failure in ICU build happens when we build tools (e.g. binaries that come together with ICU distribution). I don't think that we actually need those binary tools - we only need a libraries (and those tools aren't used in the build itself). In the end, I could fix the issue by just not building ICU tools. After fixing ICU build, I hit another minor issue in the postgres plugin - easy to fix. And the last issue that I hit was with the cel library is overloaded-virtual warning (the same as #31814, but it was re-introduced after the fix) - I now testing a fix for that issue. If cel fix works, I will try to create a PR for the cel repo and once it's merged, I prepare a PR for Envoy fix (which will involve bumping cel library version, to avoid mainting a patch on the Envoy side). |
Created google/cel-cpp#1048 in cel-cpp, to fix that part of the gcc build. |
Another small update. I was not able to figure out the linker issue and building even on a very large machine results in OOM when using gcc (I tried versions 11 and 12). On the bright side, I now know that the issue is triggered by a single contrib extension - VPP VCL. When I disable VPP VCL extension, everything builds fine (modulo gcc converting some warnings to errors here and there). I don't know what exactly in the VPP VCL extension trips gcc yet, but regardless of that, I would think that the gcc behavior in this case is more likely to be a gcc bug rather than VPP VCL bug. At the moment I'm looking for potential build configuration option changes in the extension to see if we can help gcc to build it without issues as a potential work around and will see if I can come up with a reproducer and send a bug report to the gcc folks. |
cc @florincoras |
@krinkinmu do you happen to have some logs? Also what gcc version are you using? VCL (part of fd.io VPP) currently runs a CI job that builds VPP with gcc 9.4 |
@florincoras sorry, I don't have any meangiful logs since the failure I've seen is that linker runs for a long time continiously accumulating residential memory and eventually gets killed by OOM reaper. So it's not like compilation or linking failed with an error, the problem is that linking is so resource hungry that even 256GiB of RAM is not enough. As for the compiler versions, I tried two:
Both experience the same failure mode. The version 11 is what the docker-gcc is currently using, so that's why I was experiemnting with it. And I tried version 12 just to see if maybe it's a known bug that got fixed in a later release of gcc and didn't get backported. I would like to also point out that it's an LTO build. I'm not entirely sure where the LTO is configured (I don't seem to see any explicit options enabling LTO), but the OOM happens during one of the LTO passes in the linker. And building without LTO is broken, because even when I give bazel options to disable LTO, for unclear reason, some of the dependencies., including VPP VCL, are still built with LTO enabled. And just to avoid some silly issues withing my environment, I cleaned up bazel cache (with --expunge) and tried from scratch multiple times with no difference. |
No worries, at least that gave me something to look at. My envoy build machine doesn't have gcc-11 but I just ran an independent vpp build with gcc-11 and everything seems to be working fine. Regarding LTO, it is applied to only one component, i.e., vppinfra, that vcl depends on. Could you add a patch at the end of vpp_vcl.patch wherein you comment out this line? |
@florincoras Your suggestion helped to move past this linking issue. I now getting other errors that, I think, indicate a different issue (see below). These errors suggest that some symbols missing in the VPP extension, but I saw similar issues with other Envoy modules when compiled with gcc. In other cases they manifested when I used
|
I would like to change the mode of operation on this bug a little bit. Given that there were a bunch of issues with gcc build that are unrelated to each other, I would like to prepare a few small PRs for issues that I know how to fix and then return to the linking issues with optimized gcc builds to close this bug. |
@krinkinmu sounds like a great idea |
Regarding the missing symbols, if you want to try out debug vcl binaries, try changing this to Debug |
Clang and gcc are subtly different and it seems to be the cause of contrib build failures reported in envoyproxy#31807 (e.g., when using gcc to link the final binary it results in a bunch of essential for gcc C++ symbols like __gxx_personality_v0). This issue could be addressed (i.e., Envoy can be built using gcc). However, given that it only happens when we build ICU binary tools and those tools aren't used by Envoy directly or the build system, we can just not build them and address the issue this way. Signed-off-by: Mikhail Krinkin <[email protected]>
Clang and gcc are subtly different and it seems to be the cause of contrib build failures reported in envoyproxy#31807 (e.g., when using gcc to link the final binary it results in a bunch of essential for gcc C++ symbols like __gxx_personality_v0). The issue appear to be the order of the libraries when linking. gcc, when building statically linked binaries basically needs libstdc++ to be the last library or alsmot the last library in the command line. And clang does not appear to care about it much. This change provides libstdc++ library in LIBS environment variabe which will put it in the right position when building the ICU library. This works well for both clang and gcc. Signed-off-by: Mikhail Krinkin <[email protected]>
…when linking ICU tools Clang and gcc are subtly different and it seems to be the cause of contrib build failures reported in envoyproxy#31807 (e.g., when using gcc to link the final binary it results in a bunch of essential for gcc C++ symbols like __gxx_personality_v0). The issue appear to be the order of the libraries when linking. gcc, when building statically linked binaries basically needs libstdc++ to be the last library or alsmot the last library in the command line. And clang does not appear to care about it much. This change provides libstdc++ library in LIBS environment variabe which will put it in the right position when building the ICU library. This works well for both clang and gcc. Signed-off-by: Mikhail Krinkin <[email protected]>
…when linking ICU tools (#37060) Commit Message: Clang and gcc are subtly different and it seems to be the cause of contrib build failures reported in #31807 (e.g., when using gcc to link the final binary it results in a bunch of essential for gcc C++ symbols like __gxx_personality_v0). The issue appear to be the order of the libraries when linking. gcc, when building statically linked binaries basically needs libstdc++ to be the last library or alsmot the last library in the command line. And clang does not appear to care about it much. This change provides libstdc++ library in LIBS environment variabe which will put it in the right position when building the ICU library. This works well for both clang and gcc. Additional Description: It address the issue reported in #31807, though by itseld this change is not enough to make gcc builds work - a few more changes are needed. Risk Level: Low Testing: built with `--config=gcc` and `--config=docker-gcc` and checked that //contrib/language/filters/http/test:language_config_test pass after the change. Docs Changes: N/A Release Notes: N/A Platform Specific Features: N/A Signed-off-by: Mikhail Krinkin <[email protected]>
…37038) Angle brackets are not required after constructor and, maybe, aren't even correct, though I'm not 100% sure on what the standard says on the matter. It seems like clang is fine with this syntax, but when you try to build Envoy with gcc it complains: ``` ./contrib/postgres_proxy/filters/network/source/postgres_message.h: At global scope: ./contrib/postgres_proxy/filters/network/source/postgres_message.h:397:14: error: expected unqualified-id before ')' token 397 | Sequence<>() = default; | ^ Target //contrib/exe:envoy-static failed to build ``` Given that it's at least unusual to have angle brackets after constructor in a class template specialization let's remove them and satisfy both gcc and clang. It's one of the issue that prevent contrib build with gcc. It's not the original issue reported in #31807, but that issue is what started the investigation. Signed-off-by: Mikhail Krinkin <[email protected]>
Commit Message: There were a few issues with building Envoy with VCL. The fist issue is that vppinfra library is built with LTO enabled. While there is nothing wrong with enabling LTO, it apparently triggers some bug in GCC - during linking one of the LTO passes just consumes all the memory in the system and eventually crashes without finishing ( I tried to build Envoy on a system with 256GiB of memory and it wasn't enough, so it's way past what is reasonable). To workaround the issue I updated vpp_vcl.patch to conditionally disable LTO when building using GCC. Once LTO was disabled I hit another issue - the order of libraries in linker command line does matter, at least in the world of Unix-like systems. Normally, Bazel can figure out the right order, but with VPP static libraries that are built by CMake Bazel has no information to figure out what is the proper order of those libraries. And that ultimately resulted in linking failures. I considered a few options to address the issue: 1. Use alwayslink = True - while it should be the simplest and the least surprising solution to the problem, apparently, alwayslink does not do anything for static libraries, so this option does not work 2. Maintain the right order of libraries in the BUILD file - that works, but it's unusual when order of targets in Bazel srcs and deps matters, so to avoid surprising behaviour I didn't go for that option 3. Use genrule and combine different static libraries into a single static library - it should work in theory, but I couldn't refer to the `ar` tool from genrule and abandoned this option 4. Use --start-group and --end-group linker options to tell to the linker that all VPP static libraries should be considered together as a single unit - this is the option I implemented in the end. Additional Description: This is part of the work done to fix gcc builds of Envoy tracked in #31807. This change by itself does not address the issue completely yet, but it moves us a bit closer. Signed-off-by: Mikhail Krinkin <[email protected]>
The summary of the current state:
Warnings don't seem like a big deal - we can always make those non-fatal at least. The linking issue is more intersting and I will try to resolve that one first. |
I took one of the functions referenced in the linker error and tried to see where it's coming from and where it should defined to get some anchors in the code to start from:
I could actually find a reference to this function in the code: https://github.com/v8/v8/blob/bb6560830541d9d937e0ae2e2164d8708362fcae/src/codegen/shared-ia32-x64/macro-assembler-shared-ia32-x64.cc#L473. What I could not find though is where the function is implemented. I know that the version of instruction with 3 registers exists in Intel specs and I could find in V8 code the version of this instruction with 2 registers and an immediate. However, I could not find any place that defines a version with 3 registers. It's not super easy to search through V8 code generation code because it's somewhat macro heavy, so to double check I did the following:
that also didn't yield the right symbol in any of the object files that were referenced in the linker command (it did find a version of vpsllw with 2 registers and an immediate, but not the version with 3 registers - the same thing I had after looking through the V8 code). So unless I made a mistake somewhere, the linking error might actually be legit in the sense that this function is referenced and does not exist. Because the issue only manifests when building with
I'm going to repeat the same thing I did above for fastbuild to check if somehow maybe options 1 or 2 are true with |
In unoptimized build there is indeed no reference to If it was just a single undefined reference in v8 I would consider it a typo or a bug that just went unnoticed for a while. However, the number of undefined references coming from the v8 code is rather significant, so it's hard to believe that it's just a one off typo or mistake. |
Oh, I just spotted a thing that I think is important. Just like error in #31807 (comment) all other errors are also referring to the This is important because this section does not contain an actually used code. |
Compiling v8 without debugging information with The problem also looks very similar to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81993 that was reported and fixed in gcc quite a while ago (long before the gcc 11 or gcc 12, that I'm using now, were released). And after some more searching I found https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110885. And from that report it looks like the following things are of interest:
For now I'm leaning towards just finding a way to disable |
hmm - yeah, i was wondering similar after reading the tickets you posted
yeah - i think that is probably ok - we only publish the clang-built debug binary - so it wouldnt affect us directly, and would be better than the current situation |
When attempting to build contrib with gcc it throws ICU related errors:
The text was updated successfully, but these errors were encountered: