Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gcc ABI incompatibility when passing llvm::Optional #38775

Closed
llvmbot opened this issue Oct 25, 2018 · 20 comments
Closed

gcc ABI incompatibility when passing llvm::Optional #38775

llvmbot opened this issue Oct 25, 2018 · 20 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Copy link
Member

llvmbot commented Oct 25, 2018

Bugzilla Link 39427
Resolution FIXED
Resolved on Mar 07, 2019 00:13
Version 7.0
OS Linux
Blocks #38454
Attachments ABI demonstration
Reporter LLVM Bugzilla Contributor
CC @aleden,@d0k,@dwblaikie,@zmodem,@cuviper,@slacka,@zygoloid,@rnk,@serge-sans-paille,@sylvestre,@TNorthover,@tstellar
Fixed by commit(s) r346985

Extended Description

Calling llvm::ConstantExpr::getGetElementPtr crashes when LLVM 7 is compiled with gcc 8.2 and the caller is compiled with clang 7.

The reason is an ABI incompatibility: clang wants to pass the llvm::Optional parameter by value (i.e., as a single i64), while gcc expects the value to be passed by reference.

I have attached a simple test program that demonstrates the problem explicitly, regardless how the underlying LLVM was compiled. (It still uses the LLVM 7 headers, but the problem is triggered purely in the example code here).

To reproduce, compile and run the attached program like this:

clang++-7 -c -opart1.o foo.cpp llvm-config-7 --cxxflags
g++ -c -opart2.cpp -DPART2 foo.cpp llvm-config-7 --cxxflags
clang++-7 -ofoo part1.o part2.o llvm-config-7 --cxxflags
./foo

This leads to a crash. Using either g++ or clang++ for both files is fine, but mixing them leads to a crash. By inspecting the assembler code we can see that clang++ passes by value, and g++ passes by reference.

@rnk
Copy link
Collaborator

rnk commented Oct 25, 2018

Yeah, this is bad. It's been discussed on llvm-dev and the cause is known, but the ship has sailed:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126564.html
http://llvm.org/viewvc/llvm-project?view=revision&revision=342637

Maybe we can fix it for 7.0.1.

Surprisingly, I don't think we had a bug for it in the tracker until now, just mailing list discussion.

@tstellar
Copy link
Collaborator

Is it correct that the consequences of backporting r342637 are:

  • 7.0.0->7.0.1 ABI break for libLLVM-7.so that was compiled by gcc.
  • gcc < 7.3 won't be able to compile LLVM 7.0.1.

@zmodem
Copy link
Collaborator

zmodem commented Nov 7, 2018

Is it correct that the consequences of backporting r342637 are:

  • 7.0.0->7.0.1 ABI break for libLLVM-7.so that was compiled by gcc.
  • gcc < 7.3 won't be able to compile LLVM 7.0.1.

No, r342637 just reverted trunk back to the state we were in when branching for 7.0.0 IIRC.

This isn't fixed on trunk yet.

My proposal is to go back to what we had before r322838.

@cuviper
Copy link
Member

cuviper commented Nov 13, 2018

I believe they're hitting this in the Debian bug I just linked, where clang-built libLLVM.so is called by rustc's src/rustllvm built with GCC.

@sylvestre
Copy link
Collaborator

Tom, do you know how you are going to fix that?
Thanks

@tstellar
Copy link
Collaborator

Tom, do you know how you are going to fix that?
Thanks

I think Han's suggestion for fixing it is fine for trunk.

For our Fedora builds I know we are going to fix it by keeping the ABI the same on gcc builds and changing the ABI on clang builds to match gcc (i.e removing the #if block that only is active for clang).

For the upstream 7.0.1 release, I don't see a way to fix this without breaking the ABI for at least one of the compilers. Since clang is an llvm project, my suggestion for the release_70 branch would be to fix it the opposite of how we will fix it in Fedora, which means to keep the clang ABI the same and change the gcc ABI to match it.

@zmodem
Copy link
Collaborator

zmodem commented Nov 14, 2018

It's confusing to keep track of the changes here.

I believe this is the change that introduced the Clang-GCC ABI difference, by adding the #ifdefs:


commit 933d1a7
Author: Benjamin Kramer [email protected]
Date: Thu Jan 18 16:23:40 2018 +0000

[ADT] Just give up on GCC, I can't fix this.

While the memmove workaround fixed it for GCC 6.3. GCC 4.8 and GCC 7.1
are still broken. I have no clue what's going on, just blacklist GCC for
now.

Needless to say this code is ubsan, asan and msan-clean.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322862 91177308-0d34-0410-b5e6-96231b3b80d8

IIUC that was the end of a chain of commits trying to make Optional trivially copyable, see #35326 .

The reason for #ifdefs is that several versions of GCC miscompiles #ifdefed out code otherwise.

My suggestion is that for trunk, we back out the whole thing. It's caused more problems than it's solving I think. I believe that means backing out to the state before r322838.

For our Fedora builds I know we are going to fix it by keeping the ABI the same on gcc builds and changing the ABI on clang builds to match gcc (i.e removing the #if block that only is active for clang).

I think removing the clang-only #if block is the correct fix for the branch.

For the upstream 7.0.1 release, I don't see a way to fix this without breaking the ABI for at least one of the compilers. Since clang is an llvm project, my suggestion for the release_70 branch would be to fix it the opposite of how we will fix it in Fedora, which means to keep the clang ABI the same and change the gcc ABI to match it.

If we do this, many versions of GCC will misscompile the code. That's why the #if's were added. So I think removing the clang-only #if block is a better fix, even if it disturbs the ABI more (i.e. changes it for clang-built binaries).

Ben, do you have any input on this?

@d0k
Copy link
Member

d0k commented Nov 14, 2018

We had 3 more attempts at fixing this that broke with random versions of GCC. Let's just revert it, I don't think there are any dependencies on it in trunk at the moment.

It will make clang-tidy emit many more warnings and you can't put llvm::Optional into a union anymore, but that's better than an ABI break.

@sylvestre
Copy link
Collaborator

In Debian & Ubuntu, I am taking the risk to break the ABI.
AFAIK, "only" rustc was impacted

I uploaded a new version of the package with this patch:
https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/blob/7/debian/patches/pr39427-misscompile.diff
It fixed the rust crash.

However, the test case from this bug still fail (there is a small typo in comment #​0)
Should be
g++ -c -opart2.o -DPART2 foo.cpp llvm-config-7 --cxxflags
(instead of g++ -c -opart2.cpp -DPART2 foo.cpp llvm-config-7 --cxxflags)

#​0 0x00000000004005c0 in bar(llvm::Type*, llvm::Constant*, llvm::ArrayRefllvm::Value*, bool, llvm::Optional, llvm::Type*) ()
#​1 0x00000000004005ad in main ()

@tstellar
Copy link
Collaborator

Patch to fix this in trunk: https://reviews.llvm.org/D54540

@tstellar
Copy link
Collaborator

This has been merged to trunk as r346985. For the stable release, what do people think about holding this out of 7.0.1 and then doing another release after 7.0.1 that includes only this patch (either 7.0.2 or 7.1.0), so that users can get all the fixes without the ABI change if they want.

@llvmbot
Copy link
Member Author

llvmbot commented Dec 30, 2018

Ping. What happens now?

@rnk
Copy link
Collaborator

rnk commented Jan 2, 2019

The way I see it, users care about GCC/clang library ABI compatibility, so we need to put out a release with this bug fixed somewhere, even though it technically breaks our ABI stability promise. I seem to recall Tom Stellard suggesting that we put out two simultaneous releases: 7.0.1 and 7.0.2, which are the same, except 7.0.2 takes the patch to revert the ifdefs from llvm::Optional. Personally, this seems like extra complexity, especially because most Linux distros are carrying the patch to revert the llvm::Optional ifdefs anyway. Pulling the revert into 7.0.1 simplifies their lives. Ultimately, Tom is the 7.0.1 release manager, so it is his decision as to whether it is worth the extra complexity and effort to make two releases with two sets of pre-built packages.

@llvmbot
Copy link
Member Author

llvmbot commented Jan 2, 2019

Reid, 7.0.1 is out already. Hence, I'm pinging for some kind of progress here ;-).

There were also suggestions to make it 7.1.0 but I don't know if the progress on removing minor version number didn't already made that impracticable. One thing to note is that we need to change SOVERSION for this release.

@tstellar
Copy link
Collaborator

tstellar commented Jan 7, 2019

I am planning to put this fix into a 7.1.0 release. I will update this issue when the branch is ready for testing/review.

@sylvestre
Copy link
Collaborator

7.1.0 will have only this fix, right?

@tstellar
Copy link
Collaborator

tstellar commented Jan 7, 2019

7.1.0 will have only this fix, right?

Yes, that's the plan.

@serge-sans-paille
Copy link
Collaborator

With llvm::isPodlIke behind replaced by a portable llvmm:is_trivially_copyable, it seems possible to get back to an optimised version of llvm::Optional for trivially copyable types, without ABI issues.

Patch proposed in https://reviews.llvm.org/D57097

@serge-sans-paille
Copy link
Collaborator

This should be fixed as of https://reviews.llvm.org/rL353927

@serge-sans-paille
Copy link
Collaborator

Fixed as of r354264, the attached test case now compiles correctly.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

8 participants