Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix clang 13 induced runtime issues #62170

Merged
merged 1 commit into from
Nov 30, 2021
Merged

Conversation

janvorli
Copy link
Member

@janvorli janvorli commented Nov 30, 2021

The clang 13 optimizer started to assume that "this" pointer is always
properly aligned. That lead to elimination of some code that was actually
needed.
It also takes pointer aliasing rules more strictly in one place in jit.
That caused the optimizer to falsely assume that a callee with an argument
passed by reference is not modifying that argument and used a stale
copy of the original value at the caller site.

This change fixes both of the issues. With this fix, runtime compiled
using clang 13 seems to be fully functional.

Close #61671

The clang 13 optimizer started to assume that "this" pointer is always
properly aligned. That lead to elimination of some code that was actually
needed.
It also takes pointer aliasing rules more strictly in one place in jit.
That caused the optimizer to falsely assume that a callee with an argument
passed by reference is not modifying that argument and used a stale
copy of the original value at the caller site.

This change fixes both of the issues. With this fix, runtime compiled
using clang 13 seems to be fully functional.
@janvorli janvorli self-assigned this Nov 30, 2021
@janvorli
Copy link
Member Author

cc @tmds, @omajid

@janvorli janvorli merged commit a47e5d1 into dotnet:main Nov 30, 2021
@janvorli janvorli deleted the fix-clang13-issues branch November 30, 2021 09:55
@omajid
Copy link
Member

omajid commented Nov 30, 2021

I was able to do a full bootstrap with this. Thanks!

@tstellar
Copy link

The clang 13 optimizer started to assume that "this" pointer is always properly aligned. That lead to elimination of some code that was actually needed.

This is part of the C++ language standard, so if clang can't assume that "this" pointer is already aligned, then it means there is bug somewhere else in the program. Just changing the function signature so the alignment information about the pointer is lost may fix the test failure, but I don't think it's solving the underlying problem.

It also takes pointer aliasing rules more strictly in one place in jit. That caused the optimizer to falsely assume that a callee with an argument passed by reference is not modifying that argument and used a stale copy of the original value at the caller site.

Do you have more information about this? It sounds like it could be a clang bug.

@jakobbotsch
Copy link
Member

jakobbotsch commented Nov 30, 2021

This is part of the C++ language standard, so if clang can't assume that "this" pointer is already aligned, then it means there is bug somewhere else in the program.

The difference in this changeset is that we align a void* before casting it to the type with the required alignment. Previously, it was casted and then a member function aligned it.

Do you have more information about this? It sounds like it could be a clang bug.

Let me preface this by saying I believe Clang is in its right to do the optimization it is doing here, and that it is related to strict aliasing (it is fixed with -fno-strict-aliasing).

The best reduced example I could come up with is the following:
https://godbolt.org/z/57TcaG9Ks

With strict aliasing on, Clang13 is compiling Test as if the code was the following:

ShortLongRep val = mJumpDestOut[block->bbNum];
DataFlowD(len, mJumpDestOut[block->bbNum], mJumpDestGen[block->bbNum], block->bbAssertionIn);
bool changed = AreDifferent(len, preMergeOut, block->bbAssertionOut, preMergeJumpDestOut, val);
return changed;

Note that the second arg to DataFlowD is reference that will modify mJumpDestOut[block->bbNum] in some cases. It is not reloaded for the last call to AreDifferent: it uses a stale value, from before DataFlowD was called. That value is left in r8 before the call to AreDifferent.

EDIT: It's easier to see if the loop is commented in DataFlowD: https://godbolt.org/z/je3WM3db9
In clang12 the changed value is allocated into r8 which is the arg register. In clang13 the original value is allocated to r8 and the changed one to rax, and r8 is not reloaded.

@janvorli
Copy link
Member Author

@jakobbotsch thank you for providing the details.
@tstellar let me give you some more info on the "this" pointer alignment issue. We have an in-memory data structure that's loaded from disk or generated. Items of the problematic type are correctly aligned, however when we are computing an address of such an element, we were using an unaligned pointer and then aligning it. We are walking that in-memory structure and locating elements in there. When we move past the previous element, we can get an unaligned pointer for some previous element's size. What we were doing wrong was that we have first cast the pointer to a data structure that needs 8 byte alignment and then called its member method that aligned "this" and returned the aligned result for further use. With this fix, we align the pointer (which is a BYTE*) first and then cast it to the specific type that needs the alignment.

@tstellar
Copy link

@janvorli OK, thanks for the explanation, that makes sense.

@omajid
Copy link
Member

omajid commented Nov 30, 2021

@janvorli Do you have any plans to backport this fix to older release branches? We saw this issue first in 6.0, but even 3.1 and 5.0 would be affected, right?

@janvorli
Copy link
Member Author

janvorli commented Dec 1, 2021

Right, it makes sense to port it. We will also need to port the change that disables the new warnings coming from clang 13.

@omajid
Copy link
Member

omajid commented Dec 16, 2021

Hey, @janvorli is there a PR for the backport I can follow? Shall I create one? There are requests (eg, dotnet/source-build#2645) for getting this into 6.0.2, if possible.

@janvorli
Copy link
Member Author

janvorli commented Jan 3, 2022

@omajid I am sorry for a late response, I was OOF since December 15. I will create a backporting PR. It will need to be combined with a part of another PR that added some compiler warning disabling options for Clang 13.

janvorli added a commit to janvorli/runtime that referenced this pull request Jan 3, 2022
The clang 13 optimizer started to assume that "this" pointer is always
properly aligned. That lead to elimination of some code that was actually
needed.
It also takes pointer aliasing rules more strictly in one place in jit.
That caused the optimizer to falsely assume that a callee with an argument
passed by reference is not modifying that argument and used a stale
copy of the original value at the caller site.

This change fixes both of the issues. With this fix, runtime compiled
using clang 13 seems to be fully functional.
@omajid
Copy link
Member

omajid commented Jan 4, 2022

Hey, @janvorli ! No worries, hope you had a great time off! I was off myself. Thanks for doing the 6.0 PR.

safern pushed a commit that referenced this pull request Jan 4, 2022
* Fix clang 13 induced runtime issues (#62170)

The clang 13 optimizer started to assume that "this" pointer is always
properly aligned. That lead to elimination of some code that was actually
needed.
It also takes pointer aliasing rules more strictly in one place in jit.
That caused the optimizer to falsely assume that a callee with an argument
passed by reference is not modifying that argument and used a stale
copy of the original value at the caller site.

This change fixes both of the issues. With this fix, runtime compiled
using clang 13 seems to be fully functional.

* Fix build with clang 13 (#60328)
@ghost ghost locked as resolved and limited conversation to collaborators Feb 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A self-built crossgen2 hangs on some recent Linux distributions
5 participants