Do not use full memory barrier for osx/arm64 #71026
Conversation
Tagging subscribers to this area: @dotnet/gc

Issue Details: MacOS M1+ is on Arm v8.5, which has support for the newer atomic instructions, so we don't have to emit a full barrier in that case. References:
__sync_synchronize();
#endif
#else
// For OSX Arm64, the default Arm architecture is v8.1 which uses atomic instructions that don't need a full barrier.
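The snippet above only shows the tail of the conditional. A minimal sketch of the overall shape of the change, with illustrative macro and function names rather than the runtime's actual ones:

```cpp
// Illustrative sketch (macro/function names are assumptions, not the runtime's):
// keep the full fence everywhere except osx-arm64, where the __atomic/__sync
// builtins are assumed to lower to acquire/release LSE atomics that already
// provide the required ordering.
inline void FullMemoryBarrier()
{
#if defined(HOST_OSX) && defined(HOST_ARM64)
    // For OSX Arm64, the default Arm architecture is v8.1, whose atomic
    // instructions do not need an additional full barrier.
#else
    __sync_synchronize();   // full two-way fence (dmb ish on arm64)
#endif
}
```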
Is the C/C++ compiler guaranteed to use the newer atomic instructions?
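One way to make that guarantee explicit at build time is to key off the ACLE feature-test macro; a sketch (the placement here is an assumption, but the macro itself is defined by clang/gcc whenever the target baseline includes the LSE atomics):

```cpp
// Build-time guard (sketch): __ARM_FEATURE_ATOMICS is the ACLE macro that
// clang/gcc define when the compilation target includes the ARMv8.1 LSE
// atomics. If the default target ever dropped below that, the build would
// fail instead of silently emitting code that relies on ordering it no
// longer gets.
#if defined(__APPLE__) && defined(__aarch64__) && !defined(__ARM_FEATURE_ATOMICS)
#error "osx-arm64 build is expected to target ARMv8.1+ (LSE atomics)"
#endif
```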
This PR should add an explicit flag for clang to use armv8.1, e.g. -mcpu=apple-m1.
Currently it relies on my observation that by default Clang targets > Armv8.0 on M1, but if Apple decides to change the default internally, these compiler intrinsics might be lowered to 8.0, and without the memory barrier that means potential non-reproducible race conditions somewhere in the VM.
How does passing in -mcpu=apple-m1 guarantee that the compiler only ever uses the new instructions?
LLVM maps apple-m1 to ARMV8_5A, as seen in https://github.com/llvm/llvm-project/blob/5ba0a9571b3ee3bc76f65e16549012a440d5a0fb/llvm/include/llvm/Support/AArch64TargetParser.def#L256-L257. However, I think the concern is valid, and the foolproof way to address it is to check explicitly, the way it is done for the Windows counterpart in #70921. I am working on a PR that will add a similar check for linux-arm64 (reason stated in #70921 (comment)), so it should take care of these things for osx as well.
I think inline asm solves all problems here (might be tricky with templates)
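A rough illustration of the inline-asm route (a sketch under the assumption that forcing the LSE mnemonic directly is acceptable; the function name is made up):

```cpp
// Sketch: emit casal directly so the generated code does not depend on the
// compiler's -march/-mcpu baseline. ".arch_extension lse" lets the assembler
// accept the LSE mnemonic even if the translation unit targets plain ARMv8.0.
#include <cstdint>

inline uint64_t CompareExchangeAcqRel(uint64_t* addr, uint64_t compare, uint64_t value)
{
    uint64_t old = compare;
    __asm__ __volatile__(
        ".arch_extension lse\n\t"
        "casal %[old], %[val], %[mem]"   // compare-and-swap, acquire+release
        : [old] "+r"(old), [mem] "+Q"(*addr)
        : [val] "r"(value)
        : "memory");
    return old;   // previous contents of *addr
}
```

The "tricky with templates" caveat would show up here: each operand width needs its own mnemonic and register form (casalb, casalh, casal with w or x registers), so size-templated interlocked helpers would need a per-width asm specialization.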
Alternatively we can write a small test that validates that the intrinsic is lowered into LSE 🤷
I do not think you can reliably test for this. For example, you may see the old instruction only when there is a certain addressing mode needed or only when the code is cold.
Does it use casal in debug? Can it switch to the old LL/SC helper because of register pressure, or if the old implementation is one day found to be faster (it could be)?
It feels like inline asm could give more reliable guarantees.
Replaced by #71512