Bug in atomic operations on aarch64 with multi-threading #13010
```crystal
l = Crystal::SpinLock.new
q = Deque(Int32).new(initial_capacity: 10)

200.times do |i|
  spawn do
    loop do
      l.lock
      if q.size < 10
        q << i
      end
      if q.@capacity > 10
        raise "BUG: q.@capacity=#{q.@capacity}"
      end
      l.unlock
      Fiber.yield
    end
  end
end

spawn do
  loop do
    l.lock
    if i = q.shift?
      puts i
    end
    l.unlock
    Fiber.yield
  end
end

sleep
```

This also reproduces the bug. It won't crash with an invalid memory access, but it shows that we've managed to insert an extra item into the `Deque`.
Just for the record: this also affects e.g. …
I took a look at this after a conversation with @bcardiff. The atomic implementation itself is fine from what I understand; the problem is that the compiler does not emit memory barriers at the lock/unlock points. This means the atomic operations are executed correctly and sequential consistency is maintained for them, but not for the rest of memory. Since ARM has a weak memory consistency model, there is nothing preventing the CPU from reordering the final write to the lock ahead of the preceding writes to the `Deque`.

Warning: I don't have an ARM processor at hand to try this. All of what I'm saying is inferred from reading the linked LLVM document and these two additional posts, and from inspecting the assembler output for the given test program. Just saying I may be way off 😁

A blunt approach to fixing this would be to insert an acquire-release memory barrier right before the unlock. From what I can see, that does generate a barrier instruction:

```crystal
l = Crystal::SpinLock.new
q = Deque(Int32).new(initial_capacity: 10)

@[Primitive(:fence)]
def fence(ordering : LLVM::AtomicOrdering, singlethread : Bool) : Nil
end

200.times do |i|
  spawn do
    loop do
      l.lock
      if q.size < 10
        q << i
      end
      if q.@capacity > 10
        raise "BUG: q.@capacity=#{q.@capacity}"
      end
      fence :sequentially_consistent, false # <---- HERE
      l.unlock
      Fiber.yield
    end
  end
end

spawn do
  loop do
    l.lock
    if i = q.shift?
      puts i
    end
    fence :sequentially_consistent, false # <---- HERE
    l.unlock
    Fiber.yield
  end
end

sleep
```

A better approach would be to add memory ordering modifiers to the atomic operations themselves. I'm not sure if there is anything to be tweaked for the LLVM backend. The emitted assembler looks sound from a semantic point of view regarding the atomic behavior. It uses …
🎉 it seems to work here. A simpler monkey patch would be:

```crystal
class Crystal::SpinLock
  def unlock
    ::Atomic::Ops.fence :sequentially_consistent, false
    previous_def
  end
end
```

Anyone else want to validate?
Nice! A couple more things to consider: …
Before the set, right? Or after it?
After obtaining the lock, after the outer …
Reading the ARM and Lock-Free Programming article that @ggiraldez linked, I understand that …

Isn't …?
Yeah, I thought about that too. But …
I believe all of our atomic operations are already sequentially consistent, since that is what we pass to LLVM; the culprit may be the use of …

Also …
It's true that the atomic operations are sequentially consistent. But for ARM at least, that only ensures sequential consistency for the atomics themselves. Memory barriers are required to synchronize the rest of memory. I don't think the use of …
I'm wondering again if …

Reading into the linked articles again, I don't understand the requirement for an explicit memory barrier when we already have explicit sequentially consistent memory ordering on the atomics... Both articles talk about MSVC, which does generate explicit memory barriers for AArch64.

I built AArch64 binaries using …

Exploring the internals of musl libc: the atomics use the same set of instructions (LDAXR/STLXR/CBNZ), but it does set an explicit memory barrier (DMB ISH) in the atomic CAS for AArch64.

The SpinLock in the Linux kernel for ARM64 explicitly states: …
The difference is, as @HertzDevil is pointing out: the …

That being said, it seems we should have a memory barrier on ARM (v6 and v7).
There seems to be a bug with atomic operations in multi-threaded mode (`-Dpreview_mt`) on aarch64. It was discovered through use of atomic operations in `Crystal::SpinLock` for guarding `Channel` access.

This is a reproducible (yet not very minimized) example that should fail due to an invalid memory access:

This bug is confirmed to appear on `aarch64-apple-darwin` as well as `aarch64-unknown-linux-gnu` (although I have not been able to reproduce it on Linux yet). x86 architectures seem to be unaffected.

As a workaround we can use `Thread::Mutex` instead of `SpinLock` for `Channel`. This of course has an impact on performance.

A potential angle for a fix could be this: https://www.llvm.org/docs/Atomics.html
However, it's not clear if and what we would need to do for that, as it's understood this would be handled implicitly by LLVM.