Fenced load in bounded queue / v1.2.3 #17
Also, the …
Another thing to consider is whether a …
I thought the purpose of that fence was for the subsequent relaxed loads, because there were cases with no preceding relaxed operation (e.g., `pop` in the unbounded queue). However, #16 may be wrong for the bounded queue, because the bounded queue does have a preceding relaxed operation. So I've yanked 1.2.3 for now. (The relevant commits are crossbeam-rs/crossbeam@7d3f7f0 (bounded) and crossbeam-rs/crossbeam@05554e6 (unbounded).)
The table under "Note: there is an alternative mapping of C/C++11 to x86..." says that mfence+mov works as a SeqCst load.
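In Rust terms, the two forms being compared look roughly like this (the function names are hypothetical). Under the usual x86 mapping, a SeqCst load is a plain `mov` and the cost is paid on the store side; in the alternative mapping the table describes, the fence moves to the load side, so a SeqCst store is a plain `mov` and a SeqCst load is `mfence` + `mov`. The first function below typically compiles to that `mfence` + `mov` pair, but at the level of the Rust/C++ memory model it is not the same operation as a SeqCst load, which is why swapping one for the other is debated in this thread. Check the actual codegen on Godbolt rather than relying on these comments.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

/// Full SeqCst fence followed by a relaxed load.
/// On x86 this is typically emitted as `mfence` followed by a plain `mov`.
fn fenced_load(x: &AtomicUsize) -> usize {
    fence(Ordering::SeqCst);
    x.load(Ordering::Relaxed)
}

/// Plain SeqCst load.
/// Under the usual x86 mapping this is a bare `mov`.
fn seqcst_load(x: &AtomicUsize) -> usize {
    x.load(Ordering::SeqCst)
}
```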
You are right.
LLVM also uses …
I see, but the fact that a …
But it looks like this was a miscompilation? Or did this bug already make it into rustc? [Edit] OK, I tested it on Godbolt, and it looks like the optimization (or miscompilation, if it is one) is not an issue for …
Regarding inline assembly: in my experience it actually produced slower code in practice than simply using a fence, because the compiler treated the asm as a black box and too many optimizations were prevented.
Is this compositional, though? In other words, when both mappings are mixed, does the result still behave as intended? With these mappings, it is sometimes the case that the entire program must use the same mapping, or else things go wrong.
I wonder whether we are barking up the wrong tree: I don't think the original code was intended to act as a SeqCst load, but rather as a full SeqCst barrier, which does much more (for an example, see #18 (comment)). Unfortunately, the thread above does not directly answer the concern about mixing …
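The classic store-buffering litmus test is one way to see what a full SeqCst fence buys beyond a SeqCst load. In the sketch below (the function name is hypothetical), each thread does a relaxed store to its own variable, issues `fence(Ordering::SeqCst)`, then does a relaxed load of the other variable. The memory model forbids the outcome where both threads read 0. If the fences were removed and the loads made SeqCst instead, with the stores left relaxed, the (0, 0) outcome would be allowed again.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};
use std::thread;

/// One round of the store-buffering litmus test with SeqCst fences.
/// Returns (value of `y` seen by thread 1, value of `x` seen by thread 2).
fn store_buffering_round() -> (usize, usize) {
    let x = AtomicUsize::new(0);
    let y = AtomicUsize::new(0);
    thread::scope(|s| {
        let t1 = s.spawn(|| {
            x.store(1, Ordering::Relaxed);
            fence(Ordering::SeqCst); // full barrier between the store and the load
            y.load(Ordering::Relaxed)
        });
        let t2 = s.spawn(|| {
            y.store(1, Ordering::Relaxed);
            fence(Ordering::SeqCst); // full barrier between the store and the load
            x.load(Ordering::Relaxed)
        });
        (t1.join().unwrap(), t2.join().unwrap())
    })
}
```

Whichever fence comes first in the single total order of SeqCst operations, the thread whose fence comes second must observe the other thread's store, so at least one of the two results is 1.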
I believe this is true when a SeqCst store is mapped to a plain `mov` without a fence; otherwise it should not be a problem. `fence`+`mov` is also proposed as an optimization of idempotent RMWs on x86 in https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf, Section 7, and LLVM actually implements it.
After looking a bit more at the bounded queue, I have the feeling that the fences are meant to prevent spurious reports of an empty queue (in `pop`) or a full queue (in `push`) while a concurrent push (resp. pop) is in progress. Taking the example of … That's just a theory, though; these things are awfully tricky, so I may very well be wrong.
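To make this theory concrete, here is a heavily simplified bounded MPMC queue in the style of Dmitry Vyukov's sequence-number design. It is a hypothetical sketch, not crossbeam's actual code; `BoundedQueue`, `Slot`, and `seq` are made-up names. The branch of interest is where a slot "looks" empty (in `pop`) or full (in `push`): before giving up, the thread issues a SeqCst fence, re-reads the other index with a relaxed load, and reports empty/full only if that index confirms it. Otherwise a concurrent push (resp. pop) has claimed the position but not finished, so it retries. Whether that fence can be replaced by a SeqCst load is the open question in this issue, so the sketch keeps the fence.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{fence, AtomicUsize, Ordering};

struct Slot<T> {
    // seq == pos:     free for the push at position `pos`.
    // seq == pos + 1: holds the value pushed at position `pos`.
    seq: AtomicUsize,
    value: UnsafeCell<MaybeUninit<T>>,
}

pub struct BoundedQueue<T> {
    head: AtomicUsize, // next position to pop
    tail: AtomicUsize, // next position to push
    slots: Box<[Slot<T>]>,
}

// Values of type T are handed between threads through the slots.
unsafe impl<T: Send> Send for BoundedQueue<T> {}
unsafe impl<T: Send> Sync for BoundedQueue<T> {}

impl<T> BoundedQueue<T> {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        let slots = (0..cap)
            .map(|i| Slot { seq: AtomicUsize::new(i), value: UnsafeCell::new(MaybeUninit::uninit()) })
            .collect();
        BoundedQueue { head: AtomicUsize::new(0), tail: AtomicUsize::new(0), slots }
    }

    pub fn push(&self, value: T) -> Result<(), T> {
        let cap = self.slots.len();
        let mut pos = self.tail.load(Ordering::Relaxed);
        loop {
            let slot = &self.slots[pos % cap];
            let diff = slot.seq.load(Ordering::Acquire).wrapping_sub(pos) as isize;
            if diff == 0 {
                // Slot is free: try to claim position `pos`.
                match self.tail.compare_exchange_weak(pos, pos + 1, Ordering::Relaxed, Ordering::Relaxed) {
                    Ok(_) => {
                        unsafe { slot.value.get().write(MaybeUninit::new(value)) };
                        slot.seq.store(pos + 1, Ordering::Release); // publish the value
                        return Ok(());
                    }
                    Err(current) => pos = current,
                }
            } else if diff < 0 {
                // Slot still holds a value from the previous lap: the queue looks full.
                fence(Ordering::SeqCst);
                let head = self.head.load(Ordering::Relaxed);
                if head + cap == pos {
                    return Err(value); // really full
                }
                pos = self.tail.load(Ordering::Relaxed); // a pop is in progress: retry
            } else {
                pos = self.tail.load(Ordering::Relaxed); // another producer got here first
            }
        }
    }

    pub fn pop(&self) -> Option<T> {
        let cap = self.slots.len();
        let mut pos = self.head.load(Ordering::Relaxed);
        loop {
            let slot = &self.slots[pos % cap];
            let diff = slot.seq.load(Ordering::Acquire).wrapping_sub(pos + 1) as isize;
            if diff == 0 {
                // Slot holds the value pushed at `pos`: try to claim it.
                match self.head.compare_exchange_weak(pos, pos + 1, Ordering::Relaxed, Ordering::Relaxed) {
                    Ok(_) => {
                        let value = unsafe { slot.value.get().read().assume_init() };
                        slot.seq.store(pos + cap, Ordering::Release); // free the slot for the next lap
                        return Some(value);
                    }
                    Err(current) => pos = current,
                }
            } else if diff < 0 {
                // Slot not written yet for this lap: the queue looks empty.
                // Full fence before re-reading `tail` (the pattern discussed in this issue).
                fence(Ordering::SeqCst);
                let tail = self.tail.load(Ordering::Relaxed);
                if tail == pos {
                    return None; // really empty
                }
                pos = self.head.load(Ordering::Relaxed); // a push is in progress: retry
            } else {
                pos = self.head.load(Ordering::Relaxed); // another consumer got here first
            }
        }
    }
}

impl<T> Drop for BoundedQueue<T> {
    fn drop(&mut self) {
        // Drop any values still stored in the queue.
        while self.pop().is_some() {}
    }
}
```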
Note that even though this may have been an optimization back then (these links date from 2012 and 2014), the situation now seems to have reversed: in most cases an idempotent RMW will be faster than an …
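For reference, an "idempotent RMW" in Rust is an atomic read-modify-write that writes back the value it read, such as `fetch_add(0, ...)` or `fetch_or(0, ...)`, used for its ordering effect rather than to change the value. The sketch below only shows the source-level form; the helper names are hypothetical. Which x86 instructions end up being emitted (a locked instruction, or the fence-plus-load lowering mentioned earlier in the thread) depends on the LLVM version and target, so it should be checked on Godbolt rather than assumed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Read `x` through an idempotent RMW: adding 0 leaves the value unchanged,
/// so apart from returning the current value, only the SeqCst ordering of the
/// read-modify-write matters.
fn idempotent_add_load(x: &AtomicUsize) -> usize {
    x.fetch_add(0, Ordering::SeqCst)
}

/// The same idea spelled as a bitwise OR with 0.
fn idempotent_or_load(x: &AtomicUsize) -> usize {
    x.fetch_or(0, Ordering::SeqCst)
}
```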
I commented on the relevant commit (#16), but too late: it had already been merged, so I thought I would open an issue so that this can be tracked.