What about: mixed-size atomic accesses #345
That all matches my understanding as well. (It's part of the reason why I added
If they disagreed with that, it would basically imply that after using some memory for an atomic operation, you could never reuse that memory again. For example, deallocating a Box would be unsafe, and so would letting a stack-allocated AtomicU16 go out of scope. They don't say it very clearly, but I don't see how their no-mixed-sizes rule can apply to anything other than atomic operations on the same memory that race with each other.
Yeah, it could very well turn out that "should" just means "for performance", and that it has nothing to do with correctness. They're not very clear.
That seems like exactly the right thing to say, and matches what you can do in safe Rust (if we include the unstable …). I don't think it's impossible that this might become less restrictive in the future, if we find more reasons to believe that racing mixed-size atomic operations will work on all platforms.
Converting … In C++20, you can also have a … In atomics.ref.generic#general-3, they clearly specify mixed-size accesses in the same way as us:
(Emphasis mine.)
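For comparison, Rust's closest analogue to C++20's atomic_ref is probably AtomicU16::from_ptr (stable since Rust 1.75, to my knowledge), whose documented safety contract, as I read it, likewise forbids mixing atomic accesses of different sizes without synchronization. The following is a minimal sketch under that assumption; count_in_place is a hypothetical helper, and the alignment claim in the SAFETY comment holds on common targets but is not universal.

```rust
use std::sync::atomic::{AtomicU16, Ordering};
use std::thread;

/// Views a plain local `u16` as an `AtomicU16` while several threads
/// increment it, roughly like C++20 `std::atomic_ref<uint16_t>`.
fn count_in_place(n_threads: usize, per_thread: usize) -> u16 {
    let mut x: u16 = 0;
    let p: *mut u16 = &mut x;
    {
        // SAFETY (sketch): `p` points to a live local that outlives `a`; on
        // common targets `align_of::<AtomicU16>() == align_of::<u16>()`; and
        // while `a` is in use, `x` is accessed only through `a`, always at
        // the same 16-bit width.
        let a: &AtomicU16 = unsafe { AtomicU16::from_ptr(p) };
        thread::scope(|s| {
            for _ in 0..n_threads {
                s.spawn(|| {
                    for _ in 0..per_thread {
                        a.fetch_add(1, Ordering::Relaxed);
                    }
                });
            }
        }); // every spawned thread is joined before `scope` returns
    }
    // The joins order all atomic accesses before this plain read.
    x
}

fn main() {
    println!("{}", count_in_place(4, 250));
}
```

The plain read of x at the end is fine only because thread::scope joins every worker first; the same join is what would make a later access of a different width non-racing.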
ARM's memory model (in the section "The AArch64 Application Level Memory Model") seems to fully support mixed-size atomic accesses.
Ah, good point. I had forgotten that hardware memory models do not have provenance. 😂
... and include bytemuck.
Isn't that a C thing? Though C++ might have something similar with
Oh, good point. So in some sense this is actually already all covered by rust-lang/rust#97516.
When reusing memory, it is undef, right? Furthermore, deallocation requires some kind of synchronization with every thread that has ever accessed it using atomic operations. Together, I would assume this is enough to provide consistency by "resetting" the state that witnesses it was accessed using atomic operations of a different size.
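A minimal sketch of that reasoning, assuming std threads and Arc (count_then_reclaim is a hypothetical name): joining every thread that touched the atomic gives a happens-before edge to the owner, after which taking the value back out and reusing its bytes at a different size is not a racing access.

```rust
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Arc;
use std::thread;

/// Increments a shared `AtomicU16` from several threads, then reclaims it
/// and reinterprets the value as two plain bytes.
fn count_then_reclaim(n_threads: u16, per_thread: u16) -> [u8; 2] {
    let shared = Arc::new(AtomicU16::new(0));
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            let a = Arc::clone(&shared);
            let n = per_thread;
            thread::spawn(move || {
                for _ in 0..n {
                    a.fetch_add(1, Ordering::Relaxed);
                }
                // This thread's clone of the Arc is dropped here.
            })
        })
        .collect();
    for h in handles {
        // Each join makes that thread's atomic accesses happen-before us.
        h.join().expect("worker panicked");
    }
    // Every clone is gone, so we get exclusive ownership back.
    let a = Arc::try_unwrap(shared).expect("all clones were dropped");
    a.into_inner().to_ne_bytes()
}

fn main() {
    println!("{:?}", count_then_reclaim(4, 250));
}
```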
C++ allows aliasing through
To add to my previous comment: one of the reasons why C++'s atomic_ref doesn't allow mixed size / overlapping operations, is that it supports objects of any size. If it gets too big for native atomic instructions, it uses a mutex instead, which is probably stored in some kind of global table indexed by the address of the object. It's not completely clear whether it's necessary to be as restrictive when limited to only natively supported atomic operations, like in Rust.
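To illustrate why a lock table makes overlapping operations awkward, here is a hypothetical sketch; the table size, the hash (starting address shifted right by 4, modulo the table size) and the names LOCKS, lock_index and lock_for are all made up, and real C++ standard libraries may do something different. Because the lock is chosen from an object's starting address, two overlapping objects that start at different addresses can end up guarded by different mutexes.

```rust
use std::sync::{Mutex, MutexGuard};

// Hypothetical global lock table for "atomic" operations on objects too
// large for native atomic instructions.
const TABLE_SIZE: usize = 64;
#[allow(clippy::declare_interior_mutable_const)]
const UNLOCKED: Mutex<()> = Mutex::new(());
static LOCKS: [Mutex<()>; TABLE_SIZE] = [UNLOCKED; TABLE_SIZE];

/// Picks a lock from the object's *starting* address only.
fn lock_index(addr: usize) -> usize {
    (addr >> 4) % TABLE_SIZE
}

fn lock_for(addr: usize) -> MutexGuard<'static, ()> {
    LOCKS[lock_index(addr)].lock().unwrap()
}

fn main() {
    let base = 0x1000; // made-up address, used only for the hash
    // A 32-byte object at `base` overlaps a 16-byte object at `base + 16`,
    // yet they map to different locks, so holding one does not exclude
    // an operation on the other.
    let _whole = lock_for(base);
    let _upper_half = lock_for(base + 16); // does not block
    println!("overlapping objects got different locks");
}
```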
I think we can interpret "should" as "It's undefined by spec, though it works in practice because some important people rely on it; but please don't, we want to do fast things". Therefore, mixed-size accesses should be considered undefined by Rust, as we expect to be able to compile to x86, where they are undefined.
Regarding x86, I got the following from @thiagomacieira, which is very helpful.
So this means there are atomic 256-bit accesses, but doing size mixing with those is a problem? (Not sure what "1-uop operations" are.)
On Mon, Apr 10, 2023 at 13:18 Ralf Jung wrote:
> use operations of 16 bytes or less (ideally: use only 1-uop operations)
1-uop operations are operations that compile down to an instruction that decodes into a single micro-operation (uop). I can't remember if SSE MOVs are 1-uop, but this would include all of the scalar instructions for sure.
SSE loads and stores are atomic. AVX 256- and 512-bit loads and stores are atomic on P-core processors, but not on E-core processors (there, the 256-bit operation is cracked into two 128-bit operations and is therefore not atomic). There is no SIMD read-modify-write (RMW). The best you get is a merge-store, and I'm confident that's atomic on P-core, but I doubt it is on E-core. Therefore, SIMD atomics are very limited if you can only use them for loads and stores. The most useful thing this could be done for is to load 16 bytes atomically and use CMPXCHG16B to store, but that's still limited and somewhat slow due to the transfer between register files. Seqlocks are more flexible.
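For reference, a seqlock can provide a consistent 16-byte snapshot using only same-size atomic accesses. This is a minimal sketch, assuming a single writer and storing the 16 bytes as two AtomicU64 words (so the target needs 64-bit atomics); SeqLock16 and its methods are hypothetical names. The fence placement follows the pattern commonly recommended for seqlocks under the C++/Rust memory model: a release fence after marking the write as in progress, and an acquire fence before re-checking the sequence number.

```rust
use std::sync::atomic::{fence, AtomicU64, Ordering};

/// Hypothetical seqlock guarding 16 bytes; assumes a single writer.
struct SeqLock16 {
    seq: AtomicU64,
    data: [AtomicU64; 2],
}

impl SeqLock16 {
    const fn new() -> Self {
        SeqLock16 { seq: AtomicU64::new(0), data: [AtomicU64::new(0), AtomicU64::new(0)] }
    }

    fn write(&self, v: [u64; 2]) {
        let s = self.seq.load(Ordering::Relaxed);
        self.seq.store(s.wrapping_add(1), Ordering::Relaxed); // odd: writing
        fence(Ordering::Release); // orders the odd store before the data stores
        self.data[0].store(v[0], Ordering::Relaxed);
        self.data[1].store(v[1], Ordering::Relaxed);
        self.seq.store(s.wrapping_add(2), Ordering::Release); // even: stable
    }

    fn read(&self) -> [u64; 2] {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 & 1 == 1 {
                std::hint::spin_loop(); // a write is in progress
                continue;
            }
            let v = [self.data[0].load(Ordering::Relaxed), self.data[1].load(Ordering::Relaxed)];
            fence(Ordering::Acquire); // orders the data loads before the re-check
            let s2 = self.seq.load(Ordering::Relaxed);
            if s1 == s2 {
                return v; // no write overlapped the two data loads
            }
        }
    }
}

fn main() {
    let lock = SeqLock16::new();
    std::thread::scope(|s| {
        s.spawn(|| {
            for i in 0..1000u64 {
                lock.write([i, i]); // both halves are always equal
            }
        });
        s.spawn(|| {
            for _ in 0..1000 {
                let [a, b] = lock.read();
                assert_eq!(a, b, "torn read");
            }
        });
    });
    println!("{:?}", lock.read());
}
```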
My reading of the RISC-V spec is that mixed-size atomics are well-defined if they are aligned, both in terms of their atomicity and in terms of their globally observable ordering with respect to other load/store operations in the same thread. It took quite a lot of context to convince myself that this is true, so I'm leaving notes below, but the tl;dr is "mixed-size atomics are fine, as long as they are aligned". Atomicity:
Ordering (emphasis mine):
Context: Here, AMO is an atomic memory operation, i.e. a read-modify-write such as fetch_add or fetch_xor, and "hart" is a hardware thread. RISC-V does not implement CAS directly but uses paired LR (load-reserved) and SC (store-conditional) instructions, which can be used to implement CAS. Note that there is a discrepancy described in section A.7.1 between their axiomatic (normative) and operational memory models, regarding mixed-size load and store operations. The examples provided all look like edge cases to me and I do not believe they have any bearing on guarantees provided by the C++ atomic model (even if the model were extended to cover mixed-size atomics). As long as you rely only on atomics being atomic and on acquire/release (or stronger) semantics providing happens-before relationships, you could not write code that depends on this discrepancy being resolved one way or another.
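Since RISC-V builds compare-and-swap out of LR/SC, the usual portable way to express a read-modify-write that has no dedicated instruction is a compare-exchange loop. A minimal sketch (fetch_mul is a hypothetical helper): it uses compare_exchange_weak, which the Rust docs allow to fail spuriously even when the comparison succeeds; on LR/SC targets that latitude is generally what lets the compiler avoid an inner retry loop, although the exact codegen is not guaranteed.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Atomically multiplies `a` by `k` (wrapping) and returns the old value.
fn fetch_mul(a: &AtomicU32, k: u32) -> u32 {
    let mut cur = a.load(Ordering::Relaxed);
    loop {
        let new = cur.wrapping_mul(k);
        match a.compare_exchange_weak(cur, new, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(prev) => return prev,
            // Lost a race or failed spuriously: retry with the observed value.
            Err(actual) => cur = actual,
        }
    }
}

fn main() {
    let a = AtomicU32::new(3);
    let prev = fetch_mul(&a, 5);
    println!("{prev} -> {}", a.load(Ordering::Relaxed));
}
```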
There are plenty of hardware models that allow mixed-size atomics. That's entirely useless for our purposes, though; Rust needs a language-level memory model, and for those, so far I don't know of any proposal that would permit mixed-size atomics. Hardware-level memory models disallow many optimizations we'd want to do (such as GVN, hoisting code, or algebraic identities that remove syntactic dependencies like …). Trying to use a hardware memory model in Rust doesn't work for the usual reason: what the hardware does is not what your program does. Rust programs run on an Abstract Machine that has its own rules, and that is intended to support way more optimizations than what you can do on assembly, and the cost for that is that the AM is a lot harder to define.
Yes, Rust cannot just directly use a hardware memory model, and I do not think tmandry was suggesting it should, but it is still necessary to discuss hardware memory models to determine whether or not a possible future Rust memory model allowing mixed-size atomics is even implementable on those platforms.
Don't worry! I promise to forever believe in the Rust abstract machine. I also think we have a precedent for "poking holes" in Rust's abstract machine by saying the behavior of certain operations is "whatever the underlying hardware does – within (important) constraints" – correct me if I'm wrong, but floating point comes to mind. Currently, I don't see any insurmountable obstacles to doing that for mixed-size atomic ops. It's very unlikely that LLVM takes advantage of the UB given that actual C/C++ code in the wild makes use of them, and so far I can't imagine a use case that allows us to do an important optimization (at least, the importance of such an optimization would likely be smaller than the importance of using mixed-size atomics). Examples of usage in the wild:
While we could tell everyone they should go down to assembly for this, it would be much nicer – and probably more optimization-friendly – if we could let them use Rust atomics.
Floating-point operations are trivial pure operations, so poking a hole is trivial in principle -- and even there it is already going horribly wrong, see the x86-32 issues. The memory model is probably the most complicated part of the Rust specification. I don't know of any way that is even remotely principled that would let us "poke a hole". So no, I don't think we can do that. If you want to use target semantics, you have to use inline asm (and even then you have to ensure that the end-to-end behavior of the inline asm can be expressed in Rust).
Yeah, I didn't mean to suggest this was trivial, and your comment makes me realize that "poking a hole" is definitely the wrong framing. I would very much like it if we could extend the memory model (possibly in target-specific ways) to support these operations, but I don't know how to do that and I'm sure some smart people have tried. On the bright side, we may have ourselves a great idea for a PhD thesis... |
So it looks like mixed-size accesses made the rounds on Twitter again recently, which got me thinking about them. rust-lang/rust#97516 carefully chose not to talk about them (and anyway, it's not clear that it is the right place to go into more detail about them). So what do we want to do with them in Rust?
Some facts:
In C++, you cannot convert a &uint16_t into a reference to an array "because no such array exists at that location in memory"; they insist that memory is strongly typed. This means they don't even have to talk about mixed-size accesses. It also means they are ignoring a large fraction of the programs out there, but I guess they are fine with that. We are not. ;)
Apparently the x86 manual says you "should" not do this: "Software should access semaphores (shared memory used for signalling between multiple processors) using identical addresses and operand lengths." It is unclear what "should" means (or what anything else here really means, operationally speaking...)
In Rust, it is pretty much established that you can safely turn a &mut u16 into a &mut [u8; 2]. So you can do something where you start with a &mut AtomicU16, do some atomic things to it, then use the above conversion and get a &mut AtomicU8 and do atomic things with that -- a bona fide mixed-size atomic access.
However, this means that there is a happens-before edge between all the 16-bit accesses and the 8-bit accesses. Having the &mut means that there is synchronization. I hope this means not even Intel disagrees with this, but since literally none of the words in that sentence is defined, who knows.
So... it seems like the most restrictive thing we can say, without disallowing code that you can already write entirely safely using bytemuck, is that
Cc @workingjubilee @m-ou-se @Amanieu @cbeuw @thomcc
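The &mut AtomicU16 to 8-bit conversion described in the issue can be sketched without bytemuck using an unsafe pointer cast; split_atomic and demo are hypothetical names. The key property is that the 8-bit view is obtained through &mut AtomicU16, so the borrow checker guarantees that no 16-bit access can overlap with the 8-bit ones.

```rust
use std::sync::atomic::{AtomicU16, AtomicU8, Ordering};

/// Reinterprets an exclusively borrowed `AtomicU16` as two `AtomicU8`s.
fn split_atomic(a: &mut AtomicU16) -> &mut [AtomicU8; 2] {
    let p: *mut u16 = a.get_mut();
    // SAFETY (sketch): `[AtomicU8; 2]` has the same size as `u16`, its
    // alignment (1) is no stricter than `u16`'s, and `AtomicU8` has the same
    // in-memory representation as `u8`. The result re-borrows `a`
    // exclusively, so nothing else can access the memory meanwhile.
    unsafe { &mut *(p as *mut [AtomicU8; 2]) }
}

/// Returns (value actually observed, value we expect).
fn demo() -> (u16, u16) {
    let mut a = AtomicU16::new(0);
    a.fetch_add(0x0102, Ordering::Relaxed); // 16-bit atomic access
    {
        let bytes = split_atomic(&mut a);
        bytes[0].fetch_or(0x80, Ordering::Relaxed); // 8-bit atomic access
    } // the exclusive borrow ends here; 16-bit accesses may resume
    let got = a.load(Ordering::Relaxed);
    // Which byte is "first" depends on endianness, so compute the
    // expectation the same way.
    let mut expected = 0x0102u16.to_ne_bytes();
    expected[0] |= 0x80;
    (got, u16::from_ne_bytes(expected))
}

fn main() {
    let (got, expected) = demo();
    println!("{got:#06x} {expected:#06x}");
}
```

Here the switch between widths only happens across the hand-over of the &mut borrow, which is the happens-before edge the issue text relies on.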