-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blocking fetching AMOs are unaffected by fence/quiet #518
Conversation
There is actually an unresolved memory model issue with this proposal. Specifically, given a litmus test like this:
and assuming no overflow, do we require the assertion to be true? Relaxed order atomics (e.g. |
Yeah, great question. I might agree that we do not require the assertion to be true in this example (at least if relaxed atomics are allowed) but this proposal says that a fence here does not fix it regardless. So if we do want a fence to force the ordering, we should reconsider this proposal and/or fix up the memory model first. I think a related confusion is this sentence in 9.7:
so a naive user might think the 2nd call isn't even initiated until Also, just to clarify/simplify this example, can we assume that there are no other atomics happening on Pinging @wrrobin and @stewartl318 who might be interested and will have to cover for me at the ballot (I'll be chairing the HOTI tutorials ;). |
Given that this is a minor change that treads into the space of clarifying unspecified behavior, I'm tempted to ask that we table it for OpenSHMEM 1.6 so that we can try to address the issue in the memory model holistically. I suspect that we'll eventually want to define blocking fetching AMOs as ordered (this will be painful on GPUs) and introduce new AMOs that allow users to choose the ordering model (i.e. can choose relaxed for performance). What do you think? |
@jdinan - AMOs with user options for choosing ordering and scope is definitely a good idea. But, even for those, the default option needs to be clarified. Are you suggesting to use fence/quiet for both blocking and non-blocking fetching AMOs to ensure ordering in the default case? |
I am firmly on the side of having the default behavior be relaxed. First of all, Jim's example could be implemented by fetch_and_add(2), and then the caller can decide the ordering. Second, my (dim) understanding of the memory models is that the atomics with ordering (seq_cst, release, acquire, acquire-release) have to do with making sure that data buffers are written before an associated flag (release) and that a flag is checked before a buffer is read (acquire). They only really apply to the sequencing of operations from a particular source. We already have a flag API (put with signal) and that one should be "release". We also have shmem_fence to enforce order of delivery. My suspicion is that most individual non-fetching AMOs are there to support lock-free modifications of remote buffers, and the lock freedom is against other PEs, hardly ever against yourself. Relaxed is the most performant choice. For fetching atomics, it seems likely to me that again the use case is almost always contention with other PEs not with yourself, so again, the ordering doesn't matter. When ordering of atomics from the same pe matters, it will (usually) be to different addresses, when you wish the target to see the updates in a particular order. shmem_fence does that. I am perfectly comfortable with the idea that shmem doesn't enforce ordering of fetching atomics and I certainly wouldn't want to slow down the common cases in support of such a corner case. |
Thanks @stewartl318 for your thoughts. This sounds good to me as well. But, how do we clarify it in the spec? Should we then update the AMO section 9.7 text as Dave pointed above? |
@stewartl318 I think that's a very reasonable proposal. It may break backwards compatibility and we'll need to wrestle with that. Regarding this proposal, my understanding is it's being put forward as a simple bugfix to a table that's believed to be inconsistent. IIUC, the goal is not to clarify the memory model. However, any change we make here will have memory model implications/complications. So my suggestion is that we take some additional time to craft a proper clarification to the memory model and delay this change to the OpenSHMEM 1.7. |
Summary of changes
This clarifies that blocking fetching AMOs are unaffected by fence/quiet, but blocking non-fetching AMOs are affected by fence/quiet. Previously this was a bit unclear from Table 14 in OpenSHMEM v1.5.