-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for NBI put-with-signal operation #244
Conversation
This comment is no more valid - we didn't have any special ballot voting or updates during the November specification meeting. Updates available in next comment. |
NBI put-with-signal is an extension to its blocking variant.
We have incorporated common review comments from put-with-signal blocking routines: 1. duplicated explanation from summary to description 2. removed restrict qualifier and also overlapping explanation 3. modified ctx arg explanation
Based on recent review comments, it looks like it would be more clear if we state that the signal update is an atomic operation We have added this as part of the Notes to Implementers section.
Previously, we had the information about the signal updates atomicity guarantees in the notes to implementors section for NBI put-with-signal We are not now moving this into main notes section. We have also clarifies the atomicity guarantees by refering to atomicty section.
a5b02ae
to
3b48d2b
Compare
Changes made after the October 2018 Meeting: The following changes were made similar to #218 (comment):
October 2018 is when we had the official reading of this ticket. These changes can be used for updates in the January 2019 meeting and used for special ballot. |
Previously, we used restrict qualifier and defined in the def.tex for syntax highlighting in the function definitions. As the usage of restrict qualifier is removed, this change is no longer nedeed.
We were incorrectly using variable and macros incorrectly for \dest and \source. Fixing it in put-with-signal-nbi.
content/shmem_put_signal_nbi.tex
Outdated
point-to-point synchronization interfaces. The delivery of \VAR{signal} flag | ||
on the remote \ac{PE} must not cause partial updates. This requires the | ||
update on \VAR{signal} flag to be an atomic operation, with atomicity | ||
guarantees described in Section~\ref{subsec:amo_guarantees}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This text is too strong. We don't want the signal update to be a SHMEM-level atomic. If a network can implement this with a simple write, it should be allowed. Suggest something like:
"The put-with-signal interfaces must be implemented such that a synchronization operation on the remote \ac{PE} does not observe partial updates to \VAR{signal}. On some platforms, this may require the update of the \VAR{signal} flag to be performed using an atomic operation."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jdinan @manjugv @shamisp Though the comment is shown outdated - this one is not fixed. Need some feedback.
To me the core of both @jdinan suggested changes (prefer - atomic-like semantics) and @shamisp review comments (prefer - strong atomic guarantees) almost looks same. But, the issue is - we are trying to determine an implementation detail.
If some implementation can implement shmem_atomic_set
as simple shmem_p
operation. It doesn't matter if this statement is too strong. Internally it is going to still go through the same path.
IMHO, if it is fine, we should completely remove the implementation detail. We should just say:
The put-with-signal interfaces must be implemented such that a synchronization operation on the remote \ac{PE} does not observe partial updates to \VAR{signal}.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still prefer the minimally specified "is compatible with remote wait/test" text. It's crucial for open standards to keep the implementation space open to support future and alternative architectures (e.g. Gen-Z, CCIX, NVLink, etc.) where the signal can be non-atomic. SHMEM on shared memory (e.g. POSH) is another example, depending on the underlying architecture.
My understanding is that the weaker text is compatible with current architectures, while keeping the implementation space open. I appreciate that the atomic text is clearer to present day implementors and, as a result, is more likely to be implemented correctly. Apart from clarity, is there something else I'm missing? If the preference for the atomic text is being driven by clarity, we can put more work into the wait-compatible version of the text to address this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the last RMA call we had in depth discussion about definition signal semantics, notes posted here http://www.openshmem.org/pipermail/openshmem-list/2019-January/001021.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shamisp Sorry I wasn't able to join that meeting. I re-read the notes, but don't see the outcome from the discussion. Can someone please clarify for me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bcernohous That's my understanding as well. There is an additional concern we are working through that on some platforms the signal update needs to be performed using a PCIe atomic operation (and by extension a NIC atomic) to ensure that shmem_wait
at the target doesn't see a partial update to the variable it is waiting on. Since atomics have lower throughput than puts (at least on the networks we are discussing), there is interest in identifying a path through the API where such a network can still use puts to implement the signal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jdinan , understood. And there was some discussion of doubling the APIs to support both"no partial updates" and "partial update ok" versions of wait.
IMO, before I've seen the upcoming "shmem signal" (not shmem_put_signal) proposal, I think SHMEM takes the hard path. No partial updates.
An app that wants performance and tolerates partial updates could use shmem_put/do while(flag!=0). It's not clear to me that we need API's for that. My 2 cents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bcernohous The question @naveen-rn asked earlier in the thread was whether the signal update performed by put-with-signal should be specified as a SHMEM AMO or a "no partial updates" update. What's your preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jdinan , I'm waiting to see the new shmem_signal proposal. I think we define them similarly and they both satisfy shmem_wait. So it partly depends on what we decide for shmem_wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This signal API sounds a bit like a wrapper around shmem_p
and shmem_atomic_set
(i.e. choose which ever one is fastest and safe to use with wait). We could also add a preprocessor macro SHMEM_P_IS_WAIT_SAFE
and let the user code for this if they want to.
January 2019 Meeting: Voting postponed |
@naveen-rn Can this PR be closed now that it is merged with #275? |
Closing this PR as we have opened a new composed PR with both blocking and non-blocking put-with-signal: #275 |
This PR extends the #218 proposal adding non-blocking support for put-with-signal operations. The nonblocking variant of the put-with-signal operation is semantically equivalent to:
shmem_put_nbi();
shmem_fence();
shmem_put_nbi();
Following the trend of no-example for any NBI operations, there is no example added for this proposed routine in this PR. And the placement of the routine, will also be modified based on Issue #216.
Expected DOC changes: