Add support for put with signal operation #218
Some pending items that need further discussion:
Thread Safety Proposal (RM Ticket openshmem-org#218)
content/shmem_put_signal.tex
ordering between the delivery of the signal word of a put-with-signal
routine and another data transfer. For example, the delivery of the signal
word in a sequence consisting of a put routine followed by a put-with-signal
routine does not imply delivery of the put routine's data.
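The ordering rule in this excerpt can be illustrated with a small single-process model. This is a hedged sketch, not OpenSHMEM API: `sim_issue`, `sim_fence`, `sim_may_deliver`, and `sim_deliver` are hypothetical names. A put-with-signal is modelled as one operation (so its signal implies delivery of its own data), and a fence is modelled as starting a new ordering epoch, in the spirit of `shmem_fence`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_OPS 16

/* Hypothetical model of operations issued to one target PE. */
typedef struct {
    int epoch;       /* ordering epoch; a fence starts a new one */
    bool delivered;  /* has this operation landed at the target? */
} sim_op;

typedef struct {
    sim_op ops[SIM_MAX_OPS];
    size_t count;
    int current_epoch;
} sim_queue;

/* Issue a put or a put-with-signal (one op: data plus its own signal). */
static size_t sim_issue(sim_queue *q) {
    assert(q->count < SIM_MAX_OPS);
    q->ops[q->count].epoch = q->current_epoch;
    q->ops[q->count].delivered = false;
    return q->count++;
}

/* A fence orders every later operation after every earlier one. */
static void sim_fence(sim_queue *q) { q->current_epoch++; }

/* An op may land only when no op from an earlier epoch is still pending;
   ops in the same epoch may land in any order. */
static bool sim_may_deliver(const sim_queue *q, size_t id) {
    for (size_t i = 0; i < q->count; i++) {
        if (!q->ops[i].delivered && q->ops[i].epoch < q->ops[id].epoch)
            return false;
    }
    return true;
}

static void sim_deliver(sim_queue *q, size_t id) {
    assert(sim_may_deliver(q, id));
    q->ops[id].delivered = true;
}
```

Without a fence, the put-with-signal may land before the earlier put's data; with a fence between them, the signal has to wait until the put is delivered.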
This is probably a question that has been asked and answered in other contexts, but: are there any atomicity guarantees on the delivery of the signal? Should there be a note here calling that out? If there are no guarantees, should/could there be a separate set of APIs that deliver the signal atomically (possibly as an atomic operation like AND and not just an atomic set)?
I think the answers to your questions lie with the resolution of #204, #229, and the overall memory model work. I think one goal of those efforts should be that shmem_wait_until (and family) can be safely triggered by atomic operations and the signal of a put-with-signal. If that requires the signal to be an atomic remote write, then so be it.
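If the signal is treated as an atomic remote write, the producer/consumer contract resembles a release store paired with an acquire load. A minimal sketch, assuming a single node and C11 atomics; `sim_target`, `sim_put_with_signal`, and `sim_signal_test_eq` are hypothetical stand-ins for the remote data object, the put-with-signal, and a non-blocking check in the spirit of `shmem_test`, not the proposal's API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_DATA_LEN 4

/* Hypothetical model of the target PE's data object and signal word. */
typedef struct {
    int data[SIM_DATA_LEN];
    _Atomic uint64_t signal;
} sim_target;

static void sim_target_init(sim_target *t) {
    memset(t->data, 0, sizeof t->data);
    atomic_init(&t->signal, 0);
}

/* Producer: write the payload, then publish the signal with release
   ordering, so a reader that observes the signal also observes the data. */
static void sim_put_with_signal(sim_target *t, const int *src, size_t n,
                                uint64_t value) {
    assert(n <= SIM_DATA_LEN);
    memcpy(t->data, src, n * sizeof *src);
    atomic_store_explicit(&t->signal, value, memory_order_release);
}

/* Consumer: atomic acquire load of the signal; returns 1 if it matches. */
static int sim_signal_test_eq(sim_target *t, uint64_t expected) {
    return atomic_load_explicit(&t->signal, memory_order_acquire) == expected;
}
```

A consumer that polls `sim_signal_test_eq` until it returns 1 can then read `data` safely; a network implementation would need equivalent guarantees from the NIC, which this model does not capture.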
To your knowledge, has there been investigation into something like put-with-atomic-op? Semantically a putmem followed by an atomic operation (e.g. AND).
Is there any benefit in hardware to doing something like that, or would implementations just support it as putmem-fence-atomic?
The closest I can think of is the "counting puts" proposal from @jdinan & Co. This is equivalent to a put+fence+atomic-add. That is certainly a useful idiom. I'm not sure whether the full generality of put+fence+atomic-op is useful for all operations (i.e., set, add, and, or, xor).
I definitely think there would be application benefit in having hardware support for these idioms. It is often the case that fence (ordering, not necessarily completion) requires completion, due to network API constraints. If that fence can be avoided because the interconnect manages the ordering and atomic op, then the user is introducing fewer latency-sensitive operations.
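The "counting puts" idiom (put, fence, atomic add) can be sketched the same way. In this hypothetical single-node model, `sim_counting_put` copies one chunk and then atomically increments a counter, with release ordering standing in for the fence; `sim_inbox` and `sim_all_arrived` are likewise illustrative names. The receiver only needs the final count, not the arrival order, which is why an add (rather than a set) suits several puts targeting one PE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_CHUNK 4
#define SIM_MAX_CHUNKS 8

/* Hypothetical receive buffer plus a delivery counter on the target PE. */
typedef struct {
    int buf[SIM_CHUNK * SIM_MAX_CHUNKS];
    _Atomic uint64_t counter;   /* number of chunks delivered so far */
} sim_inbox;

static void sim_inbox_init(sim_inbox *in) {
    memset(in->buf, 0, sizeof in->buf);
    atomic_init(&in->counter, 0);
}

/* Put one chunk, then atomically add 1 to the counter. The release order
   plays the role of the fence between the put and the atomic add. */
static void sim_counting_put(sim_inbox *in, size_t chunk, const int *src) {
    assert(chunk < SIM_MAX_CHUNKS);
    memcpy(&in->buf[chunk * SIM_CHUNK], src, SIM_CHUNK * sizeof *src);
    atomic_fetch_add_explicit(&in->counter, 1, memory_order_release);
}

/* Receiver: every expected chunk has landed once the counter reaches it. */
static int sim_all_arrived(sim_inbox *in, uint64_t expected) {
    return atomic_load_explicit(&in->counter, memory_order_acquire) >= expected;
}
```

Chunks may arrive in any order; the receiver simply waits for the count it expects.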
@nspark An ordered network will help, but if you take into account something like
adaptive routing, it may break that ordering. You really want the packet small
enough that you can squeeze it into a single MTU.
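The single-MTU point is simple arithmetic: if the header, the payload, and the 64-bit signal fit in one packet, the data and signal travel together, so adaptive routing has nothing to reorder between them. The 64-byte header below is a hypothetical placeholder; real header sizes and MTUs depend on the interconnect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire-format sizes; real values depend on the network. */
#define SIM_HEADER_BYTES ((size_t)64)
#define SIM_SIGNAL_BYTES sizeof(uint64_t)

/* Largest payload that still fits in one packet with the signal attached. */
static size_t max_single_packet_payload(size_t mtu_bytes) {
    size_t overhead = SIM_HEADER_BYTES + SIM_SIGNAL_BYTES;
    return mtu_bytes > overhead ? mtu_bytes - overhead : 0;
}

/* True if header + signal + payload fit in one MTU (written to avoid
   overflow when payload_bytes is very large). */
static bool fits_single_mtu(size_t payload_bytes, size_t mtu_bytes) {
    size_t overhead = SIM_HEADER_BYTES + SIM_SIGNAL_BYTES;
    return mtu_bytes >= overhead && payload_bytes <= mtu_bytes - overhead;
}
```

With a 4096-byte MTU and these assumed sizes, a put-with-signal can carry at most 4096 - 64 - 8 = 4024 bytes of payload in a single packet.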
Quoted from Nick Park's review comment (Tue, Jul 24, 2018) on content/shmem_put_signal.tex:
+\apireturnvalues{
+ None.
+}
+
+\apinotes{
+ The \VAR{dest} and \VAR{sig\_addr} data object must both be remotely
+ accessible, but may each be allocated from the symmetric heap or global/
+ static memory.
+
+ The delivery of \signal{} flag on the remote \ac{PE} indicates only the
+ delivery of its corresponding \dest{} data words into the data object on
+ the remote \ac{PE}. Without a memory-ordering operation, there is no implied
+ ordering between the delivery of the signal word of a put-with-signal
+ routine and another data transfer. For example, the delivery of the signal
+ word in a sequence consisting of a put routine followed by a put-with-signal
+ routine does not imply delivery of the put routine's data.
August F2F discussion:
August F2F: Ballot postponed.
@nspark @anshumang As modified in 94976fe, I suppose we are planning to restrict the
Based on discussions with @anshumang and others
To update, based on RMA WG discussions, we are planning to drop
@naveen-rn Thanks for the clarification fix regarding point-to-point sync. I have one more question: why do you define
@minsii In general, I would prefer a fixed-width data type for representing the size of the signal, rather than generic types whose width varies with the platform. If I'm correct, the C standard limits the width to 64. Also, the current implementation in Cray hardware supports only 64 bits for the signal, hence we matched that with
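A minimal illustration of the fixed-width argument, assuming the signal is a `uint64_t`: that type is exactly 64 bits wherever it exists, whereas `long` is 32 bits on LLP64 platforms such as 64-bit Windows and 64 bits on LP64 Linux. `sim_signal_add` is a hypothetical helper showing that an add-style signal update wraps modulo 2^64 under unsigned arithmetic.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* uint64_t has exactly 64 value bits and no padding on every platform
   that provides it, so the signal width does not vary. */
_Static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "signal must be 64 bits");

/* Hypothetical add-style signal update: unsigned arithmetic wraps
   modulo 2^64 instead of overflowing. */
static uint64_t sim_signal_add(uint64_t current, uint64_t value) {
    return current + value;
}
```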
1. Use calloc and avoid barriers from malloc and explicit calls
2. Use all C11 generic shmem calls
3. Follow shmem bcast semantics: bcast to the source itself
4. Convert wavefront-like transfer semantics to true bcast
Previously, the bcast example followed SHMEM bcast semantics without any transfer to PE-0. Now, we perform the bcast from PE-0 to all other PEs and to itself.
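The "true bcast" shape described in these commits can be modelled in one address space. This is a hypothetical sketch, not the example in the PR: `sim_pe` stands in for each PE's symmetric destination buffer and signal word, calloc provides zeroed buffers and signals, and each loop iteration represents one put-with-signal from the root (PE 0) to `pe`, the root included. In a wavefront variant, PE i would instead forward the data to PE i+1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SIM_NELEMS 4

/* Hypothetical per-PE destination buffer and signal word. */
typedef struct {
    int dest[SIM_NELEMS];
    uint64_t sig;   /* 0 until the payload arrives, then 1 */
} sim_pe;

/* True broadcast: the root writes directly to every PE, itself included.
   Each iteration models one put-with-signal from the root to `pe`. */
static void sim_bcast_from_root(sim_pe *pes, int npes, const int *src) {
    for (int pe = 0; pe < npes; pe++) {
        memcpy(pes[pe].dest, src, sizeof pes[pe].dest);
        pes[pe].sig = 1;
    }
}

/* What each PE would wait on before reading its dest buffer. */
static int sim_pe_ready(const sim_pe *p) { return p->sig == 1; }
```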
Performing some quick cleanup on the put-with-signal example.
Based on recent review comments, it looks like it would be clearer to state that the signal update is an atomic operation. We have added this to the Notes to Implementers section.
Previously, the information about the atomicity guarantees of signal updates was in the Notes to Implementers section for put-with-signal. We are now moving it into the main Notes section. We have also clarified the atomicity guarantees by referring to the atomicity section.
Changes made after the October meeting:
We could use these changes for the January 2019 OpenSHMEM specification meeting.
January 2019 Meeting: Voting postponed.
content/shmem_put_signal.tex
based on the \VAR{sig\_op} signal operator on the remote \ac{PE} must not
cause partial updates. Only concurrent accesses on \VAR{sig\_addr} by
different signal update operations using the same signal update operator is
guaranteed to be exclusive.
I would suggest clarifying what "exclusive" means in this context.
@naveen-rn @jdinan As I mentioned in the WG meeting, I think it would be useful to define partial and complete updates. I created an issue to track this and referenced both tickets that are impacted.
Move the atomicity semantics to the API description section
Change the text to confirm the atomicity guarantees of the put-with-signal operation. The signal update is atomic only with respect to itself, other put-with-signal operations using the same operator, and any point-to-point synchronization routines.
@naveen-rn Can this one also be closed?
Closing this PR, as we have opened a new combined PR covering both blocking and non-blocking put-with-signal: #275
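To make the scope of that atomicity guarantee concrete, here is a hypothetical C11 model of the signal update. `SIM_SIGNAL_SET` and `SIM_SIGNAL_ADD` are illustrative names mirroring the `SHMEM_SIGNAL_SET` and `SHMEM_SIGNAL_ADD` operators that later appeared in OpenSHMEM 1.5; `sim_signal_update` and `sim_signal_read` are likewise assumptions, not spec routines. Each update is a single atomic store or fetch-add, so concurrent updates with the same operator never leave a partially written value, and the reader uses an atomic load, as a point-to-point synchronization routine would.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical operator codes (cf. SHMEM_SIGNAL_SET / SHMEM_SIGNAL_ADD). */
enum sim_sig_op { SIM_SIGNAL_SET, SIM_SIGNAL_ADD };

/* Apply a signal update as one atomic operation: no torn (partial) value
   is ever visible to a concurrent update or reader. */
static void sim_signal_update(_Atomic uint64_t *sig_addr, uint64_t signal,
                              enum sim_sig_op op) {
    switch (op) {
    case SIM_SIGNAL_SET:
        atomic_store_explicit(sig_addr, signal, memory_order_release);
        break;
    case SIM_SIGNAL_ADD:
        atomic_fetch_add_explicit(sig_addr, signal, memory_order_release);
        break;
    }
}

/* Read side, as a wait_until/test-style routine would observe it. */
static uint64_t sim_signal_read(_Atomic uint64_t *sig_addr) {
    return atomic_load_explicit(sig_addr, memory_order_acquire);
}
```

This model is stronger than the proposed text: C11 makes every access to the object atomic, whereas the proposal only guarantees atomicity with respect to the same operator and point-to-point synchronization routines, not, for example, a concurrent update using a different operator.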
This is a work-in-progress proposal to add put-with-signal operations to the OpenSHMEM standard.
This is related to issue #206. The proposed feature is available as part of the Cray SHMEM implementation.