Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for put with signal operation #218

Closed
wants to merge 30 commits into from

Conversation

naveen-rn
Copy link
Contributor

@naveen-rn naveen-rn commented May 25, 2018

This is a work-in-progress proposal to add put with signal operations into the OpenSHMEM standards.
This is related to issue #206. Proposed feature is available as part of Cray SHMEM implementation.

@naveen-rn
Copy link
Contributor Author

naveen-rn commented May 26, 2018

Some PENDING ITEMS that needs further discussions:

  • add nbi support - can we add as a single proposal?
    opening a separate PR to add this feature.
  • for zero size puts - send only the signal is the expected behavior - does Annex C already capture semantics?
    yes, this is captured and we don't have any special text in this proposal for specifcally handling put-with-signal
  • do we need explanation about consecutive puts_signals or put with another put_signal?
    we have added a generic explanation

@naveen-rn naveen-rn requested a review from nspark May 26, 2018 09:26
nspark pushed a commit to nspark/specification that referenced this pull request Jun 8, 2018
nspark
nspark previously requested changes Jun 8, 2018
content/shmem_put_signal.tex Outdated Show resolved Hide resolved
content/shmem_put_signal.tex Outdated Show resolved Hide resolved
content/shmem_put_signal.tex Outdated Show resolved Hide resolved
content/shmem_put_signal.tex Outdated Show resolved Hide resolved
ordering between the delivery of the signal word of a put-with-signal
routine and another data transfer. For example, the delivery of the signal
word in a sequence consisting of a put routine followed by a put-with-signal
routine does not imply delivery of the put routine's data.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a question that has been asked and answered in other contexts but: are there any atomicity guarantees on the delivery of the signal? Should there be a note here calling that out? If there are no guarantees, should/could there be a separate set of APIs that deliver the signal atomically (possibly as an atomic operation like AND and not just an atomic set)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the answers to your questions lie with the resolution of #204, #229, and the overall memory model work. I think one goal of those efforts should be that shmem_wait_until (and family) can be safely triggered by atomic operations and the signal of a put-with-signal. If that requires the signal to be an atomic remote write, then so be it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To your knowledge, has there been investigation into something like put-with-atomic-op? Semantically a putmem followed by an atomic operation (e.g. AND).

Is there any benefit in hardware to doing something like that, or would implementations just support it as putmem-fence-atomic?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closest I can think of is the "counting puts" proposal from @jdinan & Co. This is equivalent to a put+fence+atomic-add. That is certainly a useful idiom. I'm not sure whether the full generality of put+fence+atomic-op is useful for all operations (i.e., set, add, and, or, xor).

I definitely think there would be application benefit in having hardware support for these idioms. It is often the case that fence (ordering, not necessarily completion) requires completion, due to network API constraints. If that fence can be avoided because the interconnect manages the ordering and atomic op, then the user is introducing fewer latency-sensitive operations.

@shamisp
Copy link
Contributor

shamisp commented Jul 24, 2018 via email

@nspark
Copy link
Contributor

nspark commented Aug 23, 2018

August F2F discussion:

  • @minsii: Should we add clarifications suggesting shmem_wait_until or shmem_test to clarify portable practices for accessing the signal word?
  • @anshumang: Do we explicitly prohibit the signal word from overlapping with the source or destination buffers?
    • Edit: @nspark suggests prohibiting this overlap via restrict.

@manjugv
Copy link
Collaborator

manjugv commented Aug 23, 2018

August F2F : Ballot postponed.

@naveen-rn
Copy link
Contributor Author

@minsii and @shamisp, I suppose a simple reference to put-with-signal (as in 76a5015) in the list of shmem routines which enforces ordering through p2p_sync operations would clarify the portable practice for using put-with-signal. Let me know, If we need some more clarifications.

@naveen-rn
Copy link
Contributor Author

@nspark @anshumang As modified in 94976fe, I suppose we are planning to restrict the sig_addr buffer to be distinct from source and dest buffer. sig_addr and dest/source must not be same or overlapping data objects.

@naveen-rn
Copy link
Contributor Author

Based on discussions with @anshumang and others -
@nspark It doesn't look like we could get some form of compiler optimizations from the restrict qualifier and also the primary purpose of restrict is to guide users. So, I'm not sure whether we need the restrict qualifier in the API - can just the details in the description section be sufficient?

@naveen-rn
Copy link
Contributor Author

To update, based on RMA WG discussions, we are planning to drop restrict qualifier for put-with-signal and can add it when we go through the whole specification. This is because of the following reasons:

  1. Adding restrict to signal buffer makes sense - but not to both destination and source buffer - that makes it too restrictive
  2. Avoid unnecessary compilation warnings

@minsii
Copy link
Collaborator

minsii commented Sep 23, 2018

@naveen-rn Thanks for the clarification fix with point-to-point sync. I have one more question: why do you define sig_addr with uint64_t type ?

@naveen-rn
Copy link
Contributor Author

naveen-rn commented Sep 23, 2018

@minsii In general, I would prefer a fixed width data type for representing the size of the signal, rather than using generic types which varies with platform. If I'm correct the C standards limits the width to 64. Also, the current implementation in Cray hardware supports only 64 bits for signal - hence we matched that with uint64_t. It can be int64_t - if we prefer the signal to be negative. Are you interested in something different? Say an opaque object like - shmem_sig_t and restrict the size of this opaque object to 64 bits?

1. Use calloc and avoid barriers from malloc and explicit calls
2. use all C11 generic shmem calls
3. Follow shmem bcast semantics - to bcast to source itself
4. convert wavefront-like transfer semantics to true bcast
Previously, we performed bcast example with SHMEM bcast semantics,
without any transfer to PE-0. Now, we perform bcast from PE-0 to
all other PEs and itself.
Performing some quick cleanup on the put-with-signal example.
Based on recent review comments, it looks like it would be more
clear if we state that the signal update is an atomic operation.
We have added this as part of the Notes to Implementers section.
Previously, we had the information about the signal updates atomicity
guarantees in the notes to implementors section for put-with-signal.
We are not now moving this into main notes section. We have also
clarified the atomicity guarantees by refering to atomicty section.
@naveen-rn
Copy link
Contributor Author

naveen-rn commented Jan 11, 2019

Changes made after the October meeting:
https://github.com/openshmem-org/specification/compare/fd54dd3..ada81c9
The following changes were made:

  1. removed restrict qualifier
  2. updated the description section to contain the same explanation from summary
  3. updated context argument explanation
  4. added better explanation for sig_addr and dest overlap
  5. avoided using shmem_malloc and shifted to using shmem_calloc in example
  6. changed the size parameter input in example
  7. modified all type-based shmem routines to C11-generic routines in the example
  8. removed print option from example
  9. clarified that the signal update is an atomic operation

We could use these changes for Jan 2019 OpenSHMEM specification meeting.
[EDIT]: Updated after rebasing with master

@manjugv
Copy link
Collaborator

manjugv commented Jan 28, 2019

January 2019 Meeting: Voting postponed

based on the \VAR{sig\_op} signal operator on the remote \ac{PE} must not
cause partial updates. Only concurrent accesses on \VAR{sig\_addr} by
different signal update operations using the same signal update operator is
guaranteed to be exclusive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to clarify what "exclusive" means in this context.

@manjugv
Copy link
Collaborator

manjugv commented Apr 11, 2019

@naveen-rn @jdinan As I mentioned in the WG meeting, I think it would be useful for defining partial and complete updates. I created an issue to track this and referenced both tickets that gets impacted.

naveen-rn added 2 commits May 3, 2019 15:59
Move the atomicity semantics to the API description section
Changing the text to confirm the atomicity guarantees of the put
with signal operation. The signal update is atomic only with respect
to itself, and other put-with-signal of the same operator, and any
point-to-point synchronization routines
@jdinan
Copy link
Collaborator

jdinan commented Jul 9, 2019

@naveen-rn Can this one also be closed?

@naveen-rn
Copy link
Contributor Author

Closing this PR as we have opened a new composed PR with both blocking and non-blocking put-with-signal: #275

@naveen-rn naveen-rn closed this Jul 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants