Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for NBI put-with-signal operation #244

Closed

Conversation

naveen-rn
Copy link
Contributor

This PR extends the #218 proposal adding non-blocking support for put-with-signal operations. The nonblocking variant of the put-with-signal operation is semantically equivalent to:

shmem_put_nbi();
shmem_fence();
shmem_put_nbi();

Following the trend of no-example for any NBI operations, there is no example added for this proposed routine in this PR. And the placement of the routine, will also be modified based on Issue #216.

Expected DOC changes:

  1. Since the change log entry framework and the keyword highlighting changes are added as part of Add support for put with signal operation #218 - there is no need to do it again in this PR. We will clean it up after reading.

@naveen-rn
Copy link
Contributor Author

naveen-rn commented Nov 1, 2018

Changes made after the October 2018 Meeting:
https://github.com/openshmem-org/specification/compare/8e386..1ce6d5

The following changes were made similar to #218 (comment):

  1. removed restrict qualifier
  2. updated the description section to contain the same explanation from summary
  3. updated context argument explanation
  4. added better explanation for sig_addr and dest overlap

October 2018 is when we had the official reading of this ticket. These changes can be used for updates in the November 2018 meeting and used later for special ballot.

This comment is no more valid - we didn't have any special ballot voting or updates during the November specification meeting. Updates available in next comment.

@naveen-rn naveen-rn changed the title Add support for NBI put-with-signal Add support for NBI put-with-signal operation Jan 8, 2019
naveen-rn and others added 6 commits January 11, 2019 12:33
NBI put-with-signal is an extension to its blocking variant.
We have incorporated common review comments from put-with-signal
blocking routines:

1. duplicated explanation from summary to description
2. removed restrict qualifier and also overlapping explanation
3. modified ctx arg explanation
 Based on recent review comments, it looks like it would be more
 clear if we state that the signal update is an atomic operation
 We have added this as part of the Notes to Implementers section.
Previously, we had the information about the signal updates atomicity
guarantees in the notes to implementors section for NBI put-with-signal
We are not now moving this into main notes section. We have also
clarifies the atomicity guarantees by refering to atomicty section.
@naveen-rn naveen-rn force-pushed the feature/put-signal-nbi branch from a5b02ae to 3b48d2b Compare January 11, 2019 18:33
@naveen-rn
Copy link
Contributor Author

naveen-rn commented Jan 11, 2019

Changes made after the October 2018 Meeting:
https://github.com/openshmem-org/specification/compare/2b39c09..48f2ec1

The following changes were made similar to #218 (comment):

  1. removed restrict qualifier
  2. updated the description section to contain the same explanation from summary
  3. updated context argument explanation
  4. added better explanation for sig_addr and dest overlap
  5. clarified NBI signal update as an atomic operation

October 2018 is when we had the official reading of this ticket. These changes can be used for updates in the January 2019 meeting and used for special ballot.
[EDIT]: Updated after rebasing with master

Previously, we used restrict qualifier and defined in the def.tex
for syntax highlighting in the function definitions. As the usage
of restrict qualifier is removed, this change is no longer nedeed.
We were incorrectly using variable and macros incorrectly for \dest
and \source. Fixing it in put-with-signal-nbi.
content/shmem_put_signal_nbi.tex Outdated Show resolved Hide resolved
point-to-point synchronization interfaces. The delivery of \VAR{signal} flag
on the remote \ac{PE} must not cause partial updates. This requires the
update on \VAR{signal} flag to be an atomic operation, with atomicity
guarantees described in Section~\ref{subsec:amo_guarantees}.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This text is too strong. We don't want the signal update to be a SHMEM-level atomic. If a network can implement this with a simple write, it should be allowed. Suggest something like:

"The put-with-signal interfaces must be implemented such that a synchronization operation on the remote \ac{PE} does not observe partial updates to \VAR{signal}. On some platforms, this may require the update of the \VAR{signal} flag to be performed using an atomic operation."

Copy link
Contributor Author

@naveen-rn naveen-rn Jan 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdinan @manjugv @shamisp Though the comment is shown outdated - this one is not fixed. Need some feedback.

To me the core of both @jdinan suggested changes (prefer - atomic-like semantics) and @shamisp review comments (prefer - strong atomic guarantees) almost looks same. But, the issue is - we are trying to determine an implementation detail.

If some implementation can implement shmem_atomic_set as simple shmem_p operation. It doesn't matter if this statement is too strong. Internally it is going to still go through the same path.

IMHO, if it is fine, we should completely remove the implementation detail. We should just say:

The put-with-signal interfaces must be implemented such that a synchronization operation on the remote \ac{PE} does not observe partial updates to \VAR{signal}.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still prefer the minimally specified "is compatible with remote wait/test" text. It's crucial for open standards to keep the implementation space open to support future and alternative architectures (e.g. Gen-Z, CCIX, NVLink, etc.) where the signal can be non-atomic. SHMEM on shared memory (e.g. POSH) is another example, depending on the underlying architecture.

My understanding is that the weaker text is compatible with current architectures, while keeping the implementation space open. I appreciate that the atomic text is clearer to present day implementors and, as a result, is more likely to be implemented correctly. Apart from clarity, is there something else I'm missing? If the preference for the atomic text is being driven by clarity, we can put more work into the wait-compatible version of the text to address this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the last RMA call we had in depth discussion about definition signal semantics, notes posted here http://www.openshmem.org/pipermail/openshmem-list/2019-January/001021.html

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shamisp Sorry I wasn't able to join that meeting. I re-read the notes, but don't see the outcome from the discussion. Can someone please clarify for me?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bcernohous That's my understanding as well. There is an additional concern we are working through that on some platforms the signal update needs to be performed using a PCIe atomic operation (and by extension a NIC atomic) to ensure that shmem_wait at the target doesn't see a partial update to the variable it is waiting on. Since atomics have lower throughput than puts (at least on the networks we are discussing), there is interest in identifying a path through the API where such a network can still use puts to implement the signal.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdinan , understood. And there was some discussion of doubling the APIs to support both"no partial updates" and "partial update ok" versions of wait.

IMO, before I've seen the upcoming "shmem signal" (not shmem_put_signal) proposal, I think SHMEM takes the hard path. No partial updates.

An app that wants performance and tolerates partial updates could use shmem_put/do while(flag!=0). It's not clear to me that we need API's for that. My 2 cents.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bcernohous The question @naveen-rn asked earlier in the thread was whether the signal update performed by put-with-signal should be specified as a SHMEM AMO or a "no partial updates" update. What's your preference?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdinan , I'm waiting to see the new shmem_signal proposal. I think we define them similarly and they both satisfy shmem_wait. So it partly depends on what we decide for shmem_wait.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signal API sounds a bit like a wrapper around shmem_p and shmem_atomic_set (i.e. choose which ever one is fastest and safe to use with wait). We could also add a preprocessor macro SHMEM_P_IS_WAIT_SAFE and let the user code for this if they want to.

@manjugv
Copy link
Collaborator

manjugv commented Jan 28, 2019

January 2019 Meeting: Voting postponed

@jdinan
Copy link
Collaborator

jdinan commented Jul 9, 2019

@naveen-rn Can this PR be closed now that it is merged with #275?

@naveen-rn
Copy link
Contributor Author

Closing this PR as we have opened a new composed PR with both blocking and non-blocking put-with-signal: #275

@naveen-rn naveen-rn closed this Jul 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants