Are reductions as safe as intended? #74

Hi all,
consider the following (pseudo) code running on two PEs:
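A minimal C sketch of such a program, reconstructed from the scenario described below: `reduction_arg`, its initial value of 1, the put of 2 from PE 1, and the printed result follow that description, while the routine names, the `result` variable, and the work/sync arrays are assumptions.

```c
#include <stdio.h>
#include <shmem.h>

int reduction_arg = 1;  /* symmetric source buffer, 1 on both PEs */
int result = 0;         /* symmetric destination buffer (assumed name) */
int pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];
long pSync[SHMEM_REDUCE_SYNC_SIZE];

int main(void) {
    shmem_init();
    for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++)
        pSync[i] = SHMEM_SYNC_VALUE;
    shmem_barrier_all();  /* ensure pSync is initialized on both PEs */

    /* sum-reduce reduction_arg across PEs 0 and 1 into result */
    shmem_int_sum_to_all(&result, &reduction_arg, 1, 0, 0, 2, pWrk, pSync);

    if (shmem_my_pe() == 1)
        shmem_int_p(&reduction_arg, 2, 0);  /* overwrite PE 0's source */

    if (shmem_my_pe() == 0)
        printf("%d\n", result);  /* 1 + 1 = 2 expected; can this be 3? */

    shmem_finalize();
    return 0;
}
```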
Will this always print 2 according to the spec? Or might it print 3 sometimes?

Consider the following scenario: both PEs enter the reduction at nearly the same time. At the start of the reduction processing, each PE sends the value of its reduction_arg (1) to the other PE. Then, for some reason, PE 0 is delayed. Meanwhile, PE 1 receives the value of PE 0, adds it to its own reduction_arg, stores the sum to dest, and thus can complete the reduction and leave. Afterwards it puts 2 into the reduction_arg of PE 0. This seems to constitute a race: if PE 0 now resumes its execution, it finds a value of 2 in its own reduction_arg, which it then uses to calculate the result (yielding 3 instead of 2).

Storing the original values would help, but the worker array is too small for this.

Is there something I have missed in the spec?

Thank you for any clarification,
Olaf Krzikalla

Comments
Reductions return the result at all PEs, so every PE must wait to return until the result is generated. Thus, it is impossible for the […]
Thus there is an implicit barrier at the end of every reduction? Shouldn't this be stated somewhere in the spec to reduce the potential implementation space accordingly?
Hi Olaf,
This is indeed an ambiguity in OpenSHMEM 1.3. We have ratified a proposal that will resolve the ambiguity in the OpenSHMEM 1.4 specification. The following new text will be added to the reductions section:

"Upon return from a reduction routine, the following are true for the local PE: The dest array is updated and the source array may be safely reused."
~Jim.
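For illustration, a sketch of what that text does and does not guarantee (OpenSHMEM 1.x routine names; the buffers and `npes` are assumptions):

```c
#include <shmem.h>

int src = 1, dest = 0;  /* symmetric */
int pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];
long pSync[SHMEM_REDUCE_SYNC_SIZE];  /* assumed initialized to SHMEM_SYNC_VALUE */

void after_reduction(int npes) {
    shmem_int_sum_to_all(&dest, &src, 1, 0, 0, npes, pWrk, pSync);
    /* Guaranteed by the 1.4 text, for the local PE only:
       dest is updated, and src may be safely reused. */
    src = 42;
    /* NOT guaranteed: that any other PE has completed the reduction,
       so remotely accessing another PE's src or dest is still unsafe. */
}
```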
I'm not sure whether the 1.4 spec change resolves this issue. The new text seems to clarify only the use of the buffers by the local PE.

For his 2-PE example, this is correct. But say we have 4 PEs participating in the reduction, and PE 1 and PE 3 return from the reduction while PE 0 and PE 2 are still computing it. Then PE 1 or PE 3 can still modify the source/dest buffers on PE 0 or PE 2. As far as I understand, there is no implicit barrier after all-reduce; it is the user's responsibility to add an active-set-based barrier to get the usage he wants.
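For example, a sketch of that fix (assumed names; bSync is a separate symmetric sync array, since the reduction's pSync must not be reused concurrently):

```c
#include <shmem.h>

int reduction_arg = 1, result = 0;  /* symmetric */
int pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];
long pSync[SHMEM_REDUCE_SYNC_SIZE];   /* assumed initialized to SHMEM_SYNC_VALUE */
long bSync[SHMEM_BARRIER_SYNC_SIZE];  /* assumed initialized to SHMEM_SYNC_VALUE */

void reduce_then_put(int npes) {
    shmem_int_sum_to_all(&result, &reduction_arg, 1, 0, 0, npes, pWrk, pSync);
    shmem_barrier(0, 0, npes, bSync);  /* every PE has now left the reduction */
    if (shmem_my_pe() == 1)
        shmem_int_p(&reduction_arg, 2, 0);  /* no longer races with PE 0's reduction */
}
```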
Ahh, I didn't read the example closely enough. Yes, according to the specification that is a race. Completion of the reduction at PE 1 does not guarantee completion at any other PE. This is a race even for two PEs, since OpenSHMEM does not define an ordering between operations performed by PE 1 within the reduction (which could, e.g., be bounce-buffered and converted to non-blocking) and the subsequent put.
Yes, you are correct: even for 2 PEs this is a race.
If reductions deliver the result to all PEs, there's an all-to-all data dependency, which is equivalent to an execution barrier, no? The only way this isn't true is if logical collectives cheat and exit early after the first zero is observed for AND or the first non-zero is observed for OR.
@jeffhammond: If the implementation of a reduction calculates it at a particular place, then it's actually an all-to-one followed by a one-to-all data dependency, isn't it? IMHO even then one PE could receive the result and proceed before another PE.

@jdinan: This clarification is a good start. I think a statement about remote memory accesses is still needed. Something like this: "Accessing memory involved in a collective routine while the PE is processing that collective results in undefined behavior. Since PEs can enter and exit collectives at different times, accessing such memory remotely requires some additional synchronization with the corresponding remote PE." I guess someone can phrase it better.
@jeffhammond Exit from all-reduce implies that all processes have reached the call to all-reduce, but it doesn't carry a barrier's guarantee of ordering/completion of pending RMA operations, as sketched below.

@krzikalla This would be a good change. We likely need similar verbiage for all of the collectives.
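A sketch of that distinction (assumed names; `flag` is symmetric):

```c
#include <shmem.h>

int src = 1, dest = 0, flag = 0;  /* symmetric */
int pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];
long pSync[SHMEM_REDUCE_SYNC_SIZE];  /* assumed initialized to SHMEM_SYNC_VALUE */

void arrival_vs_completion(int npes, int peer) {
    shmem_int_p(&flag, 1, peer);  /* RMA left in flight */
    shmem_int_sum_to_all(&dest, &src, 1, 0, 0, npes, pWrk, pSync);
    /* On exit, every PE has entered the reduction, but the put to peer
       need not be complete: a reduction is not a barrier.  A
       shmem_barrier_all() here would complete pending puts. */
}
```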
@jdinan I meant execution barrier in the abstract sense of synchronizing all processes, not […]. Do you really think we need to explicitly tell users that reductions do not synchronize RMA? If we are going to clarify anything, we should list all of the (few) operations that actually synchronize RMA, rather than note the ones that do not. AFAIK, the operations that remotely synchronize RMA in some way are […]
I would hope that's not necessary. I think the change that @krzikalla suggested, which clarifies that no completion guarantees are made with respect to remote buffers, should cover it.
Collectives section committee, please review and determine if any clarifications should be added.
I think @krzikalla's clarifying statement is a very good one. I'd like to propose a few minor edits, if that's OK: "Accessing symmetric memory involved in a collective routine while the PE is processing that collective results in undefined behavior. Since PEs can enter and exit collectives at different times, accessing such memory remotely may require additional synchronization with the corresponding remote PE." I added "symmetric" because that's the problem this statement is tackling (right...?).
Closed by 1.5rc1 |