-
Notifications
You must be signed in to change notification settings - Fork 871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MPI_Comm_create_from_group works incorrectly when used with non-overlapping groups but same stringtag argument #10895
Comments
I'm marking this as a blocker because its solution will likely involve changes to a parameter in the mpi.h.in header file - which is incorrect in any case for the maximum allowed stringtag length. |
So you are interpreting that statement as only requiring that all processes calling the function must pass the same stringtag, but not that the stringtag must be unique - yes? That strikes me as an oversight in the standard that needs to be addressed as it frankly makes no sense. Why "tag" a group if it isn't unique? What possible purpose could the tag serve in that case? |
The stringtag is not about identifying the group, because otherwise the MPI standard would be standardizing how external software would have to name their groups. Instead, the stringtag is an identifier for the operation on this particular group. To understands why this is necessary one must look at the API of call, and notice that this API does not have a base communicator, which means there is no dedicated communication channel for this operation, aka. that the only way the processes can order concurrent calls to this function is by uniquely identifying them with the tuple group and tag. Without taking both in account you will either prevent concurrent calls to overlapping groups, or unordered calls with the same group. |
This goes back to the above figure a process sets. We have a blurb in the MPI-4 standard, chapter 11, that states
So the MPI://SELF process set is an example of the former in the above statement, so I could see how a programmer may interpret that to mean that there's no reason for the stringtag to be unique if the group handle argument to the MPI_Comm_create_from_group came from non-overlapping process sets. Any rate, I plan to fix this by appending the vpid of the first proc in the group to the supplied stringtag before turning over to PMIx_Group_construct. Thank goodness we retained the verbiage about restrictions on the group argument concerning order of processes in the group. |
What would have been the order of processes in the new communicator otherwise ? |
I see - thanks for clarifying!
I grok what you are saying, but that still doesn't result in a unique PMIx group name since the proc might be involved in multiple |
Couple of responses/questions, but I'll post them separately.
I have to agree with this question. If a collection of procs calls this function using arbitrary ordering, then it isn't clear to me what the resulting communicator "rank" should be for the participants. If we follow typical MPI logic, it would be the position of the proc in the provided array - but that would differ across the collection. So it would seem that the ordering must be the same if you want to result in a usable MPI communicator, yes? |
@hppritcha I wonder if we are getting caught in a confusion caused by overloading of the "group" term. Are you equating a given MPI communicator with a specific PMIx group? If so, that has some implications that might not be acceptable to the MPI community. Biggest difference is that we have unique string "names" for a group that helps us distinguish between them, even if they have the same membership. You lack that in your communicators and I'm not sure how you'd add it.
If this refers or attempts to correlate with PMIx process sets, then that wouldn't be correct. Because of the dynamic process set support, a process (in PMIx parlance) doesn't belong to just one process set. So two processes could find themselves members of the same process set, but that set could be completely orthogonal to any other set to which they belong. Again, it may be that the terminology unfortunately overlaps and causes confusion. I'm only commenting out of concern regarding what PMIx can and cannot support, wanting to ensure that we do support whatever you are attempting to do. |
@rhc54 getting side tracked today I'll answer questions here tomorrow. we really are only using PMIx groups as a rendezvous mechanism not as something for tagging comms, groups, etc. |
Gotcha - will look forward to your answers when you have time. FWIW, I attempted to clear up some terminology confusion here (#10886 (comment)) that might be relevant in this topic as well. |
@rhc54 the only connection that the MPI communicator generated from MPI_Comm_create_from_group has to a PMIx group is the PMIX_GROUP_CONTEXT_ID we got back from the PMIx_Group_construct call. We use the PMIx group constructor/destructor much like in the pmix example - group_lcl_cid.c, so by the time we return from the call, the pmix group has been destructed. I think as long as we're using PMIx groups in this way we shouldn't have a problem. Note we do have an advice to users that reads thus:
We may want to beef this up some for MPI 4.1. |
Ah, okay - that should work then so long as you don't have two concurrent calls that include the same vpid in the first position as that would result in conflicting group IDs. Still noodling over possible solutions - or is that an illegal scenario under MPI? I wonder if OMPI couldn't just generate a unique tag internally, maybe using some thread-safe internal counter that you integrate into the tag - e.g., "create-from-group-N" for the nth call? |
That approach may be the way to go. we do that already for the mpi_intercomm_create_from_groups method. |
got clarification from @rhc54 about potential max length of grp string arg to PMIx_Group_construct. seems to be a mistake in some older versions of the PMIx standard. removing the blocker status here as there will be no need to update mpi.h.in. |
@hppritcha Sorry, I missed this issue, I was traveling. BTW, my GH username is @dalcinl, with final "L" (as in Lima) , not "I" (as in India). Let me know once you have PR to try out your fix. |
@hppritcha Been thinking about it some more and the indexing approach to the stringtag passed to What I would suggest is that you use the procs themselves as the tag. For example, suppose that you have ranks 0,1,6 from nspace
I only added the colons for clarity - no reason to include them. Remember, the tag doesn't have to be parsable - it just has to be unique. I put a symbol identifying the MPI operation to separate this from the same procs participating in some other MPI operation. Note that this also assumes the ordering of procs in the call is consistent across the procs calling the function - but I think MPI is going to require that anyone as per the above discussion. It also assumes that the same procs cannot engage in multiple concurrent calls to the same MPI operation - not sure if that is correct as perhaps different threads from within the same procs could execute it? |
Looks like there may be at least one place in ompi where being able to compute a sha of a user supplied string plus internally generated data could be useful, so add function to do this to the opal util toolbox. Related to open-mpi#10895 This code is derived from a sha256 implementation at https://github.com/B-Con/crypto-algorithms.git Signed-off-by: Howard Pritchard <[email protected]>
Looks like there may be at least one place in ompi where being able to compute a sha of a user supplied string plus internally generated data could be useful, so add function to do this to the opal util toolbox. Related to open-mpi#10895 This code is derived from a sha256 implementation at https://github.com/B-Con/crypto-algorithms.git Signed-off-by: Howard Pritchard <[email protected]>
Looks like there may be at least one place in ompi where being able to compute a sha of a user supplied string plus internally generated data could be useful, so add function to do this to the opal util toolbox. Related to open-mpi#10895 This code is derived from a sha256 implementation at https://github.com/B-Con/crypto-algorithms.git Signed-off-by: Howard Pritchard <[email protected]> (cherry picked from commit 14d7ca6)
Looks like there may be at least one place in ompi where being able to compute a sha of a user supplied string plus internally generated data could be useful, so add function to do this to the opal util toolbox. Related to open-mpi#10895 This code is derived from a sha256 implementation at https://github.com/B-Con/crypto-algorithms.git Signed-off-by: Howard Pritchard <[email protected]>
to avoid potential race conditions between successive calls to MPI_Comm_create_from_group and MPI_Intercomm_create_from_groups when using the same tag argument value. The PMIx group constructor grp string argument has different semantics from the tag requirements for these MPI constructors, so use discriminators to avoid potential race conditions using PMIx group ops. Related to open-mpi#10895 Signed-off-by: Howard Pritchard <[email protected]>
to avoid potential race conditions between successive calls to MPI_Comm_create_from_group and MPI_Intercomm_create_from_groups when using the same tag argument value. The PMIx group constructor grp string argument has different semantics from the tag requirements for these MPI constructors, so use discriminators to avoid potential race conditions using PMIx group ops. Related to open-mpi#10895 Signed-off-by: Howard Pritchard <[email protected]>
to avoid potential race conditions between successive calls to MPI_Comm_create_from_group and MPI_Intercomm_create_from_groups when using the same tag argument value. The PMIx group constructor grp string argument has different semantics from the tag requirements for these MPI constructors, so use discriminators to avoid potential race conditions when using PMIx group ops. Related to open-mpi#10895 Signed-off-by: Howard Pritchard <[email protected]>
to avoid potential race conditions between successive calls to MPI_Comm_create_from_group and MPI_Intercomm_create_from_groups when using the same tag argument value. The PMIx group constructor grp string argument has different semantics from the tag requirements for these MPI constructors, so use discriminators to avoid potential race conditions when using PMIx group ops. Related to open-mpi#10895 Signed-off-by: Howard Pritchard <[email protected]> (cherry picked from commit 46ff698)
closed via #12960 |
Turns out the way MPI_Comm_create_from_group is using the PMIx_Group_construct method doesn't correctly handle the case where multiple MPI processes are invoking MPI_Comm_create_group using different, non-overlapping MPI groups, but the same stringtag argument.
The wording in the MPI-4 standard does not indicate this is incorrect:
@dalcini
The text was updated successfully, but these errors were encountered: