ADIOS2's MPI communication questions #3741
Hi,
The short answer is that io.Open() duplicates the MPI communicator passed
by the user (in Open or in adios2::ADIOS()). Moreover, we have our own Comm
class, and the engines don't have direct access to MPI, so using
MPI_COMM_WORLD there wouldn't even compile.
We do mostly use 0 for tags, though, and your description indicates that
something is going on. Still, I would first check whether you have additional
MPI calls in your code that are only used when ADIOS is used.
Does your code run fine with the NULL engine?
…On Sat, Aug 5, 2023, 6:53 PM Liang Wang ***@***.***> wrote:
Dear ADIOS2 devs/community, the following questions are mostly regarding a
bug in our user code that uses BP5; they may or may not be related to
ADIOS2, but I hope you may kindly offer some thoughts.
First, does ADIOS2 use MPI_COMM_WORLD somewhere during BP5 IO, or does it
always stick to its own communicator? Does it use the number zero as a tag
in many places?
Recently, in our code that uses ADIOS2 BP5 as the output file format, we
encountered a strange bug.
When we disable output or use hdf5 for output, then no problem is found.
When we use BP5, then we have to change our MPI_Irecv/MPI_Isend's tag in
one particular part of our user code to a nonzero number; otherwise,
we get somewhat random failures in some MPI calls (we have tested mostly
with very large numbers).
Though it sounds extremely unlikely, I still would like to get any of your
thoughts on possible conflicts between users' MPI calls and the internal
communication within ADIOS2/BP5.
Thank you very much.
|
Hi @pnorbert , thank you for your prompt reply. I will do more investigations following your suggestions.
Also, for a quick check, is there a way to change a "master" value for the tags? |
@pnorbert using NULL engine, the run seems fine |
Interesting. Even though I don't understand why the tag 0 should matter when using different communicators, you may try to change BP5 tags in the Isend/Recv pairs in
Change the 0 tag in |
Thank you, @pnorbert. I did some tests following your suggestion. Changing the tags in the various variants of the BP5 writer didn't seem to help. However, the problem seems to be gone after I change the tag in the
PS: I noticed that ADIOS2's communicator calls have additional "hints". Is there a way to print them? |
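@pnorbert I think this is caused by a bug in release_28 that is now fixed! In release_28 of the HankShakeLinks_Start, rank 0 had the incorrect origin for Recv. |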
Thank you for investigating. So then you don't have this problem with 2.9
(or 2.9.1) anymore?
…On Sun, Aug 6, 2023, 3:15 PM Liang Wang ***@***.***> wrote:
@pnorbert <https://github.com/pnorbert> I think this is caused by a bug
in release_28 that is now fixed! In release_28 of the HankShakeLinks_Start,
rank 0 had the incorrect origin for Recv.
|
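The quoted message says that, in release_28, rank 0 had the incorrect origin (source rank) for a Recv. The ADIOS2 code itself is not shown in this thread, so the sketch below is only a toy model of what a wrong source does under MPI's matching rules; the 4-rank chain and all names (`envelope_matches`, `deliver`) are hypothetical.

```python
def envelope_matches(recv, msg):
    """A receive matches a message only if source and tag both agree."""
    return recv["source"] == msg["source"] and recv["tag"] == msg["tag"]


def deliver(posted_receives, incoming):
    """Match each incoming message to the first fitting posted receive.

    Returns (receives still pending, messages nobody asked for).
    """
    pending = list(posted_receives)
    unexpected = []
    for msg in incoming:
        for recv in pending:
            if envelope_matches(recv, msg):
                pending.remove(recv)
                break
        else:
            # No posted receive fits this message.
            unexpected.append(msg)
    return pending, unexpected


# Hypothetical 4-rank chain: rank 0 should hear from the last rank (3), tag 0.
incoming_at_rank0 = [{"source": 3, "tag": 0}]

correct = deliver([{"source": 3, "tag": 0}], incoming_at_rank0)
wrong = deliver([{"source": 1, "tag": 0}], incoming_at_rank0)

print(correct)  # ([], [])
print(wrong)    # ([{'source': 1, 'tag': 0}], [{'source': 3, 'tag': 0}])
```

With the wrong source, the receive stays pending and the intended message is left unmatched. Whether and how that interacted with the user's own Isend/Irecv pairs is not established in this thread.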
We have been using v2.8 since some preliminary tests with v2.9 failed. But we do plan to turn to using v2.9 when we have time to investigate the causes. Thank you very much for the useful suggestions! I'm going to close this issue now. |
Dear ADIOS2 devs/community, the following questions are mostly regarding a bug in our user code that uses BP5; they may or may not be related to ADIOS2, but I hope you may kindly offer some thoughts.
First, does ADIOS2 use MPI_COMM_WORLD somewhere during BP5 IO, or does it always stick to its own communicator? Does it use the number zero as a tag in many places?
Recently, in our code that uses ADIOS2 BP5 as the output file format, we encountered a strange bug.
When we disable output or use hdf5 for output, then no problem is found.
When we use BP5, then we have to change our MPI_Irecv/MPI_Isend's tag in one particular part of our user code to a nonzero number; otherwise, we get somewhat random failures in some MPI calls (we have tested mostly with very large numbers or "strange" numbers as the tag; we have also checked that the user code's Isend/Irecv pairs and data size look OK).
Though it sounds extremely unlikely, I still would like to get any of your thoughts on possible conflicts between users' MPI calls and the internal communication within ADIOS2/BP5.
Thank you very much.