-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix so that MPMD tests run #2452
Conversation
@pnorbert and @JasonRuonanWang . So, nothing is easy. This restores tests that got turned off inadvertently in May. But the world has changed since then, and turning those tests back on introduces failures with MSMPI (so far) in SSC and burst buffer tests. Can you guys please propose fixes? Perhaps just push to this PR? |
I am working on it. This is really weird. I can understand the Fortran part, as I don't have any Fortran tests. But for the C++ part, I don't think the SSC tests are very different from the common staging tests, but they are still failing... |
Generally the FORTRAN bit is testing to see that you do the adios array dimension reordering between c/c++ and FORTRAN. (Odd that my autocorrect insists on capitalizing FORTRAN...)
…Sent from my iPhone
On Sep 4, 2020, at 11:00 AM, Jason Wang <[email protected]> wrote:
I am working on it. This is really weird. I can understand the Fortran part, as I don't have any Fortran tests. But for the C++ part, I don't think the SSC tests are very different from the common staging tests, but they are still failing...
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub<#2452 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AAHF7GM46QPDPJPW7FNRLADSED6IHANCNFSM4QVBK6VA>.
|
Yes, that's what I figured too, so I said I can kind of understand that part. But there is also some non-Fortran stuff failing too, which is probably more problematic... |
@eisenhauer The current fortran reader only reads one step and then calls close directly, whereas the C++ writer writes multiple steps. This cannot work with any MPI based engines as the writer's and reader's behaviors do not match. I have added step status checking in the fortran reader and it solved some of the failed tests. I am still trying to fix others. |
Hmm. Yeah, I probably knew that about the fortran reader. I think that reader prog has been like that since it was created, so the behavior isn't new. Did this somehow work before these tests were disabled? |
I am not sure actually why it worked before. But it shouldn't anyway. In any engine that uses MPI, it will have to issue a collective call (or a matched pair of MPI_send and MPI_recv) across the writers and the readers at every step. So if their behaviors do not match then it won't work by definition. Checking the step status on the reader side is the only way to match them. If somehow it worked before then I would consider it a bug. I understand your point but I have several reasons to have this common fortran reader having step status checking. First, the common C++ reader has step status checking. If the common fortran reader does not have it then the 1x1 test and 1x1.Fortran test will do completely different things, which in my opinion they shouldn't. If there is a need to keep this early stop reader feature, then we probably should create it for both C++ and fortran, for example 1x1.EarlyStopReader and 1x1.Fortran.EarlyStepReader. In this case, different engines can choose what they want explicitly, rather than having this mismatched behaviors across languages. Second, I would consider a reader code that has step status checking the most standard way of using ADIOS2, especially for staging engines where the reader does not have any idea how many steps there is, until BeginStep returns EndOfStream. I would be surprised if many staging codes need to stop the reader at a different point than EndOfStream. |
I agree completely with upgrading the fortran reader. It exists as it does because it was thrown together so we had some check to see if the dimension reordering was happening. It got committed in that way and hasn't been looked at much since. |
An additional "early exit" scenario that might be worth explicitly trying to test for in MPMD mode is clean shutdown in coupled codes, since that will involve both reader and writer functions. It would be good to know that a clean exit is possible in that scenario regardless of ordering of the closes, even if one of the codes went on to do some mop-up file writing, etc. at the end. |
8abe1f9
to
30d7f80
Compare
@eisenhauer I have fixed all SSC related problems. Is anyone working on the other bugs? |
@pnorbert Have you looked at these errors in Utils.IOTest.BurstBuffer.1to1.BP4.Stream.Run? Any suggestions? |
FYI Norbert, the failures are all windows and all the same: |
…ata repetitively, and passed the staging-common tests
@eisenhauer There comes another failed test Engine.Staging.TestThreads.Basic.SST.BP.Serial on Mac. I didn't touch it, but it started failing somehow... |
PR #2223 introduced the CMake variable ADIOS2_RUN_MPMD_TESTS, but with an inconsistent name. This PR changes it consistently to ADIOS2_RUN_MPI_MPMD_TESTS.
Closes issue #2451