mvapich2 test 31 (coarray_navier_stokes) failed #312
Comments
We have decided to disable and delete this test due to a lack of portability. It relies on a binary FFT library that was written in assembly language. It's safe to ignore the failure. Thanks for reporting it. I'll remove the test from our test suite shortly. |
Hi @LaHaine, thanks for reporting this. I'm tempted to just dismiss this test failure because the NS tests use some pre-compiled (or maybe written in assembly?) FFT libraries that are usually the cause of all sorts of issues, as @rouson noted. (See, for example, #297.) However, given the specific nature of the error, it appears at first inspection to be unrelated to the FFT libraries. Further research shows that mvapich2-2.2 is based on MPICH 3.1.4, which tests fine. This error appears to be an assertion in mvapich having to do with RDMA and MPI windows... I'm wondering if it would be worthwhile for someone like @afanfa, who has deep expertise in both the library internals and MPI-3, to take a quick look at this. Also, it's too bad that this doesn't generate a backtrace; that would be instrumental in localizing the source of this issue in the OpenCoarrays library, if it is indeed legitimate. |
@rouson: I am in the process of disabling the test. I want to keep a recipe to build it, but remove it from the "all" target so that you have to ask to build it, and then also remove it from automatically being run in the tests. |
@zbeekman: That would be best. BTW it also crashes for me with openmpi 1.10.4:
|
@LaHaine The OpenMPI error is much more helpful! Would it be possible to rebuild OpenCoarrays, adding the following CMake flag: Also, if you know how to, I think I need to run all the tests through valgrind --leak-check=full and valgrind --tool=helgrind |
Oh, again, forgot the additional mca parameter:
|
I have a strong hunch that this is due to either a) the gfortran runtime's library having problems with the random number intrinsics and thread safety or less likely b) calls like those to |
On an orthogonal note, @afanfa and I are experiencing the exact opposite of the problem @zbeekman reported earlier: we are getting different PRN sequences even when we pass the same seed in serial code. We observe this behavior with a gfortran 7.0.0 build dated 20170108 and with a more recent 7.0.1 build, but we get the expected behavior (same sequence) with gfortran 6.3.0. There seem to have been some problems introduced into the gfortran random number generator last year. I'm attempting to isolate the issue and report the bug to the gfortran developers. |
We figured out our issue. I don't know if it affects the case discussed in this thread, but the behavior of random_seed changed between gfortran 6.3.0 and 7.1.0. On a related note, Fortran 2015 introduces a new random_init() intrinsic subroutine that I expect will be very useful for both reproducibility and thread safety, so I recommend reading about it in the draft Fortran 2015 standard. |
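For anyone following along, here is a minimal sketch of the explicit-seeding pattern touched on above. It is not code from OpenCoarrays or from the removed test; the program name and the seed value are illustrative assumptions. It queries the seed size at run time (the size is processor-dependent, so it should not be hard-coded), puts the same seed twice, and compares the two sequences that result.

```fortran
program seed_repro
  implicit none
  integer :: n
  integer, allocatable :: seed(:)
  real :: a(5), b(5)

  ! The seed array length is processor-dependent, so query it
  ! instead of assuming a fixed size.
  call random_seed(size=n)
  allocate(seed(n))
  seed = 20170108   ! arbitrary illustrative value (an assumption)

  ! Seed explicitly, then draw the first sequence.
  call random_seed(put=seed)
  call random_number(a)

  ! Restore the same seed and draw the second sequence.
  call random_seed(put=seed)
  call random_number(b)

  if (all(a == b)) then
    print '(a)', 'same sequence'
  else
    print '(a)', 'different sequences'
  end if
end program seed_repro
```

Running a check like this under each compiler version is one way to tell whether a difference comes from how the seed is set rather than from the generator itself.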
@LaHaine We're going to close this issue, since we're a bit perplexed by it and this test has some odd assembly code in it. We've removed the test and think that the issue may lie in mvapich or in some compiler intrinsics, as discussed above. There is an MVAPICH mailing list that you could try emailing for more information about the failed assertion: http://mvapich.cse.ohio-state.edu/mailinglists/. If you hear anything insightful that indicates an error in OpenCoarrays, please let us know and we can reopen the issue. Right now there is no easy way to localize the problem, or even to reproduce and test it. Thanks. |
This is on CentOS 7.3. I have managed to build and test OpenCoarrays successfully using gcc 6.1.0 from devtoolset-6 and the included mpich. I then switched to mvapich2-2.2 compiled using the same gcc, and now one test is failing:
This might also be a bug in mvapich2.