Building PIO with IntelMPI #1613
Comments
After building PIO did you run make test?
There is a bug in intel mpi that they have fixed in impi/19.0.6
Thanks. Would a bug in IntelMPI explain why I'm getting a stack trace that includes calls to MPICH though?
All dependent libraries (netcdf, hdf5, pnetcdf) need also to be built with the same MPI. Perhaps you are using one that was created with mpich?
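One way to check the point above is to list the MPI shared libraries that each dependency resolves to at load time. The following is a minimal sketch, not a definitive procedure: `filter_mpi_libs` is a hypothetical helper, and the library file names under `$PREFIX/lib` are assumptions about a typical install rather than details from this thread. Static archives (`.a`) will not show up with `ldd`.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines of `ldd` output that mention an MPI library.
filter_mpi_libs() {
    grep 'libmpi' || true
}

# Assumed library names; adjust to what is actually installed under $PREFIX/lib.
for lib in libnetcdf.so libnetcdff.so libhdf5.so libpnetcdf.so libpioc.so; do
    path="${PREFIX:-/home/user/sw}/lib/$lib"
    if [ -e "$path" ]; then
        echo "== $path"
        # Show which MPI shared objects (if any) this library links against.
        ldd "$path" | filter_mpi_libs
    fi
done
```

If every library reports an MPI library under the Intel installation directory, a mixed-MPI build is unlikely to be the cause.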
I ran make test and got:
$ make test
0% tests passed, 103 tests failed out of 103
Total Test time (real) = 3.72 sec
The following tests FAILED:
I've built all the dependent libraries you mention myself against IntelMPI. Certainly that has been the intent. I'm not even sure how MPICH could be on the system at this point.
I suspect the tests are failing because you are running on a login node and not one that has access to MPI. Try getting an interactive login and running ctest again. I don't see any reference to mpich in your traceback.
You're right, I was running on a login node but I logged in to a worker and re-ran the build and test there and got the same results. The traceback in my OP shows a reference to (for example) src/mpid/ch3/channels/nemesis/src/ch3_progress.c which I can see is an MPICH source file.
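Intel MPI is derived from MPICH, so source paths such as src/mpid/ch3/... in a backtrace can come from Intel MPI's own libmpi rather than from a separate MPICH install. A hedged way to confirm which MPI library a hung process has actually loaded is to inspect its memory map on Linux. In the sketch below, `mapped_mpi_libs` is a hypothetical helper and the PID is supplied by the user.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct MPI shared objects listed in a
# /proc/<pid>/maps-style file (the path is the last field of each line).
mapped_mpi_libs() {
    awk '$NF ~ /libmpi/ { print $NF }' "$1" | sort -u
}

# Usage (Linux only): pass the PID of a hung rank as the first argument.
if [ -n "${1:-}" ] && [ -r "/proc/$1/maps" ]; then
    mapped_mpi_libs "/proc/$1/maps"
fi
```

If the only path printed lies under the Intel MPI installation, the MPICH-looking frames belong to Intel MPI itself.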
Hi @jedwards4b thanks for the advice. I'm still not seeing a make test which looks happy. Do you see any issues with the cmake output here? Thanks for any advice you can offer.
[root@inst-gjk5z build]# cmake -DCMAKE_INSTALL_PREFIX=$PREFIX -DNetCDF_C_PATH=$PREFIX -DNetCDF_Fortran_PATH=$PREFIX -DPnetCDF_PATH=$PREFIX -DPIO_ENABLE_TIMING=OFF ..
You are in 'detached HEAD' state. You can look around, make experimental
If you want to create a new branch to retain commits you create, you may
git checkout -b new_branch_name
HEAD is now at acb24bb... new trunk tag
2% tests passed, 101 tests failed out of 103
Total Test time (real) = 4.29 sec
The following tests FAILED:
This is usually due to a problem with the mpirun command.
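A quick way to rule the launcher in or out is to start a trivial non-MPI command under it before running the test suite. This is a minimal sketch under stated assumptions: `check_launcher` is a hypothetical helper, and it assumes the launcher accepts `-n <ranks>`, as `mpirun` does in Intel MPI and MPICH.

```shell
#!/bin/sh
# Hypothetical helper: confirm a launcher is on PATH and can start two trivial ranks.
check_launcher() {
    launcher="$1"
    if ! command -v "$launcher" >/dev/null 2>&1; then
        echo "launcher '$launcher' not found on PATH" >&2
        return 1
    fi
    # Report which launcher binary is being picked up.
    echo "using $(command -v "$launcher")"
    # Each rank should print the host name; a hang or error points at the launcher setup.
    "$launcher" -n 2 hostname
}

# Usage: check_launcher mpirun
```

If the reported path is not the one under the Intel MPI installation, or the command fails, the test failures are more likely an environment problem than a PIO problem.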
Thanks all for your help with this! I was actually running make tests before make install. This was the cause of my issue. I pored over the docs and realised this eventually. Most other dependencies seem to allow this the other way around.
The cmake build does not require make install before make tests (or make test).
Seems like this issue has been resolved, so I will close it.
This is less of an issue with PIO and more of a request for assistance.
I have built PIO with Intel Parallel Studio 2017
export PREFIX=/home/user/sw
export MPI_VERSION_NUM=2017.1
export MPI_VERSION=impi-$MPI_VERSION_NUM
export CC=mpiicc FC=mpiifort CXX=mpiicpc
export MPICC=mpiicc MPIF77=mpiifort MPIF90=mpiifort
export I_MPI_CC=icc I_MPI_FC=ifort I_MPI_CXX=icpc
export CFLAGS="-O3 -ipo -no-prec-div -fp-model fast=2 -xHost -fPIE -fPIC -I$PREFIX/include"
export CXXFLAGS="-O3 -ipo -no-prec-div -fp-model fast=2 -xHost -fPIE -fPIC -I$PREFIX/include"
export FFLAGS="-O3 -ipo -no-prec-div -fp-model fast=2 -xHost -fPIE -fPIC -I$PREFIX/include"
export FCFLAGS="-O3 -ipo -no-prec-div -fp-model fast=2 -xHost -fPIE -fPIC -I$PREFIX/include"
export LDFLAGS="-fPIC -L$PREFIX/lib -Wl,-rpath -Wl,$PREFIX/lib"
export PATH=/opt/intel/compilers_and_libraries_2017.1.132/linux/bin/intel64:/opt/intel/compilers_and_libraries_2017.1.132/linux/mpi/intel64/bin:$PATH
export LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries_2017.1.132/linux/compiler/lib/intel64_lin
cmake -DCMAKE_INSTALL_PREFIX=$PREFIX -DPIO_FILESYSTEM_HINTS=gpfs -DPIO_ENABLE_TIMING=OFF ..
make && make install
When I run my application (MPAS Atmosphere) it quickly blocks and I can see the processes are all having issues here. What is confusing me is why PIO seems to be blocked against MPICH. I'm not even sure how MPICH has got on to the system!
Thread 2 (Thread 0x2aca7ece3700 (LWP 69651)):
#0 0x00002aca675d8bed in poll () from /lib64/libc.so.6
#1 0x00002aca7fa4b5b3 in cm_thread () from /lib64/libdaploucm.so.2
#2 0x00002aca7fa36a9a in dapli_thread_init () from /lib64/libdaploucm.so.2
#3 0x00002aca66fcee65 in start_thread () from /lib64/libpthread.so.0
#4 0x00002aca675e388d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aca6439db40 (LWP 69612)):
#0 PMPIDI_CH3I_Progress (progress_state=0x600b, is_blocking=30795504) at ../../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:546
#1 0x00002aca664320c3 in MPIR_Waitall_impl (count=24587, array_of_requests=0x1d5e6f0, array_of_statuses=0x0) at ../../src/mpi/pt2pt/waitall.c:217
#2 0x00002aca661ad610 in MPIC_Waitall (numreq=24587, requests=0x1d5e6f0, statuses=0x0, errflag=0x600b) at ../../src/mpi/coll/helper_fns.c:905
#3 0x00002aca65fe804d in PMPI_Alltoallw (sendbuf=0x600b, sendcounts=0x1d5e6f0, sdispls=0x0, sendtypes=0x600b, recvbuf=0xc000, recvcounts=0x1de5600, rdispls=0x7ffd58113000, recvtypes=0x7ffd58112800, comm=-2080374780) at ../../src/mpi/coll/alltoallw.c:172
#4 0x0000000000494255 in pio_swapm () at /home/user/pio-2.4.4/src/clib/pio_spmd.c:128
#5 0x00000000004a1a78 in rearrange_comp2io () at /home/user/pio-2.4.4/src/clib/pio_rearrange.c:968
#6 0x000000000049d04d in PIOc_write_darray_multi () at /home/user/pio-2.4.4/src/clib/pio_darray.c:311
#7 0x000000000049c53b in PIOc_sync () at /home/user/pio-2.4.4/src/clib/pio_darray_int.c:1952
#8 0x0000000000bd9dad in mpas_io_streams_mp_mpas_writestream_ ()
#9 0x0000000000b2c6ea in mpas_stream_manager_mp_write_stream_ ()
#10 0x0000000000b2c0fc in mpas_stream_manager_mp_mpas_stream_mgr_write_ ()
#11 0x00000000005c26ff in atm_core_mp_atm_core_run_ ()
#12 0x0000000000411829 in mpas_subdriver_mp_mpas_run_ ()
#13 0x00000000004117b5 in MAIN__ ()
#14 0x000000000041175e in main ()
Can anyone shed light on this?