Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#725 enable test with mpi oversubscribe #938

Merged
merged 2 commits into from
Oct 17, 2018

Conversation

williamfgc
Copy link
Contributor

@williamfgc williamfgc commented Oct 17, 2018

OMPI_MCA_rmaps_base_oversubscribe=yes must be either passed to cmake or be already an
env variable when running cmake and ctest

OMPI_MCA_rmaps_base_oversubscribe=yes must be passed to cmake or be an
env variable
Copy link
Contributor

@chuckatkins chuckatkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specific to openmpi. You'd have no reason to set that env var for another implementation, i.e. mpich, IntelMPI, sgi mpt, mvapich, cray mpich, etc.

@williamfgc
Copy link
Contributor Author

williamfgc commented Oct 17, 2018

@chuckatkins no openmpi specific env variable is set. I am only looking for it if the user sets it explicitly as an environment variable. Please take another look. Also, this is only done if ncores < 4. HeatTransfer tests are hardcoded to work with 4 nprocs only. The previous master would fail as-is without the proper check. Other implementations would just run with 2 nprocs as that's what the system has. See #725

@chuckatkins
Copy link
Contributor

The entire logic block here can be reworked to be much simpler I think. The use case is to oversubscribe when < 4 processors are detected and the OpenMPI env var explicitly says it's okay, right? FindMPI already sets MPIEXEC_MAX_NUMPROCS to the detected number of physical cores. Given that, the whole logic block can be reduced to:

if(ADIOS2_HAVE_MPI)
  if(MPIEXEC_MAX_NUMPROCS LESS 4 AND "$ENV{OMPI_MCA_rmaps_base_oversubscribe}")
    message(STATUS "OpenMPI oversubscribe detected: raising MPIEXEC_MAC_NUMPROCS to 4 for testing")
    set(MPIEXEC_MAX_NUM_PROCS 4 CACHE STRING "" FORCE)
  endif()
  ...
endif()

I think that's right. Would that work?

@williamfgc
Copy link
Contributor Author

williamfgc commented Oct 17, 2018

@chuckatkins cmake had issues with MPIEXEC_MAX_NUMPROCS https://gitlab.kitware.com/cmake/cmake/merge_requests/1405 which were reproduced on my Mac using different cmake versions. That's why I have to use something more backwards compatible like cmake_host_system_information

@chuckatkins
Copy link
Contributor

We include a private copy of the upstream FindMPI module that should already be patched for this:

cmake_host_system_information(RESULT _MPIEXEC_NUMPROCS QUERY NUMBER_OF_PHYSICAL_CORES)
set(MPIEXEC_MAX_NUMPROCS "${_MPIEXEC_NUMPROCS}" CACHE STRING "Maximum number of processors available to run MPI applications.")

so the cmake_host_system_information workaround in the main CMakeLists.txt shouldn't be necessary anymore. Does it still not work correctly for you?

@williamfgc
Copy link
Contributor Author

@chuckatkins yes, that works using cmake 3.12 on my Mac (we were forced to upgrade).

@williamfgc williamfgc merged commit 6eb9f0f into ornladios:master Oct 17, 2018
@williamfgc williamfgc deleted the oversubscribe branch October 17, 2018 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants