mpi::reduce hangs on intel MPI #124

Open
francescopt opened this issue Oct 9, 2020 · 0 comments

The following program hangs when calling boost::mpi::reduce in an Intel MPI environment:

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>
#include <boost/mpi/collectives.hpp>
#include <boost/mpi/communicator.hpp>
#include <boost/mpi/environment.hpp>
#include <boost/mpi/operations.hpp>
#include <boost/serialization/vector.hpp>

// Element-wise sum of two equally sized vectors.
struct sum_vec_vec {
  std::vector<double> operator()(const std::vector<double>& a, const std::vector<double>& b) const
  {
    std::vector<double> res(a.size());
    std::transform(a.begin(), a.end(), b.begin(), res.begin(), [](double x, double y) { return x + y; });
    return res;
  }
};

// Tell Boost.MPI that sum_vec_vec is commutative on std::vector<double>.
namespace boost {
  namespace mpi {
    template <>
    struct is_commutative<sum_vec_vec, std::vector<double>> : mpl::true_ { };
  }
}

int main()
{
  namespace mpi = boost::mpi;

  mpi::environment env;
  mpi::communicator world;

  std::size_t size = 1000;
  std::size_t L = 32;

  std::vector<std::vector<double>> correlations;

  // Fill up with some data: `size` rows, each holding 0, 1, ..., L-1.
  for (std::size_t i = 0; i < size; ++i)
    {
      int l = 0;

      std::vector<double> corr(L);
      for (auto& x : corr)
        x = l++;
      correlations.emplace_back(std::move(corr));
    }

  std::vector<std::vector<double>> av_correlations(correlations.size());
  std::cout << "Ready for mpi::reduce" << std::endl;
  // Reduce the `size` vectors element-wise onto rank 0; this is the call that hangs.
  boost::mpi::reduce(world, &correlations.front(), correlations.size(), &av_correlations.front(), sum_vec_vec{}, 0);

  return 0;
}
```
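For runs that do complete, the result on rank 0 can be checked against a serial model of the same reduction. This is a minimal sketch in standard C++ only; `serial_reduce`, `make_correlations` and `matches_expected` are hypothetical helpers, not part of Boost.MPI. They mirror the reproducer: every rank contributes `size` rows holding 0, 1, ..., L-1, and sum_vec_vec adds them element-wise, so with n ranks each row on rank 0 should hold n * j at index j.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Rows = std::vector<std::vector<double>>;

// Hypothetical helper: the input each rank builds in the reproducer,
// i.e. `size` rows, each containing 0, 1, ..., L-1.
Rows make_correlations(std::size_t size, std::size_t L)
{
    Rows rows(size, std::vector<double>(L));
    for (auto& row : rows)
        for (std::size_t j = 0; j < L; ++j)
            row[j] = static_cast<double>(j);
    return rows;
}

// Hypothetical helper: serial stand-in for the reduce, combining the
// per-rank inputs element by element as sum_vec_vec does.
Rows serial_reduce(const std::vector<Rows>& per_rank)
{
    Rows acc = per_rank.at(0);
    for (std::size_t r = 1; r < per_rank.size(); ++r)
        for (std::size_t i = 0; i < acc.size(); ++i)
            for (std::size_t j = 0; j < acc[i].size(); ++j)
                acc[i][j] += per_rank[r][i][j];
    return acc;
}

// Hypothetical helper: with `nranks` ranks, index j of every row should be nranks * j.
// The values are small integers, so exact floating-point comparison is safe here.
bool matches_expected(const Rows& av, std::size_t nranks)
{
    for (const auto& row : av)
        for (std::size_t j = 0; j < row.size(); ++j)
            if (row[j] != static_cast<double>(nranks * j))
                return false;
    return true;
}
```

In the reproducer itself, the same check would be `matches_expected(av_correlations, world.size())` on rank 0 after the reduce returns.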

The specifics of the MPI environment are:

MPI_Get_library_version: Intel(R) MPI Library 2019 Update 6 for Linux* OS
MPI_VERSION: 3
I_MPI_NUMVERSION: 20190006300

The Boost version is 1.74.0, and its configuration defines:

BOOST_MPI_VERSION: 3
BOOST_MPI_USE_IMPROBE: 1
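The compile-time values above can be cross-checked from the same toolchain. The sketch below assumes a compiler that supports `__has_include` (C++17, or GCC/Clang as an extension) and reports only preprocessor macros, so no MPI library has to be linked; the first line above is a runtime value from `MPI_Get_library_version` and is not covered. `mpi_macro_report` and `HAVE_BOOST_MPI_HEADERS` are hypothetical names. Note that these macros reflect the headers seen by this translation unit; what presumably matters for the hang is the value of BOOST_MPI_USE_IMPROBE when the Boost.MPI library itself (including point_to_point.cpp) was compiled.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Turn a macro's expansion into a string literal (works for any token sequence).
#define STRINGIFY_IMPL(x) #x
#define STRINGIFY(x) STRINGIFY_IMPL(x)

#if defined(__has_include)
#  if __has_include(<mpi.h>) && __has_include(<boost/mpi/config.hpp>)
#    include <mpi.h>
#    include <boost/mpi/config.hpp>
#    define HAVE_BOOST_MPI_HEADERS 1  // hypothetical marker macro
#  endif
#endif

// Hypothetical helper: report the compile-time macros listed in this issue.
std::string mpi_macro_report()
{
    std::ostringstream out;
#ifdef HAVE_BOOST_MPI_HEADERS
    out << "MPI_VERSION: " << STRINGIFY(MPI_VERSION) << '\n';
#  ifdef I_MPI_NUMVERSION  // Intel MPI specific
    out << "I_MPI_NUMVERSION: " << STRINGIFY(I_MPI_NUMVERSION) << '\n';
#  endif
#  ifdef BOOST_MPI_VERSION
    out << "BOOST_MPI_VERSION: " << STRINGIFY(BOOST_MPI_VERSION) << '\n';
#  endif
#  ifdef BOOST_MPI_USE_IMPROBE
    out << "BOOST_MPI_USE_IMPROBE: " << STRINGIFY(BOOST_MPI_USE_IMPROBE) << '\n';
#  else
    out << "BOOST_MPI_USE_IMPROBE: not defined\n";
#  endif
#else
    out << "mpi.h or boost/mpi/config.hpp not found at compile time\n";
#endif
    return out.str();
}
```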

The program above very often hangs when run with more than 6 tasks, and it always hangs with, for example, 192 tasks.

The cause appears to lie in the use of the MPI_Mprobe routines in point_to_point.cpp.
After recompiling the library with BOOST_MPI_USE_IMPROBE disabled in config.hpp, the program completes without issues. The program also runs without problems on Open MPI with BOOST_MPI_USE_IMPROBE enabled.
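For context, MPI_Mprobe belongs to the MPI-3 matched-probe pair used to receive a message whose size is not known in advance, which is the situation for serialized types such as std::vector<double>. Below is a minimal sketch of the generic pattern in plain MPI; it is not the code in Boost's point_to_point.cpp, and `recv_unknown_size`, `self_roundtrip` and `HAVE_MPI_H` are hypothetical names. The block compiles to nothing when mpi.h is not on the include path.

```cpp
#include <cassert>
#include <string>
#include <vector>

#if defined(__has_include)
#  if __has_include(<mpi.h>)
#    include <mpi.h>
#    define HAVE_MPI_H 1  // hypothetical marker macro
#  endif
#endif

#if defined(HAVE_MPI_H) && MPI_VERSION >= 3
// Receive a message of unknown size. MPI_Mprobe removes the matched message
// from the matching queue and returns a handle, so only MPI_Mrecv on that
// handle can receive it (unlike MPI_Probe followed by MPI_Recv, where another
// receive could match the message in between).
std::vector<char> recv_unknown_size(int source, int tag, MPI_Comm comm)
{
    MPI_Message msg;
    MPI_Status status;
    MPI_Mprobe(source, tag, comm, &msg, &status);

    int count = 0;
    MPI_Get_count(&status, MPI_BYTE, &count);  // message size in bytes

    std::vector<char> buf(count);
    MPI_Mrecv(buf.data(), count, MPI_BYTE, &msg, &status);
    return buf;
}

// Self-test on MPI_COMM_SELF: post a nonblocking send to ourselves, then
// receive it through the matched probe. MPI must already be initialized.
std::string self_roundtrip(const std::string& payload)
{
    MPI_Request req;
    MPI_Isend(payload.data(), static_cast<int>(payload.size()), MPI_BYTE,
              0, 42, MPI_COMM_SELF, &req);
    std::vector<char> got = recv_unknown_size(0, 42, MPI_COMM_SELF);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    return std::string(got.begin(), got.end());
}
#endif
```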

All in all, I suspect that the bug mentioned here is still present.
