Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cancelling request fails for serialized communication #110

Open
sdebionne opened this issue Nov 19, 2019 · 10 comments
Open

Cancelling request fails for serialized communication #110

sdebionne opened this issue Nov 19, 2019 · 10 comments

Comments

@sdebionne
Copy link

I believe this is a regression from 1.70.

In the following code, an asynchronous recv is initiated then later an other thread marks the operation for cancellation (and that should guarantee wait() to return according to the spec).

This works for fundamental type (such as int) but not for serialized types.

#include <mpi.h>

#include <iostream>
#include <future>

#include <boost/mpi.hpp>
#include <boost/serialization/serialization.hpp>

using namespace std::literals::chrono_literals;
namespace mpi = boost::mpi;

void async_cancel(boost::mpi::request request)
{
  std::this_thread::sleep_for(1s);

  std::cout << "Before MPI_Cancel" << std::endl;

  request.cancel();

  std::cout << "After MPI_Cancel" << std::endl;
}


struct data
{
  int i;
};

template <typename Archive>
void serialize(Archive& ar, data& t, const unsigned int version)
{
  ar & t.i;
}

int main(int argc, char* argv[])
{
  mpi::environment env(mpi::threading::level::multiple);
  mpi::communicator world;
  
  if (world.rank() == 0)
  {
    //int buffer; // WORKS
    data buffer;  // FAILS
    auto request = world.irecv(0, 0, buffer);
    
    auto res = std::async(std::launch::async, &async_cancel, request);

    std::cout << "Before MPI_Wait" << std::endl;

    auto status = request.wait();

    std::cout << "After MPI_Wait " << std::endl;
  }
  else
    std::this_thread::sleep_for(2s);

  return 0;
}

The expected result is:

Before MPI_Wait
Before MPI_Cancel
After MPI_Cancel
After MPI_Wait
@sdebionne
Copy link
Author

checking the source in detail/request_handlers.hpp:

class request::probe_handler
  ...
  void cancel() { m_source = MPI_PROC_NULL; }

does not call MPI_Cancel. So it sounds like it is not supported yet.

Before the refactoring (1.70), I had the following patch:

diff --git a/boost/mpi/communicator.hpp b/boost/mpi/communicator.hpp
index 6e55b16..5bcd985 100644
--- a/boost/mpi/communicator.hpp
+++ b/boost/mpi/communicator.hpp
@@ -1760,6 +1760,8 @@ request::handle_serialized_irecv(request* self, request_action action)
       // Wait for the count message to complete
       BOOST_MPI_CHECK_RESULT(MPI_Wait,
                              (self->m_requests, &stat.m_status));
+      if (stat.cancelled())
+        return stat;
       // Resize our buffer and get ready to receive its data
       data->ia.resize(data->count);
       BOOST_MPI_CHECK_RESULT(MPI_Irecv,
@@ -1772,6 +1774,8 @@ request::handle_serialized_irecv(request* self, request_action action)
     BOOST_MPI_CHECK_RESULT(MPI_Wait,
                            (self->m_requests + 1, &stat.m_status));
 
+    if (stat.cancelled())
+      return stat;
     data->deserialize(stat);
     return stat;
   } else if (action == ra_test) {
@@ -1783,6 +1787,9 @@ request::handle_serialized_irecv(request* self, request_action action)
       BOOST_MPI_CHECK_RESULT(MPI_Test,
                              (self->m_requests, &flag, &stat.m_status));
       if (flag) {
+        if (stat.cancelled())
+          return stat;
+
         // Resize our buffer and get ready to receive its data
         data->ia.resize(data->count);
         BOOST_MPI_CHECK_RESULT(MPI_Irecv,
@@ -1797,11 +1804,22 @@ request::handle_serialized_irecv(request* self, request_action action)
     BOOST_MPI_CHECK_RESULT(MPI_Test,
                            (self->m_requests + 1, &flag, &stat.m_status));
     if (flag) {
+      if (stat.cancelled())
+        return stat;
       data->deserialize(stat);
       return stat;
     } else 
       return optional<status>();
-  } else {
+  }
+  else if (action == ra_cancel) {
+    status stat;
+    if (self->m_requests[0] != MPI_REQUEST_NULL)
+      BOOST_MPI_CHECK_RESULT(MPI_Cancel, (&self->m_requests[0]));
+    if (self->m_requests[1] != MPI_REQUEST_NULL)
+      BOOST_MPI_CHECK_RESULT(MPI_Cancel, (&self->m_requests[0]));
+    return optional<status>();
+  }
+  else {
     return optional<status>();
   }
 }

Something similar might be required.

@sdebionne
Copy link
Author

Is there a way to cancel a blocking MPI_Mprobe call?

@aminiussi
Copy link
Member

I guess not, probably need to move to MPI_IMprobe.

But I need o read 3.8 more carefully (I hate that section...)

@sdebionne
Copy link
Author

sdebionne commented Nov 20, 2019

FYI, I asked the MPI community on SO.

@aminiussi
Copy link
Member

I'm going to check why it hang.

I'd like to point out that none of Boost.MPI function is supposed to map the MPI standard regarding serialized communication as there is none in C MPI. We just try to comply to our documentation which is expected not to be surprising for an MPI user.

Right now it's hanging on:

[alainm@castor test]$ mpirun -np 2 ../../../bin.v2/libs/mpi/test/cancel110-2.test/intel-linux/debug/threading-multi/visibility-hidden/cancel110-2
Before MPI_Wait
Before MPI_Cancel
After MPI_Cancel
 ....

I need to check if a recv with no matching send is defined.

We probably need to change that probe for it's async version and check if they were introduced at the same time.

Thank for you report and test case.

@aminiussi
Copy link
Member

There was a proposal in #72 to make the switch between the 2 implementation available at runtime, we still have that option if it comes down to that.

@sdebionne
Copy link
Author

For my use case, switching at compile time would be good enough. That means that using Probe should be driven not only by the MPI supported standard but also by a user defined macro BOOST_MPI_USE_IMPROBE.

// If BOOST_MPI_USE_IMPROBE was not user defined
#if !defined(BOOST_MPI_USE_IMPROBE)
#if BOOST_MPI_VERSION >= 3 
// MPI_Probe an friends should work
#  if defined(I_MPI_NUMVERSION)
// Excepted for some Intel versions.
// Note that I_MPI_NUMVERSION is not always defined with Intel.
#    if I_MPI_NUMVERSION > 20190004000
#      define BOOST_MPI_HAVE_IMPROBE 1
#    endif
#  else
#    define BOOST_MPI_HAVE_IMPROBE 1
#  endif
#endif
#endif //!defined(BOOST_MPI_USE_IMPROBE)

and use #if BOOST_MPI_USE_IMPROBE instead of #if defined(BOOST_MPI_USE_IMPROBE) in the communicators.

@aminiussi
Copy link
Member

Would a
``
// Uncomment if you don't want to use MPI_Probe and friends`
//#define BOOST_MPI_NO_IMPROBE

Do the trick ?

@aminiussi
Copy link
Member

So basically, what you want is a wait those behavior is:

 while (request.active()) request.test();

?

@aminiussi
Copy link
Member

So it should be possible to implement something sensible. The trick is to handle the status;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants