Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mutex deadlock on service connection drop. #2057

Closed
guillaumeautran opened this issue Sep 23, 2020 · 19 comments · Fixed by #2074
Closed

Mutex deadlock on service connection drop. #2057

guillaumeautran opened this issue Sep 23, 2020 · 19 comments · Fixed by #2074

Comments

@guillaumeautran
Copy link
Contributor

guillaumeautran commented Sep 23, 2020

ROS Version

noetic-devel
tag: 1.15.8

Thread Stack Trace

Thread 1

#0 __lll_lock_wait (futex=futex@entry=0x7f2f4802d248, private=0) at lowlevellock.c:52 #1 0x00007f2f68a6b131 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x7f2f4802d248) at ../nptl/pthread_mutex_lock.c:115 #2 0x00007f2f68d35600 in boost::posix::pthread_mutex_lock (m=<optimized out>) at /usr/include/boost/thread/pthread/pthread_helpers.hpp:79 #3 boost::recursive_mutex::lock (this=<optimized out>) at /usr/include/boost/thread/pthread/recursive_mutex.hpp:108 #4 boost::unique_lock<boost::recursive_mutex>::lock (this=0x7f2f5dff9ad0) at /usr/include/boost/thread/lock_types.hpp:346 #5 0x00007f2f68b42bf0 in boost::unique_lock<boost::recursive_mutex>::unique_lock (m_=..., this=0x7f2f5dff9ad0) at /usr/include/boost/thread/lock_types.hpp:121 #6 ros::Connection::drop (this=0x7f2f4802d0c0, reason=reason@entry=ros::Connection::Destructing) at ../../../src/roscpp/src/libros/connection.cpp:330 #7 0x00007f2f68b9ac66 in ros::ServiceServerLink::processNextCall (this=0x7f2f4802b2b0) at /usr/include/boost/smart_ptr/shared_ptr.hpp:732 #8 0x00007f2f68b9b56d in ros::ServiceServerLink::callFinished (this=<optimized out>) at ../../../src/roscpp/src/libros/service_server_link.cpp:277 #9 0x00007f2f68b9babd in ros::ServiceServerLink::onResponse (this=0x7f2f4802b2b0, conn=..., buffer=..., size=<optimized out>, success=<optimized out>) at ../../../src/roscpp/src/libros/service_server_link.cpp:250 #10 0x00007f2f68b9bfc0 in ros::ServiceServerLink::onResponseOkAndLength (this=0x7f2f4802b2b0, conn=..., buffer=..., size=<optimized out>, success=<optimized out>) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:122 #11 0x00007f2f68b3f974 in boost::function4<void, boost::shared_ptr<ros::Connection> const&, boost::shared_array<unsigned char> const&, unsigned int, bool>::operator() (a3=true, a2=5, a1=..., a0=..., this=0x7f2f5dff9eb0) at /usr/include/boost/function/function_template.hpp:677 #12 ros::Connection::readTransport (this=0x7f2f4802d0c0) at ../../../src/roscpp/src/libros/connection.cpp:176 #13 0x00007f2f68b40812 in ros::Connection::read(unsigned int, boost::function<void (boost::shared_ptr<ros::Connection> const&, boost::shared_array<unsigned char> const&, unsigned int, bool)> const&) (this=this@entry=0x7f2f4802d0c0, size=size@entry=5, callback=...) at ../../../src/roscpp/src/libros/connection.cpp:286 #14 0x00007f2f68b9a7ce in ros::ServiceServerLink::onRequestWritten (this=0x7f2f4802b2b0, conn=...) at /usr/include/boost/function/function_template.hpp:912 #15 0x00007f2f68b41609 in boost::function1<void, boost::shared_ptr<ros::Connection> const&>::operator() (a0=..., this=0x7f2f5dffa090) at /usr/include/boost/function/function_template.hpp:677 #16 ros::Connection::writeTransport (this=0x7f2f4802d0c0) at ../../../src/roscpp/src/libros/connection.cpp:241 #17 0x00007f2f68b41e4b in ros::Connection::write(boost::shared_array<unsigned char> const&, unsigned int, boost::function<void (boost::shared_ptr<ros::Connection> const&)> const&, bool) (this=this@entry=0x7f2f4802d0c0, buffer=..., size=5, callback=..., immediate=immediate@entry=true) at ../../../src/roscpp/src/libros/connection.cpp:313 #18 0x00007f2f68b9ab92 in ros::ServiceServerLink::processNextCall (this=0x7f2f4802b2b0) at /usr/include/c++/9/new:174 #19 0x00007f2f68b9c88c in ros::ServiceServerLink::call (this=0x7f2f4802b2b0, req=..., resp=...) at ../../../src/roscpp/src/libros/service_server_link.cpp:362 #20 0x00007f2f68ba020e in ros::ServiceClient::call (this=this@entry=0x558278bafcc8, req=..., resp=..., service_md5sum="a24c062c736be593254f72e3ac2e6bc2") at ../../../src/roscpp/src/libros/service_client.cpp:143 #21 0x00007f2f64332b78 in ros::ServiceClient::call<autonomy_msgs::SetLidarRangeRequest_<std::allocator<void> >, autonomy_msgs::SetLidarRangeResponse_<std::allocator<void> > > (service_md5sum="a24c062c736be593254f72e3ac2e6bc2", resp=<synthetic pointer>..., req=<synthetic pointer>..., this=0x558278bafcc8) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:122 ...

Thread 2

#0 0x00007f2f6871189b in sched_yield () at ../sysdeps/unix/syscall-template.S:78 #1 0x00007f2f68b9cf2d in ros::ServiceServerLink::cancelCall (this=<optimized out>, info=...) at ../../../src/roscpp/src/libros/service_server_link.cpp:83 #2 0x00007f2f68b9d0e6 in ros::ServiceServerLink::clearCalls (this=0x7f2f4802b2b0) at ../../../src/roscpp/src/libros/service_server_link.cpp:99 #3 0x00007f2f68b9d466 in ros::ServiceServerLink::onConnectionDropped (this=0x7f2f4802b2b0, conn=...) at ../../../src/roscpp/src/libros/service_server_link.cpp:175 #4 0x00007f2f68b4b962 in boost::function2<void, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>::operator() (a1=<optimized out>, a0=..., this=<optimized out>) at /usr/include/boost/function/function_template.hpp:677 #5 boost::signals2::detail::call_with_tuple_args<boost::signals2::detail::void_type>::m_invoke<boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)>, 0u, 1u, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason&>(boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)>&, boost::signals2::detail::unsigned_meta_array<0u, 1u>, std::tuple<boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason&> const&, boost::enable_if<boost::is_void<boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)>::result_type>, void>::type*) const (this=<optimized out>, args=std::tuple containing = {...}, func=...) at /usr/include/boost/signals2/detail/variadic_slot_invoker.hpp:105 #6 boost::signals2::detail::call_with_tuple_args<boost::signals2::detail::void_type>::operator()<boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)>, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason&, 2ul>(boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)>&, std::tuple<boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason&> const&, mpl_::size_t<2ul>) const (this=<optimized out>, args=std::tuple containing = {...}, func=...) at /usr/include/boost/signals2/detail/variadic_slot_invoker.hpp:90 #7 boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>::operator()<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >(boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > const&) const (connectionBody=..., this=0x7f2f653ee860) at /usr/include/boost/signals2/detail/variadic_slot_invoker.hpp:133 #8 boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>, std::_List_iterator<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> >::dereference() const (this=0x7f2f653ee5b0) at /usr/include/boost/signals2/detail/slot_call_iterator.hpp:110 #9 boost::iterators::iterator_core_access::dereference<boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>, std::_List_iterator<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >(boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>, std::_List_iterator<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > const&) (f=...) at /usr/include/boost/iterator/iterator_facade.hpp:550 #10 boost::iterators::detail::iterator_facade_base<boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>, std::_List_iterator<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> >, boost::signals2::detail::void_type, boost::iterators::single_pass_traversal_tag, boost::signals2::detail::void_type const&, long, false, false>::operator*() const (this=0x7f2f653ee5b0) at /usr/include/boost/iterator/iterator_facade.hpp:656 #11 boost::signals2::optional_last_value<void>::operator()<boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>, std::_List_iterator<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >(boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>, std::_List_iterator<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> >, boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>, std::_List_iterator<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> >) const (last=..., first=..., this=<optimized out>) at /usr/include/boost/signals2/optional_last_value.hpp:57 #12 boost::signals2::detail::combiner_invoker<void>::operator()<boost::signals2::optional_last_value<void>, boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>, std::_List_iterator<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >(boost::signals2::optional_last_value<void>&, boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>, std::_List_iterator<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> >, boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason>, std::_List_iterator<boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> > >, boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)> >, boost::signals2::mutex> >) const (this=<optimized out>, last=..., first=..., combiner=...) at /usr/include/boost/signals2/detail/result_type_wrapper.hpp:64 #13 boost::signals2::detail::signal_impl<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)>, boost::function<void (boost::signals2::connection const&, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)>, boost::signals2::mutex>::operator()(boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason) (this=0x7f2f48020480, args#0=..., args#1=<optimized out>, args#1@entry=ros::Connection::TransportDisconnect) at /usr/include/boost/signals2/detail/signal_template.hpp:243 #14 0x00007f2f68b42c53 in boost::signals2::signal<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)>, boost::function<void (boost::signals2::connection const&, boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason)>, boost::signals2::mutex>::operator()(boost::shared_ptr<ros::Connection> const&, ros::Connection::DropReason) (args#1=ros::Connection::TransportDisconnect, args#0=..., this=0x7f2f4802d230) at /usr/include/boost/smart_ptr/shared_ptr.hpp:726 #15 ros::Connection::drop (this=0x7f2f4802d0c0, reason=ros::Connection::TransportDisconnect) at ../../../src/roscpp/src/libros/connection.cpp:343 #16 0x00007f2f68b43251 in ros::Connection::onDisconnect (this=0x7f2f4802d0c0, transport=...) at ../../../src/roscpp/src/libros/connection.cpp:322 #17 0x00007f2f68bbcde4 in boost::function1<void, boost::shared_ptr<ros::Transport> const&>::operator() (a0=..., this=0x7f2f653eea30) at /usr/include/boost/function/function_template.hpp:677 #18 ros::TransportTCP::close (this=0x7f2f48022110) at ../../../src/roscpp/src/libros/transport/transport_tcp.cpp:481 #19 0x00007f2f68bbd709 in ros::TransportTCP::socketUpdate (this=0x7f2f48022110, events=8197) at ../../../src/roscpp/src/libros/transport/transport_tcp.cpp:758 #20 0x00007f2f68bf722c in boost::function1<void, int>::operator() (a0=<optimized out>, this=0x7f2f653eec50) at /usr/include/boost/function/function_template.hpp:677 #21 ros::PollSet::update (this=this@entry=0x55827833cce0, poll_timeout=poll_timeout@entry=100) at ../../../src/roscpp/src/libros/poll_set.cpp:255 #22 0x00007f2f68b7e2c3 in ros::PollManager::threadFunc (this=0x55827833cce0) at ../../../src/roscpp/src/libros/poll_manager.cpp:88 #23 0x00007f2f6837c43b in ?? () from /lib/x86_64-linux-gnu/libboost_thread.so.1.71.0 #24 0x00007f2f68a68609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #25 0x00007f2f6872e293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Code Analysis

The following commit 5f8d76d introduced a lock prior to calling the signal when a rosservice socket drops: https://github.com/ros/ros_comm/blob/noetic-devel/clients/roscpp/src/libros/connection.cpp#L342
When a socket drops, the signaling thread eventually makes its way to cancelling the call that is in progress here: https://github.com/ros/ros_comm/blob/noetic-devel/clients/roscpp/src/libros/service_server_link.cpp#L81
If at the exact same time, another thread attempts to drop the socket itself via the processNextCall here: https://github.com/ros/ros_comm/blob/noetic-devel/clients/roscpp/src/libros/service_server_link.cpp#L309
The second thread will block forever.

@dirk-thomas
Copy link
Member

@Barry-Xu-2018 Can you please take a look at this bug report since you contributed the referenced code.

@fujitatomoya
Copy link
Contributor

@dirk-thomas we will look into it.

CC: @Barry-Xu-2018 @iuhilnehc-ynos

@Barry-Xu-2018
Copy link
Contributor

@dirk-thomas @fujitatomoya

We will start to investigate this.

@iuhilnehc-ynos
Copy link
Contributor

iuhilnehc-ynos commented Sep 24, 2020

@Barry-Xu-2018 @fujitatomoya

Please ignore my previous comment which seems very bad, so I delete it.

I found there is a mutex drop_mutex_ to control two variables 'dropped_' and 'drop_signal_', maybe we could consider the following source code by using two mutexes that making source code a little more clear.

diff --git a/clients/roscpp/include/ros/connection.h b/clients/roscpp/include/ros/connection.h
index fbef78cb2..980d21411 100644
--- a/clients/roscpp/include/ros/connection.h
+++ b/clients/roscpp/include/ros/connection.h
@@ -210,6 +210,8 @@ private:
   bool is_server_;
   /// Have we dropped?
   bool dropped_;
+  /// Synchronizes dropped_
+  boost::recursive_mutex dropped_mutex_;
   /// Incoming header
   Header header_;
   /// Transport associated with us
diff --git a/clients/roscpp/src/libros/connection.cpp b/clients/roscpp/src/libros/connection.cpp
index c8c271a8b..aa38865de 100644
--- a/clients/roscpp/src/libros/connection.cpp
+++ b/clients/roscpp/src/libros/connection.cpp
@@ -325,23 +325,20 @@ void Connection::onDisconnect(const TransportPtr& transport)
 void Connection::drop(DropReason reason)
 {
   ROSCPP_LOG_DEBUG("Connection::drop(%u)", reason);
-  bool did_drop = false;
   {
-    boost::recursive_mutex::scoped_lock lock(drop_mutex_);
+    boost::recursive_mutex::scoped_lock lock(dropped_mutex_);
     if (!dropped_)
     {
       dropped_ = true;
-      did_drop = true;
+    } else {
+      return;
     }
   }
 
-  if (did_drop)
+  transport_->close();
   {
-    transport_->close();
-    {
-      boost::recursive_mutex::scoped_lock lock(drop_mutex_);
-      drop_signal_(shared_from_this(), reason);
-    }
+    boost::recursive_mutex::scoped_lock lock(drop_mutex_);
+    drop_signal_(shared_from_this(), reason);
   }
 }

What do you think about that?

@Barry-Xu-2018
Copy link
Contributor

@iuhilnehc-ynos

Agree with 2 mutex lock.
How about @fujitatomoya ?

@fujitatomoya
Copy link
Contributor

@Barry-Xu-2018 @iuhilnehc-ynos

i will look into it soon. will get back to you.

@fujitatomoya
Copy link
Contributor

@iuhilnehc-ynos

I see, dropped_mutex_ makes sense and fixes this deadlock.

BTW, ServiceServerLink::cancelCall's been in yield since status(call_finished_) does not changed as expected. service client(non-persist) finished requesting, since the client is non-persistent it will drop the connection to server with requiring drop_mutex_ before changing status. as far as i see, client should drop the connection after changing the status.

@iuhilnehc-ynos
Copy link
Contributor

BTW, ServiceServerLink::cancelCall's been in yield since status(call_finished_) does not changed as expected. service client(non-persist) finished requesting, since the client is non-persistent it will drop the connection to server with requiring drop_mutex_ before changing status.

For these, got the same understanding as you.

as far as i see, client should drop the connection after changing the status.

There exist two cases that client drop the connection,

conn->drop(Connection::Destructing);

connection_->drop(Connection::Destructing);

If we update with client should drop the connection after changing the status, it will totally update the logic of ServiceServerLink::call and processNextCall.
Anyway, this could be an alternative solution.

@fujitatomoya
Copy link
Contributor

If we update with client should drop the connection after changing the status, it will totally update the logic of ServiceServerLink::call and processNextCall

yes, agree. but maybe this is different topic. I do not think this is our choice right now.

@Barry-Xu-2018
Copy link
Contributor

@dirk-thomas

Could you help to review #2061 for this issue ?

@iuhilnehc-ynos
Copy link
Contributor

I have closed the previous PR because of breaking ABI, maybe we could use this kind of way

+        if (current_call_) {
+          current_call_->call_finished_ = true;
+        }
         connection_->drop(Connection::Destructing);      

to fix this issue at

conn->drop(Connection::Destructing);
and
connection_->drop(Connection::Destructing);
.

@Barry-Xu-2018
Copy link
Contributor

@iuhilnehc-ynos

About line 309, I have a question.

if (current_call_)
{
return;
}
if (!call_queue_.empty())
{
ROS_DEBUG_NAMED("superdebug", "[%s] Client to service [%s] processing next service call", persistent_ ? "persistent" : "non-persistent", service_name_.c_str());
current_call_ = call_queue_.front();
call_queue_.pop();
}
else
{
empty = true;
}
}
if (empty)
{
if (!persistent_)
{
ROS_DEBUG_NAMED("superdebug", "Dropping non-persistent client to service [%s]", service_name_.c_str());
connection_->drop(Connection::Destructing);

According to above codes, it seems that current_call_ should always be "nullptr" at 309.

@iuhilnehc-ynos
Copy link
Contributor

According to above codes, it seems that current_call_ should always be "nullptr" at 309.

Yes, I think you're right.

@guillaumeautran
Copy link
Contributor Author

Just a suggestion here but would it be sufficient to just remove the lock statement here: https://github.com/ros/ros_comm/blob/noetic-devel/clients/roscpp/src/libros/connection.cpp#L342 ? What is that lock protecting?

@dirk-thomas
Copy link
Member

Any update on a mergable patch for this?

@fujitatomoya
Copy link
Contributor

@dirk-thomas

sorry, most of our members are on holiday, we will be back in next week.

@iuhilnehc-ynos
Copy link
Contributor

Just a suggestion here but would it be sufficient to just remove the lock statement here: https://github.com/ros/ros_comm/blob/noetic-devel/clients/roscpp/src/libros/connection.cpp#L342 ? What is that lock protecting?

Thanks for your suggestion. But it's for #1939.

drop_signal_(shared_from_this(), reason) -> TransportSubscriberLink::onConnectionDropped

parent->removeSubscriberLink(shared_from_this());

this might throw a bad_weak exception while calling shared_from_this() if TransportSubscriberLink already destroyed

@iuhilnehc-ynos
Copy link
Contributor

@dirk-thomas

Any update on a mergable patch for this?

Sorry about not fixing it till now.

@iuhilnehc-ynos
Copy link
Contributor

@dirk-thomas

Could you please review this PR, #2074?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants