Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roscpp: ros::service::call hangs (ros ticket #2742) #149

Closed
tfoote opened this issue Jan 31, 2013 · 16 comments
Closed

roscpp: ros::service::call hangs (ros ticket #2742) #149

tfoote opened this issue Jan 31, 2013 · 16 comments
Labels

Comments

@tfoote
Copy link
Member

tfoote commented Jan 31, 2013

I am using this call method from a ServiceClient:
{{{
template<class MReq, class MRes>
bool call(MReq& req, MRes& res)
}}}

tracing the execution, this goes into service_server_link.cpp
{{{
bool ServiceServerLink::call(const SerializedMessage& req, SerializedMessage& resp)
}}}

the problem seems to be that the finished_condition_.wait() call here never terminates.

{{{
boost::mutex::scoped_lock lock(info->finished_mutex_);

while (!info->finished_)
{
  info->finished_condition_.wait(lock);
}

}}}

using:
https://code.ros.org/svn/ros/stacks/ros/tags/latest/core/roscpp
r9871

This happens almost every time.
I believe my setup had incorrect MD5 sums (however, no error or warning mentioned this). At some point, the node providing the service crashed (potentially caused by the MD5 sum error). I kept this setup until I found the hanging call to the condition. Doing rosmake in my build fixed all the problems.

Regardless of the potential issues with the build I used and the service provider node, I believe there should have been warnings about the MD5 sum being different and the code should not hang.

trac data:

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[isucan] Actually, I'm getting the error once in a while even after the rebuild -- so no MD5 sum errors and no crashed nodes.

Also, running with valgrind, I get this:

{{{
==21823== Invalid read of size 8
==21823== at 0x8468CFC: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (shared_count.hpp:250)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473518 is not stack'd, malloc'd or (recently) free'd
==21823==
==21823== Invalid write of size 8
==21823== at 0x8468D00: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (shared_array.hpp:47)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473510 is not stack'd, malloc'd or (recently) free'd
==21823==
==21823== Invalid read of size 8
==21823== at 0x8468D19: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (sp_counted_base_gcc_x86.hpp:66)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473518 is not stack'd, malloc'd or (recently) free'd
==21823==
==21823== Invalid read of size 4
==21823== at 0x45B3F7: boost::detail::atomic_exchange_and_add(int*, int) (sp_counted_base_gcc_x86.hpp:50)
==21823== by 0x45B538: boost::detail::sp_counted_base::release() (sp_counted_base_gcc_x86.hpp:143)
==21823== by 0x8468D26: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (shared_count.hpp:253)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473610 is not stack'd, malloc'd or (recently) free'd
==21823==
==21823== Invalid write of size 8
==21823== at 0x8468D27: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (shared_count.hpp:254)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473518 is not stack'd, malloc'd or (recently) free'd
==21823==
==21823== Invalid write of size 8
==21823== at 0x8468D30: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (serialized_message.h:40)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473520 is not stack'd, malloc'd or (recently) free'd
==21823==
==21823== Invalid write of size 8
==21823== at 0x8468D39: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (serialized_message.h:40)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473528 is not stack'd, malloc'd or (recently) free'd
==21823==
==21823== Invalid read of size 8
==21823== at 0x8468D50: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (shared_count.hpp:262)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473538 is not stack'd, malloc'd or (recently) free'd
==21823==
==21823== Invalid write of size 8
==21823== at 0x8468D54: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (move.h:83)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473530 is not stack'd, malloc'd or (recently) free'd
==21823==
==21823== Invalid write of size 8
==21823== at 0x8468D58: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (shared_count.hpp:263)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473538 is not stack'd, malloc'd or (recently) free'd
==21823==
==21823== Invalid write of size 8
==21823== at 0x8468D6B: ros::ServiceServerLink::onResponse(boost::shared_ptrros::Connection const&, boost::shared_array const&, unsigned int, bool) (serialized_message.h:40)
==21823== by 0x83E556F: ros::Connection::readTransport() (function_template.hpp:1013)
==21823== by 0x83E70AD: ros::Connection::onReadable(boost::shared_ptrros::Transport const&) (connection.cpp:104)
==21823== by 0x83F3E67: boost::function1<void, boost::shared_ptrros::Transport const&>::operator()(boost::shared_ptrros::Transport const&) const (function_template.hpp:1013)
==21823== by 0x83F2E69: ros::TransportTCP::socketUpdate(int) (transport_tcp.cpp:555)
==21823== by 0x83DCBBA: ros::PollSet::update(int) (function_template.hpp:1013)
==21823== by 0x841F014: ros::PollManager::threadFunc() (poll_manager.cpp:95)
==21823== by 0x8D37DAF: thread_proxy (in /usr/lib/libboost_thread.so.1.40.0)
==21823== by 0xA418A03: start_thread (pthread_create.c:300)
==21823== by 0xA70DD4C: clone (clone.S:112)
==21823== Address 0x16473540 is not stack'd, malloc'd or (recently) free'd
}}}

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[jfaust] is there some sample code that reproduces this?

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[vlopez] Apologies for reopening such an old ticket, but this issue is still happening on diamondback and cturtle.

It has also been mentioned in these other tickets: https://code.ros.org/trac/ros/ticket/2913 https://code.ros.org/trac/ros-pkg/ticket/4195

The issue is caused by a deadlock caused by a service call when pollmanager is

I'm working on a code example that reproduces this issue, but it only happens when PollManager is running and I don't know what makes it start working. And there's little information about PollManager.

This is the callstack after the deadlock.

{{{
#0 0xb78d3430 in __kernel_vsyscall ()
#1 0xb6dbfaf9 in __lll_lock_wait () from /lib/tls/i686/cmov/libpthread.so.0
#2 0xb6dbb149 in L_lock_839 () from /lib/tls/i686/cmov/libpthread.so.0
#3 0xb6dbafdb in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0
#4 0xb71a8dba in lock (this=) at /usr/include/boost/thread/pthread/recursive_mutex.hpp:67
#5 lock (this=) at /usr/include/boost/thread/locks.hpp:338
#6 unique_lock (m
=, this=) at /usr/include/boost/thread/locks.hpp:224
#7 ros::TransportTCP::close (this=0xb3200fd4) at /opt/ros/cturtle/ros/core/roscpp/src/libros/transport/transport_tcp.cpp:336
#8 0xb71ad463 in ros::TransportTCP::read (this=0xb3200f90, buffer=0x8732aa8 "\370\003Ƕ\220\364s\b\020", size=5)
at /opt/ros/cturtle/ros/core/roscpp/src/libros/transport/transport_tcp.cpp:406
#9 0xb719cc8e in ros::Connection::readTransport (this=0xb3201598) at /opt/ros/cturtle/ros/core/roscpp/src/libros/connection.cpp:123
#10 0xb719dcad in ros::Connection::read (this=0xb3201598, size=5, callback=...) at /opt/ros/cturtle/ros/core/roscpp/src/libros/connection.cpp:284
#11 0xb7229112 in ros::ServiceServerLink::onRequestWritten (this=0xb3202118, conn=...) at /opt/ros/cturtle/ros/core/roscpp/src/libros/service_server_link.cpp:181
#12 0xb722dd44 in operator() (a1=, p=, this=) at /usr/include/boost/bind/mem_fn_template.hpp:162
#13 operator()<boost::_mfi::mf1<void, ros::ServiceServerLink, const ros::ConnectionPtr&>, boost::_bi::list1<const boost::shared_ptrros::Connection&> > (
a=, f=, this=) at /usr/include/boost/bind/bind.hpp:306
#14 operator()boost::shared_ptr<ros::Connection > (a1=, this=) at /usr/include/boost/bind/bind_template.hpp:47
#15 boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, ros::ServiceServerLink, ros::ConnectionPtr const&>, boost::_bi::list2boost::_bi::value<ros::ServiceServerLink*, boost::arg<1> > >, void, ros::ConnectionPtr const&>::invoke (function_obj_ptr=..., a0=...)
at /usr/include/boost/function/function_template.hpp:153
#16 0xb719febd in operator() (a0=, this=) at /usr/include/boost/function/function_template.hpp:1013
#17 ros::Connection::writeTransport (this=0xb3201598) at /opt/ros/cturtle/ros/core/roscpp/src/libros/connection.cpp:240
#18 0xb71a07e2 in ros::Connection::write (this=0xb3201598, buffer=..., size=4, callback=..., immediate=true)
at /opt/ros/cturtle/ros/core/roscpp/src/libros/connection.cpp:311
#19 0xb722b864 in ros::ServiceServerLink::processNextCall (this=0xb3202118) at /opt/ros/cturtle/ros/core/roscpp/src/libros/service_server_link.cpp:312
#20 0xb722c207 in ros::ServiceServerLink::call (this=0xb3202118, req=..., resp=...) at /opt/ros/cturtle/ros/core/roscpp/src/libros/service_server_link.cpp:348
#21 0xb724d03e in ros::ServiceClient::call (this=0xb3200524, req=..., resp=..., service_md5sum=...)
at /opt/ros/cturtle/ros/core/roscpp/src/libros/service_client.cpp:144
#22 0x0812d876 in ros::ServiceClient::call<std_srvs::EmptyRequest, std_srvs::EmptyResponse> (this=0xb3200524, req=..., resp=..., service_md5sum=...)
at /opt/ros/cturtle/ros/core/roscpp/include/ros/service_client.h:102
#23 0x0812ce09 in ros::ServiceClient::call<std_srvs::Empty> (this=0xb3200524, service=...) at /opt/ros/cturtle/ros/core/roscpp/include/ros/service_client.h:90
}}}

{{{
#0 0xb78d3430 in __kernel_vsyscall ()
#1 0xb6bbd0cc in sched_yield () from /lib/tls/i686/cmov/libc.so.6
#2 0xb76a6f47 in boost::this_thread::yield() () from /usr/lib/libboost_thread.so.1.40.0
#3 0xb722a095 in ros::ServiceServerLink::cancelCall (this=0xb3202118, info=...) at /opt/ros/cturtle/ros/core/roscpp/src/libros/service_server_link.cpp:83
#4 0xb722a246 in ros::ServiceServerLink::clearCalls (this=0xb3202118) at /opt/ros/cturtle/ros/core/roscpp/src/libros/service_server_link.cpp:99
#5 0xb722a5c1 in ros::ServiceServerLink::onConnectionDropped (this=0xb3202118, conn=...) at /opt/ros/cturtle/ros/core/roscpp/src/libros/service_server_link.cpp:173
#6 0xb722dcb4 in operator() (a1=, p=, this=) at /usr/include/boost/bind/mem_fn_template.hpp:162
#7 operator()<boost::_mfi::mf1<void, ros::ServiceServerLink, const ros::ConnectionPtr&>, boost::_bi::list2<const boost::shared_ptrros::Connection&, ros::Connection::DropReason&> > (a=, f=, this=) at /usr/include/boost/bind/bind.hpp:306
#8 operator()boost::shared_ptr<ros::Connection, ros::Connection::DropReason> (a2=, a1=, this=)
at /usr/include/boost/bind/bind_template.hpp:76
#9 boost::detail::function::void_function_obj_invoker2<boost::_bi::bind_t<void, boost::_mfi::mf1<void, ros::ServiceServerLink, ros::ConnectionPtr const&>, boost::_bi::list2boost::_bi::value<ros::ServiceServerLink*, boost::arg<1> > >, void, ros::ConnectionPtr const&, ros::Connection::DropReason>::invoke (function_obj_ptr=...,
a0=..., a1=ros::Connection::TransportDisconnect) at /usr/include/boost/function/function_template.hpp:153
#10 0xb71a548e in boost::signal2<void, boost::shared_ptrros::Connection const&, ros::Connection::DropReason, boost::last_value, int, std::less, boost::function<void (boost::shared_ptrros::Connection const&, ros::Connection::DropReason)> >::operator()(boost::shared_ptrros::Connection const&, ros::Connection::DropReason) () from /opt/ros/cturtle/ros/core/roscpp/lib/libros.so
#11 0xb719e41f in ros::Connection::drop (this=0xb3201598, reason=ros::Connection::TransportDisconnect)
at /opt/ros/cturtle/ros/core/roscpp/src/libros/connection.cpp:332
#12 0xb719ec66 in ros::Connection::onDisconnect (this=0xb3201598, transport=...) at /opt/ros/cturtle/ros/core/roscpp/src/libros/connection.cpp:319
#13 0xb71a2e54 in operator() (a1=, p=, this=) at /usr/include/boost/bind/mem_fn_template.hpp:162
#14 operator()<boost::_mfi::mf1<void, ros::Connection, const ros::TransportPtr&>, boost::_bi::list1<const boost::shared_ptrros::Transport&> > (a=,
f=, this=) at /usr/include/boost/bind/bind.hpp:306
#15 operator()boost::shared_ptr<ros::Transport > (a1=, this=) at /usr/include/boost/bind/bind_template.hpp:47
#16 boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, ros::Connection, ros::TransportPtr const&>, boost::_bi::list2boost::_bi::value<ros::Connection*, boost::arg<1> > >, void, ros::TransportPtr const&>::invoke (function_obj_ptr=..., a0=...)
at /usr/include/boost/function/function_template.hpp:153
#17 0xb71a9223 in operator() (a0=..., this=) at /usr/include/boost/function/function_template.hpp:1013
#18 ros::TransportTCP::close (this=0xb3200f90) at /opt/ros/cturtle/ros/core/roscpp/src/libros/transport/transport_tcp.cpp:373
#19 0xb71abded in ros::TransportTCP::socketUpdate (this=0xb3200f90, events=)
at /opt/ros/cturtle/ros/core/roscpp/src/libros/transport/transport_tcp.cpp:614
#20 0xb71adb24 in operator() (p=, this=, a1=) at /usr/include/boost/bind/mem_fn_template.hpp:162
#21 operator()<boost::_mfi::mf1<void, ros::TransportTCP, int>, boost::_bi::list1<int&> > (a=, f=, this=)
at /usr/include/boost/bind/bind.hpp:306
#22 operator() (a1=, this=) at /usr/include/boost/bind/bind_template.hpp:32
#23 boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, ros::TransportTCP, int>, boost::_bi::list2boost::_bi::value<ros::TransportTCP*, boost::arg<1> > >, void, int>::invoke (function_obj_ptr=..., a0=25) at /usr/include/boost/function/function_template.hpp:153
---Type to continue, or q to quit---
#24 0xb7197adf in operator() (a0=, this=) at /usr/include/boost/function/function_template.hpp:1013
#25 ros::PollSet::update (this=0x872c008, poll_timeout=10) at /opt/ros/cturtle/ros/core/roscpp/src/libros/poll_set.cpp:267
#26 0xb71dd283 in ros::PollManager::threadFunc (this=0x872c008) at /opt/ros/cturtle/ros/core/roscpp/src/libros/poll_manager.cpp:95
#27 0xb71de777 in operator() (p=, this=) at /usr/include/boost/bind/mem_fn_template.hpp:49
#28 operator()<boost::_mfi::mf0<void, ros::PollManager>, boost::_bi::list0> (a=, f=, this=)
at /usr/include/boost/bind/bind.hpp:246
#29 operator() (this=) at /usr/include/boost/bind/bind_template.hpp:20
#30 boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, ros::PollManager>, boost::_bi::list1boost::_bi::value<ros::PollManager* > > >::run (
this=0x872ddf0) at /usr/include/boost/thread/detail/thread.hpp:56
#31 0xb76a77c5 in thread_proxy () from /usr/lib/libboost_thread.so.1.40.0
#32 0xb6db896e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#33 0xb6be6a4e in clone () from /lib/tls/i686/cmov/libc.so.6
}}}

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[kwc] Please at least provide:

  • some set of instructions that leads to this condition
  • an example in Diamondback, not C Turtle, and include the version number (rosversion ros_comm)

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[vlopez] I've added the test file.

As you can see the issue happens sometimes after the following steps:

  1. Create a service server.
  2. Create a persistent connection to this service and call it
  3. Shutdown the server
  4. Create another service server in the same node, with the same name and to the same function.
  5. Call the service with the persistent client we had created before.

I understand that this behaviour is not typical, and you might expect it to fail. That's why I check if the client is valid and if the service exists.

The real issue is the deadlock.

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[straszheim] Thanks for the great test case. Should be fixed in r15057, will watch hudson for a bit to see how things go.

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[straszheim] Was a little overly agressive checking for the condition. Real fix should be in r15058. Watching hudson again.

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[straszheim] Fixed!

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[straszheim] in r15289 there may be code that revives this deadlock.

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[gerkey] Indeed, it's been unfixed by r15289. r15295 disables the test for this bug. It should be re-enabled after it's re-fixed.

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[vlopez] Do you have an estimate time of when will these be fixed? Will it make it into Fuerte?

This bug is preventing us from using persistent services.

And using non-persistent services is giving us many problems since each non-persistent service call uses up to 2 system sockets and we're running out of sockets on heavy service usage situations.

System sockets are not completely freed until 1-2minutes have passed since they were closed, and if the system runs out of sockets the load average skyrockets and the system slows down.

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[jrcapriles] Replying to [comment:13 kwc]:

I'm working with ROS Electric and I found out the same problem. I've a state machine who has several persistence connections to service servers. When I switch between states sometimes I kill/launch nodes. The problem that I have is if I kill a node which had a service server that had been already called once, when the node starts again and my service client in the state machine call it, it hangs. How is the situation about this ticket? Is there any alternative solution?

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[akimmel] I am also having this issue running ROS Electric on Mac 10.7.4, and I was wondering if there is a workaround for this issue. I am running a few different nodes which each have several persistent services running, and I encounter this same problem when executing the ros::service::call method between the nodes.

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[vlopez] I'm still struggling with this issue.

I have found something that might be related to the cause of the problem.
Services use TCP connection, when the server dies, the client should be notified because the connection has dropped.

But after the service server crashes, the socket is in WAIT status for as long as 2 minutes (depending on the configuration of the computer), until the TCP socket is finally closed, the server disconnection is not noticed and the client is susceptible to be blocked forever when calling.

@tfoote
Copy link
Member Author

tfoote commented Jan 31, 2013

[asomerville] related to #2913

@v-lopez
Copy link
Contributor

v-lopez commented Feb 7, 2013

I'm unable to attach the file that reproduced the issue. For the record it is here: https://code.ros.org/trac/ros/attachment/ticket/2742/main.cpp

PierrickKoch referenced this issue in PierrickKoch/robotpkg Feb 21, 2014
Changes since 1.8.15:

1.9.46 (2013-06-18)
-------------------

* add dependency on roslisp
  (`#240 <https://github.com/ros/ros_comm/issues/240>`_)
* fix crash in bag migration
  (`#239 <https://github.com/ros/ros_comm/issues/239>`_)
* add CMake function roslaunch_add_file_check()
  (`#241 <https://github.com/ros/ros_comm/issues/241>`_)
* fix rosnode_ping to check if new node uri is valid before using it
  (`#235 <https://github.com/ros/ros_comm/issues/235>`_)

1.9.45 (2013-06-06)
-------------------
* improve handling of UDP transport, when fragmented packets are lost or arive
  out-of-order the connection is not dropped anymore, onle a single message is
  lost (`#226 <https://github.com/ros/ros_comm/issues/226>`_)
* fix missing generation of constant definitions for services
  (`ros/gencpp#2 <https://github.com/ros/gencpp/issues/2>`_)
* fix restoring thread context when callback throws an exception
  (`#219 <https://github.com/ros/ros_comm/issues/219>`_)
* fix calling PollManager::shutdown() repeatedly
  (`#217 <https://github.com/ros/ros_comm/issues/217>`_)
* add missing run_depend on python-yaml
* allow configuration of ports for XML RPCs and TCP ROS
* fix race condition where rospy subscribers do not connect to all publisher
* fix closing and deregistering connection when connect fails
  (`#128 <https://github.com/ros/ros_comm/issues/128>`_)
* fix log level of RosOutHandler
  (`#210 <https://github.com/ros/ros_comm/issues/210>`_)
* added option '--duration' to 'rosbag play'
  (`#121 <https://github.com/ros/ros_comm/issues/121>`_)
* fix missing newlines in rosbag error messages
  (`#237 <https://github.com/ros/ros_comm/issues/237>`_)
* fix flushing for tools like 'rosbag compress'
  (`#237 <https://github.com/ros/ros_comm/issues/237>`_)
* add warnings for obviously wrong environment variables ROS_HOSTNAME and
  ROS_IP (`#134 <https://github.com/ros/ros_comm/issues/134>`_)
* fix exception on netifaces.ifaddresses()
  (`#211 <https://github.com/ros/ros_comm/issues/211>`_,
  `#213 <https://github.com/ros/ros_comm/issues/213>`_)
  (regression from 1.9.42)
* modify rosnode_ping to check for changed node uri if connection is refused
  (`#221 <https://github.com/ros/ros_comm/issues/221>`_)
* allow passing arguments to add_rostest(ARGS ...)
 (`#232 <https://github.com/ros/ros_comm/issues/232>`_)
* modified roslaunch $(find PKG) to consider path behind it for resolve
  strategy (`#233 <https://github.com/ros/ros_comm/pull/233>`_)
* add boolean attribute 'subst_value' to rosparam tag in launch files
  (`#218 <https://github.com/ros/ros_comm/issues/218>`_)
* add command line parameter to print out launch args
* fix missing import in arg_dump.py
* fix template syntax for signal_.template addCallback() to work with Intel
  compiler

1.9.44 (2013-03-21)
-------------------
* fix install destination for dll's under Windows
* fix various issues on Windows
  (`#189 <https://github.com/ros/ros_comm/issues/189>`_)
* fix 'roslaunch --files' with non-unique anononymous ids
  (`#186 <https://github.com/ros/ros_comm/issues/186>`_)
* fix ROS_MASTER_URI for Windows

1.9.43 (2013-03-13)
-------------------
* implement process killer for Windows
* fix exports of message filter symbols for Windows

1.9.42 (2013-03-08)
-------------------
* improve speed of message generation in dry packages
  (`#183 <https://github.com/ros/ros_comm/issues/183>`_)
* fix roscpp service call deadlock
  (`#149 <https://github.com/ros/ros_comm/issues/149>`_)
* fix freezing service calls when returning false
  (`#168 <https://github.com/ros/ros_comm/issues/168>`_)
* fix error message publishing wrong message type
  (`#178 <https://github.com/ros/ros_comm/issues/178>`_)
* fix missing explicit dependency on pthread
  (`#135 <https://github.com/ros/ros_comm/issues/135>`_)
* fix compiler warning about wrong comparison of message md5 hashes
  (`#165 <https://github.com/ros/ros_comm/issues/165>`_)
* make dependencies on rospy optional by refactoring RosStreamHandler to
  rosgraph (`#179 <https://github.com/ros/ros_comm/issues/179>`_)
* added option '--duration' to 'rosrun rosbag play'
  (`#121 <https://github.com/ros/ros_comm/issues/121>`_)
* add error message to rosbag when using same in and out file
  (`#171 <https://github.com/ros/ros_comm/issues/171>`_)
* fix handling spaces in folder names
  (`ros/catkin#375 <https://github.com/ros/catkin/issues/375>`_)
* replace custom code with Python module netifaces
  (`#130 <https://github.com/ros/ros_comm/issues/130>`_)
* make dependencies on rospy optional by refactoring RosStreamHandler to
  rosgraph (`#179 <https://github.com/ros/ros_comm/issues/179>`_)
* add option --skip-log-check
  (`#133 <https://github.com/ros/ros_comm/issues/133>`_)
* update API doc to list raised exceptions in config.py
* fix invocation of Python scripts under Windows
  (`#54 <https://github.com/ros/ros_comm/issues/54>`_)
* fix usage of rosservice from within a launch file
* fix missing run_depend on rosbag
  (`#179 <https://github.com/ros/ros_comm/issues/179>`_)

1.9.41 (2013-01-24)
-------------------
* allow sending data exceeding 2GB in chunks
  (`#4049 <https://code.ros.org/trac/ros/ticket/4049>`_)
* update getParam() doc
  (`#1460 <https://code.ros.org/trac/ros/ticket/1460>`_)
* add param::get(float)
  (`#3754 <https://code.ros.org/trac/ros/ticket/3754>`_)
* update inactive assert when publishing message with md5sum *, update related
  tests (`#3714 <https://code.ros.org/trac/ros/ticket/3714>`_)
* fix ros master retry timeout
  (`#4024 <https://code.ros.org/trac/ros/ticket/4024>`_)
* fix inactive assert when publishing message with wrong type
  (`#3714 <https://code.ros.org/trac/ros/ticket/3714>`_)
* improve performance of $(find ...)

1.9.40 (2013-01-13)
-------------------
* add colorization for rospy log output
  (`#3691 <https://code.ros.org/trac/ros/ticket/3691>`_)
* fix socket polling under Windows
  (`#3959 <https://code.ros.org/trac/ros/ticket/3959>`_)
* fix bagsort script (`#42 <https://github.com/ros/ros_comm/issues/42>`_)
* fix dependent packages by pass LOG4CXX include dirs and libraries along
* fix usage of variable arguments in vFormatToBuffer() function
* add colorization for rospy log output
  (`#3691 <https://code.ros.org/trac/ros/ticket/3691>`_)
* fix 'roslaunch --pid=' when pointing to ROS_HOME but folder does not exist
  (`#43 <https://github.com/ros/ros_comm/issues/43>`_)
* fix 'roslaunch --pid=' to use shell expansion for the pid value
  (`#44 <https://github.com/ros/ros_comm/issues/44>`_)
* fix output of 'rossrv --help'
  (`#3979 <https://code.ros.org/trac/ros/ticket/3979>`_)
* add support for boolean in 'rostopic -p'
  (`#3948 <https://code.ros.org/trac/ros/ticket/3948>`_)
* add checks for pip packages and rosdep
* fix check for catkin_pkg
* fix for thread race condition causes incorrect graph connectivity analysis
contradict pushed a commit to contradict/ros_comm that referenced this issue Aug 12, 2016
…roup_icon

rqt_reconfigure: change icon for closing node (fix ros#48)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants