roscpp: ros::service::call hangs (ros ticket #2742) #149
Comments
[isucan] Actually, I'm getting the error once in a while even after the rebuild -- so no MD5 sum errors and no crashed nodes. Also, running with valgrind, I get this: {{{ |
[jfaust] is there some sample code that reproduces this? |
[vlopez] Apologies for reopening such an old ticket, but this issue is still happening on diamondback and cturtle. It has also been mentioned in these other tickets: https://code.ros.org/trac/ros/ticket/2913 https://code.ros.org/trac/ros-pkg/ticket/4195 The issue is a deadlock caused by a service call. I'm working on a code example that reproduces it, but it only happens when PollManager is running, and I don't know what makes it start working. There is also little information about PollManager. This is the callstack after the deadlock. {{{ {{{ |
[kwc] Please at least provide:
|
[vlopez] I've added the test file. As you can see the issue happens sometimes after the following steps:
I understand that this behaviour is not typical, and you might expect it to fail. That's why I check if the client is valid and if the service exists. The real issue is the deadlock. |
[straszheim] Thanks for the great test case. Should be fixed in r15057, will watch hudson for a bit to see how things go. |
[straszheim] Was a little overly aggressive checking for the condition. Real fix should be in r15058. Watching hudson again. |
[straszheim] Fixed! |
[straszheim] in r15289 there may be code that revives this deadlock. |
[gerkey] Indeed, it's been unfixed by r15289. r15295 disables the test for this bug. It should be re-enabled after it's re-fixed. |
[vlopez] Do you have an estimate of when this will be fixed? Will it make it into Fuerte? This bug is preventing us from using persistent services. Using non-persistent services is giving us many problems, since each non-persistent service call uses up to 2 system sockets and we run out of sockets under heavy service usage. System sockets are not completely freed until 1-2 minutes after they are closed, and if the system runs out of sockets, the load average skyrockets and the system slows down. |
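To make the socket exhaustion described above concrete, here is a minimal back-of-the-envelope sketch. It assumes a fixed pool of sockets, that each call holds a fixed number of them, and that each one stays unavailable for a fixed delay after closing. The function name and all parameters are hypothetical and are not part of roscpp; the pool size in particular depends on the system configuration.

```cpp
#include <cassert>

// Rough upper bound on sustained non-persistent service calls per second
// before the socket pool runs dry. All inputs are assumptions supplied by
// the caller; none of these numbers come from roscpp itself.
double maxSustainedCallsPerSecond(int pool_size,
                                  int sockets_per_call,
                                  double release_delay_seconds)
{
  assert(pool_size > 0);
  assert(sockets_per_call > 0);
  assert(release_delay_seconds > 0.0);
  // Each call pins `sockets_per_call` sockets for `release_delay_seconds`,
  // so at steady state: rate * sockets_per_call * delay <= pool_size.
  return static_cast<double>(pool_size) /
         (static_cast<double>(sockets_per_call) * release_delay_seconds);
}
```

With an assumed pool of 28000 sockets, 2 sockets per call and a 60 second release delay, the bound is about 233 calls per second. With a 120 second delay it halves to about 117.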
[jrcapriles] Replying to [comment:13 kwc]: I'm working with ROS Electric and I've run into the same problem. I have a state machine that has several persistent connections to service servers. When I switch between states, I sometimes kill/launch nodes. The problem is that if I kill a node whose service server had already been called once, then when the node starts again and the service client in my state machine calls it, the call hangs. What is the status of this ticket? Is there any alternative solution? |
[akimmel] I am also having this issue running ROS Electric on Mac 10.7.4, and I was wondering if there is a workaround for this issue. I am running a few different nodes which each have several persistent services running, and I encounter this same problem when executing the ros::service::call method between the nodes. |
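One possible user-side stopgap for the workaround requests above is to run the blocking call on a helper thread and give up after a deadline. The sketch below only illustrates that idea and is not a fix for the deadlock itself. `callWithTimeout` and `blocking_call` are hypothetical names; `blocking_call` stands in for whatever wraps the real service call. If the call never returns, the helper thread and everything it captured stay blocked and are leaked, so any client object it uses should not be reused afterwards. The callable is also assumed not to throw.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Runs `blocking_call` on a helper thread and waits up to `timeout`.
// Returns true only if the call finished in time and itself returned true.
// `blocking_call` is a hypothetical stand-in for a service call.
bool callWithTimeout(std::function<bool()> blocking_call,
                     std::chrono::milliseconds timeout)
{
  assert(blocking_call);
  // Shared ownership keeps the promise alive even if the caller gives up.
  auto promise = std::make_shared<std::promise<bool>>();
  std::future<bool> result = promise->get_future();
  // Detached so that a call that never returns cannot block the caller.
  // In that case the helper thread stays blocked and is leaked.
  std::thread([promise, blocking_call] {
    promise->set_value(blocking_call());
  }).detach();
  if (result.wait_for(timeout) != std::future_status::ready)
  {
    return false;  // Timed out; the call may still be hanging.
  }
  return result.get();
}
```

A caller would pass a lambda that performs the real call and treat a `false` result as "failed or hung", for example by recreating the client before retrying.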
[vlopez] I'm still struggling with this issue. I have found something that might be related to the cause of the problem. After the service server crashes, the socket stays in WAIT status for as long as 2 minutes (depending on the configuration of the computer). Until the TCP socket is finally closed, the server disconnection is not noticed, and the client can block forever when calling. |
[asomerville] related to #2913 |
I'm unable to attach the file that reproduced the issue. For the record it is here: https://code.ros.org/trac/ros/attachment/ticket/2742/main.cpp |
Changes since 1.8.15:

1.9.46 (2013-06-18)
-------------------
* add dependency on roslisp (`#240 <https://github.com/ros/ros_comm/issues/240>`_)
* fix crash in bag migration (`#239 <https://github.com/ros/ros_comm/issues/239>`_)
* add CMake function roslaunch_add_file_check() (`#241 <https://github.com/ros/ros_comm/issues/241>`_)
* fix rosnode_ping to check if new node uri is valid before using it (`#235 <https://github.com/ros/ros_comm/issues/235>`_)

1.9.45 (2013-06-06)
-------------------
* improve handling of UDP transport, when fragmented packets are lost or arrive out-of-order the connection is not dropped anymore, only a single message is lost (`#226 <https://github.com/ros/ros_comm/issues/226>`_)
* fix missing generation of constant definitions for services (`ros/gencpp#2 <https://github.com/ros/gencpp/issues/2>`_)
* fix restoring thread context when callback throws an exception (`#219 <https://github.com/ros/ros_comm/issues/219>`_)
* fix calling PollManager::shutdown() repeatedly (`#217 <https://github.com/ros/ros_comm/issues/217>`_)
* add missing run_depend on python-yaml
* allow configuration of ports for XML RPCs and TCP ROS
* fix race condition where rospy subscribers do not connect to all publisher
* fix closing and deregistering connection when connect fails (`#128 <https://github.com/ros/ros_comm/issues/128>`_)
* fix log level of RosOutHandler (`#210 <https://github.com/ros/ros_comm/issues/210>`_)
* added option '--duration' to 'rosbag play' (`#121 <https://github.com/ros/ros_comm/issues/121>`_)
* fix missing newlines in rosbag error messages (`#237 <https://github.com/ros/ros_comm/issues/237>`_)
* fix flushing for tools like 'rosbag compress' (`#237 <https://github.com/ros/ros_comm/issues/237>`_)
* add warnings for obviously wrong environment variables ROS_HOSTNAME and ROS_IP (`#134 <https://github.com/ros/ros_comm/issues/134>`_)
* fix exception on netifaces.ifaddresses() (`#211 <https://github.com/ros/ros_comm/issues/211>`_, `#213 <https://github.com/ros/ros_comm/issues/213>`_) (regression from 1.9.42)
* modify rosnode_ping to check for changed node uri if connection is refused (`#221 <https://github.com/ros/ros_comm/issues/221>`_)
* allow passing arguments to add_rostest(ARGS ...) (`#232 <https://github.com/ros/ros_comm/issues/232>`_)
* modified roslaunch $(find PKG) to consider path behind it for resolve strategy (`#233 <https://github.com/ros/ros_comm/pull/233>`_)
* add boolean attribute 'subst_value' to rosparam tag in launch files (`#218 <https://github.com/ros/ros_comm/issues/218>`_)
* add command line parameter to print out launch args
* fix missing import in arg_dump.py
* fix template syntax for signal_.template addCallback() to work with Intel compiler

1.9.44 (2013-03-21)
-------------------
* fix install destination for dll's under Windows
* fix various issues on Windows (`#189 <https://github.com/ros/ros_comm/issues/189>`_)
* fix 'roslaunch --files' with non-unique anonymous ids (`#186 <https://github.com/ros/ros_comm/issues/186>`_)
* fix ROS_MASTER_URI for Windows

1.9.43 (2013-03-13)
-------------------
* implement process killer for Windows
* fix exports of message filter symbols for Windows

1.9.42 (2013-03-08)
-------------------
* improve speed of message generation in dry packages (`#183 <https://github.com/ros/ros_comm/issues/183>`_)
* fix roscpp service call deadlock (`#149 <https://github.com/ros/ros_comm/issues/149>`_)
* fix freezing service calls when returning false (`#168 <https://github.com/ros/ros_comm/issues/168>`_)
* fix error message publishing wrong message type (`#178 <https://github.com/ros/ros_comm/issues/178>`_)
* fix missing explicit dependency on pthread (`#135 <https://github.com/ros/ros_comm/issues/135>`_)
* fix compiler warning about wrong comparison of message md5 hashes (`#165 <https://github.com/ros/ros_comm/issues/165>`_)
* make dependencies on rospy optional by refactoring RosStreamHandler to rosgraph (`#179 <https://github.com/ros/ros_comm/issues/179>`_)
* added option '--duration' to 'rosrun rosbag play' (`#121 <https://github.com/ros/ros_comm/issues/121>`_)
* add error message to rosbag when using same in and out file (`#171 <https://github.com/ros/ros_comm/issues/171>`_)
* fix handling spaces in folder names (`ros/catkin#375 <https://github.com/ros/catkin/issues/375>`_)
* replace custom code with Python module netifaces (`#130 <https://github.com/ros/ros_comm/issues/130>`_)
* make dependencies on rospy optional by refactoring RosStreamHandler to rosgraph (`#179 <https://github.com/ros/ros_comm/issues/179>`_)
* add option --skip-log-check (`#133 <https://github.com/ros/ros_comm/issues/133>`_)
* update API doc to list raised exceptions in config.py
* fix invocation of Python scripts under Windows (`#54 <https://github.com/ros/ros_comm/issues/54>`_)
* fix usage of rosservice from within a launch file
* fix missing run_depend on rosbag (`#179 <https://github.com/ros/ros_comm/issues/179>`_)

1.9.41 (2013-01-24)
-------------------
* allow sending data exceeding 2GB in chunks (`#4049 <https://code.ros.org/trac/ros/ticket/4049>`_)
* update getParam() doc (`#1460 <https://code.ros.org/trac/ros/ticket/1460>`_)
* add param::get(float) (`#3754 <https://code.ros.org/trac/ros/ticket/3754>`_)
* update inactive assert when publishing message with md5sum *, update related tests (`#3714 <https://code.ros.org/trac/ros/ticket/3714>`_)
* fix ros master retry timeout (`#4024 <https://code.ros.org/trac/ros/ticket/4024>`_)
* fix inactive assert when publishing message with wrong type (`#3714 <https://code.ros.org/trac/ros/ticket/3714>`_)
* improve performance of $(find ...)

1.9.40 (2013-01-13)
-------------------
* add colorization for rospy log output (`#3691 <https://code.ros.org/trac/ros/ticket/3691>`_)
* fix socket polling under Windows (`#3959 <https://code.ros.org/trac/ros/ticket/3959>`_)
* fix bagsort script (`#42 <https://github.com/ros/ros_comm/issues/42>`_)
* fix dependent packages by passing LOG4CXX include dirs and libraries along
* fix usage of variable arguments in vFormatToBuffer() function
* add colorization for rospy log output (`#3691 <https://code.ros.org/trac/ros/ticket/3691>`_)
* fix 'roslaunch --pid=' when pointing to ROS_HOME but folder does not exist (`#43 <https://github.com/ros/ros_comm/issues/43>`_)
* fix 'roslaunch --pid=' to use shell expansion for the pid value (`#44 <https://github.com/ros/ros_comm/issues/44>`_)
* fix output of 'rossrv --help' (`#3979 <https://code.ros.org/trac/ros/ticket/3979>`_)
* add support for boolean in 'rostopic -p' (`#3948 <https://code.ros.org/trac/ros/ticket/3948>`_)
* add checks for pip packages and rosdep
* fix check for catkin_pkg
* fix for thread race condition causes incorrect graph connectivity analysis
I am using this call method from a ServiceClient:
{{{
template<class MReq, class MRes>
bool call(MReq& req, MRes& res)
}}}
Tracing the execution, this goes into service_server_link.cpp:
{{{
bool ServiceServerLink::call(const SerializedMessage& req, SerializedMessage& resp)
}}}
The problem seems to be that the finished_condition_.wait() call here never returns:
{{{
boost::mutex::scoped_lock lock(info->finished_mutex_);
}}}
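For readers unfamiliar with the pattern, the following is a minimal sketch of a wait on a condition variable, written with the C++ standard library rather than Boost. It is not the roscpp code. Only the names finished_mutex_ and finished_condition_ come from the report above; the `CallInfo` struct, the `finished_` flag, and both functions are hypothetical. An unbounded wait of this kind blocks forever if nobody ever sets the flag and notifies, which matches the hang described here. The timed variant shown returns false instead.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the per-call state. Only finished_mutex_ and
// finished_condition_ are names taken from the report; the rest is assumed.
struct CallInfo
{
  std::mutex finished_mutex_;
  std::condition_variable finished_condition_;
  bool finished_ = false;
};

// Marks the call finished and wakes any waiter.
void markFinished(CallInfo& info)
{
  {
    std::lock_guard<std::mutex> lock(info.finished_mutex_);
    info.finished_ = true;
  }
  info.finished_condition_.notify_all();
}

// Waits up to `timeout` for the call to finish. Returns false on timeout,
// which is what happens when markFinished() is never called.
bool waitFinished(CallInfo& info, std::chrono::milliseconds timeout)
{
  assert(timeout.count() >= 0);
  std::unique_lock<std::mutex> lock(info.finished_mutex_);
  return info.finished_condition_.wait_for(
      lock, timeout, [&info] { return info.finished_; });
}
```

If the side responsible for calling `markFinished` goes away (for example because the connection is dropped without the waiter being told), an untimed wait has no way to recover, whereas `waitFinished` lets the caller report an error.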
using:
https://code.ros.org/svn/ros/stacks/ros/tags/latest/core/roscpp
r9871
This happens almost every time.
I believe my setup had incorrect MD5 sums (however, no error or warning mentioned this). At some point, the node providing the service crashed (potentially caused by the MD5 sum error). I kept this setup until I found the hanging call to the condition. Doing rosmake in my build fixed all the problems.
Regardless of the potential issues with the build I used and the service provider node, I believe there should have been warnings about the MD5 sum being different and the code should not hang.