Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Controller crashes due to preempt request #301

Closed
JamesAndrews3 opened this issue Oct 26, 2017 · 15 comments
Closed

Controller crashes due to preempt request #301

JamesAndrews3 opened this issue Oct 26, 2017 · 15 comments

Comments

@JamesAndrews3
Copy link

JamesAndrews3 commented Oct 26, 2017

Hello there,

I find an intermittent error when controlling a robot hand. The controller crashes and I can no longer control the hand.

Here is the ROS launch terminal error:

[ INFO] [1508861516.717055713]: Stopping execution due to preempt request
[ INFO] [1508861516.717101655]: Cancelling execution for H1_trajectory_controller
[ INFO] [1508861516.717143522]: Stopped trajectory execution.
[robot_hw-7] process has died [pid 342, exit code -11, cmd ethercat_grant /home/user/projects/shadow_robot/base_deps/devel/lib/ros_control_robot/ros_control_robot __name:=robot_hw __log:=/home/user/.ros/log/7c234788-b8d5-11e7-8bcd-1c1b0d6eaf82/robot_hw-7.log].
log file: /home/user/.ros/log/7c234788-b8d5-11e7-8bcd-1c1b0d6eaf82/robot_hw-7*.log

If you look here you can see action feedback. Basically, I think it is crashing because when a new goal arrives and the previous one hasn't finished yet. Hence it is “preempted”.

There is a check for the pointer to be valid here. As a thread calls it, it may be null as it is occupied by another thread.

I wonder what what be a decent fix for this. Perhaps some mutual exclusion is required?

gdb output trace:
bt
#0  0x00007ffff556f428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff557102a in __GI_abort () at abort.c:89
#2  0x00007ffff5567bd7 in __assert_fail_base (fmt=<optimized out>,
   assertion=assertion@entry=0x7fffe44b95c4 "px != 0",
   file=file@entry=0x7fffe44b9598 "/usr/include/boost/smart_ptr/shared_ptr.hpp", line=line@entry=648,
   function=function@entry=0x7fffe44c0e80 <boost::shared_ptr<realtime_tools::RealtimeServerGoalHandle<control_msgs::FollowJointTrajectoryAction_<std::allocator<void> > > >::operator->() const::__PRETTY_FUNCTION__> "typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = realtime_tools::RealtimeServerGoalHandle<control_msgs::FollowJointTrajectoryAction_<std::allocator<"...) at assert.c:92
#3  0x00007ffff5567c82 in __GI___assert_fail (assertion=0x7fffe44b95c4 "px != 0",
   file=0x7fffe44b9598 "/usr/include/boost/smart_ptr/shared_ptr.hpp", line=648,
   function=0x7fffe44c0e80 <boost::shared_ptr<realtime_tools::RealtimeServerGoalHandle<control_msgs::FollowJointTrajectoryAction_<std::allocator<void> > > >::operator->() const::__PRETTY_FUNCTION__> "typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = realtime_tools::RealtimeServerGoalHandle<control_msgs::FollowJointTrajectoryAction_<std::allocator<"...) at assert.c:101
#4  0x00007fffe4445667 in boost::shared_ptr<realtime_tools::RealtimeServerGoalHandle<control_msgs::FollowJointTrajectoryAction_<std::allocator<void> > > >::operator-> (this=0x70bda0)
   at /usr/include/boost/smart_ptr/shared_ptr.hpp:648
#5  0x00007fffe4436869 in joint_trajectory_controller::JointTrajectoryController<trajectory_interface::QuinticSplineSegment<double>, hardware_interface::EffortJointInterface>::update (this=0x70bcc0, time=..., period=...)
   at /home/user/projects/shadow_robot/base_deps/src/ros_controllers/joint_trajectory_controller/include/joint_trajectory_controller/joint_trajectory_controller_impl.h:481
#6  0x00007ffff7760f67 in controller_manager::ControllerManager::update(ros::Time const&, ros::Duration const&, bool) () from /opt/ros/kinetic/lib/libcontroller_manager.so
#7  0x0000000000488f54 in controlLoop ()
   at /home/user/projects/shadow_robot/base_deps/src/ros_control_robot/src/main.cpp:323
#8  0x00007ffff66e76ba in start_thread (arg=0x7fffee34e700) at pthread_create.c:333
#9  0x00007ffff56413dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb)
@bmagyar
Copy link
Member

bmagyar commented Oct 27, 2017

Hi,

Preemption is part of the joint_trajectory_controller design, we even have several tests that ensure it works correctly both with actions and topics.
Have a look at those tests here and try to reproduce your issue with a test. Maybe it has to do with the RobotHW implementation of Shadow?

Any ideas on this @beatrizleon @toliver ?

@mathias-luedtke
Copy link
Contributor

Have you upgraded your system recently?
You might face an ABI break..
Did you try to clean your workspace?

@toliver
Copy link

toliver commented Oct 30, 2017

@ipa-mdl this happens with the latest binaries, but also with a compilation from source, so it's not likely to be an ABI break.

@bmagyar Good point about the tests, we will try to add a test that reproduces the failure.

One point that probably wasn't clear in the first comment by @JamesAndrews3 is that the crash doesn't happen every time you send a goal that preempts a previous goal, but if you keep sending goals it ends up crashing after a random amount of time. This seems to point to a race condition and I think that it's due to the fact that the access to the rt_active_goal_ object is not under any mutual exclusion clause here and here and these two lines are called from different threads.

@mathias-luedtke
Copy link
Contributor

One point that probably wasn't clear in the first comment by @JamesAndrews3 is that the crash doesn't happen every time you send a goal that preempts a previous goal, but if you keep sending goals it ends up crashing after a random amount of time.

👍 for the clarification!

This seems to point to a race condition and I think that it's due to the fact that the access to the rt_active_goal_ object is not under any mutual exclusion clause

Good point! The code for preemption handling needs some serious refactoring. (#48, #174).
ros-controls/realtime_tools#22 might come in handy!
Anyway, the handling of data between the different threads needs to be protected or at least coordinated!

As a quick fix current_active_goal should be used to fill the feedback, probably in the else branch.

@mathias-luedtke
Copy link
Contributor

Actually @JamesAndrews3 pointed us to the same lines, but for some reason I have not read it (or spotted the links).
I have now reformatted the issue text to separate the output from the actual description.

@bmagyar
Copy link
Member

bmagyar commented Nov 21, 2017

HI everyone,

Could you try again with the new version of realtime_tools please?

@JamesAndrews3
Copy link
Author

JamesAndrews3 commented Nov 22, 2017 via email

@JamesAndrews3
Copy link
Author

JamesAndrews3 commented Dec 4, 2017 via email

@bmagyar
Copy link
Member

bmagyar commented Mar 23, 2018

Any update on this?

@toliver
Copy link

toliver commented Mar 26, 2018

Hi @bmagyar we haven't tried any new fixes on this yet.
This crash only becomes a problem when we need to send a stream of goals where each one preempts the previous one (e.g. as we do to control a hand from a cyberglove).

In that case, we can remove these lines https://github.com/ros-controls/ros_controllers/blob/kinetic-devel/joint_trajectory_controller/include/joint_trajectory_controller/joint_trajectory_controller_impl.h#L482-L490 that deal with the feedback and the crash doesn't happen.

We haven't had the time to test the quick fix suggested by @ipa-mdl in #301 (comment) but it sounds like it could work.

@7675t
Copy link
Contributor

7675t commented Mar 29, 2018

Hi, I have encountered this issue with our controller when we migrated to Kinetic.

We can reproduce the problem using ur_gazebo

$ roslaunch ur_gazebo ur5.launch 

and sending goal in relatively high rate with this script.

$ ./check_jta.py

The controller dies in these lines, and when I remove these lines, it keeps working.

I have rebuilt with updated realtime_tools (kinetic-devel) code, but it doesn't resolve the issue.
I'm very pleased if this is fixed in near future release. Thank you, all developers.

@k-okada
Copy link
Contributor

k-okada commented Mar 30, 2018

see #325 for adding mutex before accesing rt_active_goal_ as proposed by #301 (comment)

@mathias-luedtke
Copy link
Contributor

adding mutex before accesing rt_active_goal_ as proposed by #301 (comment)

I did not propose to add a mutex/lock.

piyushk added a commit to piyushk/ros_controllers that referenced this issue Apr 9, 2018
Stop publishing feedback to avoid segmentation fault as explained in ros-controls#301. The `rt_active_goal_` pointer can get deleted in a different thread leading to crashes.
@dg-shadow
Copy link

dg-shadow commented Jun 12, 2018

FYI, #342 fixed the issue for us :)

@bmagyar
Copy link
Member

bmagyar commented Jun 12, 2018

We shall consider this issue solved then, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants