[BUG] (Risk of a) deadlock in ros::Timer impl? #1980

Closed
CodeFinder2 opened this issue Jun 26, 2020 · 4 comments

Comments

@CodeFinder2

CodeFinder2 commented Jun 26, 2020

Hi all,

I've already described my issue here in detail: https://answers.ros.org/question/355644/possible-risk-of-a-deadlock-in-rostimer-impl/

Since I didn't get any reply there, I am posting it here as well because it may be a bug.

Please refer to my linked post above on ROS Answers for the details.
TL;DR: I am using asynchronous spinners with multiple threads and a ROS timer. Every callback has to lock a specific mutex before doing anything else. My timerCallback() was invoked while that mutex was already held by another thread, which is perfectly fine so far. However, that other thread was calling timer.stop(). Since timerCallback() could not return (it was still blocked waiting for the mutex to be released), timer.stop() waited forever, resulting in a deadlock.
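The cycle described above can be sketched with the C++ standard library alone. This is an illustrative model, not the roscpp implementation: `MockTimer`, `fire()`, the timeout-taking `stop()` and `stop_blocks_while_holding_app_mutex()` are hypothetical names, and the timeouts exist only so the demonstration terminates instead of hanging.

```cpp
#include <chrono>
#include <condition_variable>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a timer whose stop() waits until a running
// callback has returned. This is NOT the roscpp implementation.
class MockTimer {
public:
  // Marks a callback as running, executes it, then signals completion.
  template <typename F>
  void fire(F&& callback) {
    { std::lock_guard<std::mutex> lk(m_); running_ = true; }
    callback();
    { std::lock_guard<std::mutex> lk(m_); running_ = false; }
    cv_.notify_all();
  }

  // Waits for the in-flight callback, giving up after `timeout`.
  // Returns false if the callback did not finish in time (the deadlock case).
  bool stop(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(m_);
    return cv_.wait_for(lk, timeout, [this] { return !running_; });
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  bool running_ = false;
};

// Returns true when stop() could not complete while the caller held the
// application mutex that the callback was waiting for.
bool stop_blocks_while_holding_app_mutex() {
  using namespace std::chrono_literals;
  MockTimer timer;
  std::timed_mutex app_mutex;  // the mutex every callback locks first
  std::promise<void> callback_started;

  // The "other thread" from the report: it holds the application mutex.
  std::unique_lock<std::timed_mutex> held(app_mutex);

  // The spinner thread that invokes the timer callback.
  std::thread spinner([&] {
    timer.fire([&] {
      callback_started.set_value();
      // A real callback would block indefinitely; the timed lock keeps
      // this demonstration finite.
      if (app_mutex.try_lock_for(5s)) app_mutex.unlock();
    });
  });

  callback_started.get_future().wait();
  const bool stopped = timer.stop(100ms);  // app_mutex is still held here
  held.unlock();                           // break the cycle so we can join
  spinner.join();
  return !stopped;
}
```

In this model, `stop_blocks_while_holding_app_mutex()` returns true: `stop()` cannot finish because the callback is waiting for a mutex that the caller of `stop()` holds. With untimed waits on both sides, neither thread could ever proceed.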

In my opinion, this should not happen. If I call timer.stop() shortly after timerCallback() has already been invoked, the callback should be allowed to run to completion without stop() blocking on it. Once called, timer.stop() should prevent any new callbacks from being invoked, but it should not wait for a callback or event that has already been triggered.
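A minimal sketch of the semantics requested here, assuming a hypothetical `NonBlockingStopTimer` class whose `fire()` method is called by some dispatcher each time the period elapses. None of these names come from roscpp, and this is not the fix that was eventually merged.

```cpp
#include <atomic>
#include <functional>
#include <utility>

// Hypothetical timer illustrating the requested behaviour; not roscpp code.
class NonBlockingStopTimer {
public:
  explicit NonBlockingStopTimer(std::function<void()> cb)
      : callback_(std::move(cb)) {}

  // Called by the (hypothetical) dispatcher whenever the period elapses.
  // Returns true if the callback was invoked, false if the timer was stopped.
  bool fire() {
    if (stopped_.load(std::memory_order_acquire)) return false;
    callback_();
    return true;
  }

  // Only suppresses future invocations; never waits for a running callback.
  void stop() { stopped_.store(true, std::memory_order_release); }

private:
  std::function<void()> callback_;
  std::atomic<bool> stopped_{false};
};
```

The trade-off of this design is that a callback may still be running after `stop()` has returned, so the caller must not tear down any state the callback uses until it knows the callback has finished.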

Please take a look and let me know what you think!

This happened on Ubuntu 20.04 LTS with ROS Noetic. Unfortunately, because it is a race condition that occurs only very rarely, I may not be able to reproduce it easily.

Thanks for taking a look!

@CodeFinder2
Author

CodeFinder2 commented Sep 4, 2020

Any ideas on this? Facing this issue again in a different situation...really annoying.

Any help is (still) highly appreciated...

@CodeFinder2 CodeFinder2 changed the title from "(Risk of a) deadlock in ros::Timer impl?" to "[BUG] (Risk of a) deadlock in ros::Timer impl?" on Sep 4, 2020
@iwanders
Contributor

@CodeFinder2, the problem is that a timer cannot be stopped while a callback associated with that timer is executing. Combined with the external lock, this can currently cause a deadlock.

We encountered this problem in a production system. I've reproduced it with the attached minimal-non-working example here.

I also have a proposed fix / mitigation for this which I'll be filing shortly.

@CodeFinder2
Author

Thanks for your reply and for providing a fix! I also came up with a temporary solution (creating a new thread in timerCallback() so that it returns immediately), but that doesn't seem to be a good way to go ... 🙈
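The temporary workaround mentioned above might look roughly like the following sketch. It uses only the C++ standard library; `Node`, `state_mutex`, `ticks` and `joinWorkers()` are hypothetical names chosen for illustration and are not taken from the issue or from roscpp.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical node state; all names are illustrative.
struct Node {
  std::mutex state_mutex;  // the mutex every callback must lock
  int ticks = 0;           // example state guarded by state_mutex

  std::mutex workers_mutex;          // guards the list of worker threads
  std::vector<std::thread> workers;  // threads handed off by the callback

  // Stand-in for timerCallback(): hands the real work to a new thread and
  // returns immediately, so the spinner thread never waits on state_mutex.
  void timerCallback() {
    std::lock_guard<std::mutex> lk(workers_mutex);
    workers.emplace_back([this] {
      std::lock_guard<std::mutex> state_lk(state_mutex);
      ++ticks;
    });
  }

  // Joins all handed-off work; call once no more callbacks can arrive.
  void joinWorkers() {
    std::lock_guard<std::mutex> lk(workers_mutex);
    for (auto& w : workers) w.join();
    workers.clear();
  }
};
```

Because the callback itself no longer blocks on `state_mutex`, a thread that holds that mutex can wait for the callback to return. The cost is one thread per timer tick, no ordering guarantee between ticks, and extra bookkeeping to join the workers, which is consistent with the reservation expressed above.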

@jacobperron
Contributor

Resolved in #2121
