
fix(Executor): Fix segfault if callback group is deleted during rmw_wait #2682

Open · wants to merge 1 commit into jazzy

jmachowinski (Contributor)

Fixes #2664

@alsora @mjcarroll Could you review this, please?

@msplr Can you confirm that this fixes the issue?

Comment on lines 746 to 750
for(const auto &w_ptr : callback_groups)
{
  auto shr_ptr = w_ptr.lock();
  if(shr_ptr)
  {
Member

Suggested change
- for(const auto &w_ptr : callback_groups)
- {
-   auto shr_ptr = w_ptr.lock();
-   if(shr_ptr)
-   {
+ for (const auto & w_ptr : callback_groups) {
+   auto shr_ptr = w_ptr.lock();
+   if (shr_ptr) {

  }
  this->wait_result_.emplace(wait_set_.wait(timeout));
  if (!this->wait_result_ || this->wait_result_->kind() == WaitResultKind::Empty) {
    RCUTILS_LOG_WARN_NAMED(
      "rclcpp",
      "empty wait set received in wait(). This should never happen.");
  } else {
    // drop references to the callback groups, before trying to execute anything
    cbgs.clear();
Member

Just to check my understanding: we don't need something like this in the if-clause, because cbgs is on the stack and will be destroyed when the function returns, right?

Contributor Author

Yes, that is the idea.
At this point we need to drop the shared pointers explicitly, to make sure that 'execute' does not call back into something that 'userland' has already dropped.
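The lifetime rule discussed here can be sketched in plain C++. This is a minimal, hypothetical model, not the rclcpp Executor code: `FakeCallbackGroup`, `wait_then_execute`, and the `wait_fn` callback (standing in for `wait_set_.wait(timeout)` / `rmw_wait`) are illustrative names. The sketch pins every live group in a stack-local vector before waiting, relies on scope exit to release the pins on the empty-result path, and clears the vector explicitly before executing, so a group the user dropped during the wait is destroyed at that point and skipped instead of being run.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a callback group (not rclcpp::CallbackGroup).
// It counts live instances so the lifetime rules can be observed.
struct FakeCallbackGroup
{
  static int alive;
  int executed = 0;
  FakeCallbackGroup() {++alive;}
  ~FakeCallbackGroup() {--alive;}
};
int FakeCallbackGroup::alive = 0;

// Sketch of the pattern: pin groups during the wait, drop the pins before
// executing. `wait_fn` stands in for wait_set_.wait(timeout) / rmw_wait and
// returns true if anything became ready.
template<typename WaitFn>
int wait_then_execute(
  const std::vector<std::weak_ptr<FakeCallbackGroup>> & callback_groups,
  WaitFn wait_fn)
{
  // Strong references held on the stack: no group can be destroyed
  // while the (possibly blocking) wait is in progress.
  std::vector<std::shared_ptr<FakeCallbackGroup>> cbgs;
  for (const auto & w_ptr : callback_groups) {
    auto shr_ptr = w_ptr.lock();
    if (shr_ptr) {
      cbgs.push_back(std::move(shr_ptr));
    }
  }

  if (!wait_fn()) {
    // Empty result: `cbgs` is released automatically when it goes out
    // of scope, so no explicit clear() is needed on this path.
    return 0;
  }

  // Drop the executor's references before executing anything, so a group
  // the user released during the wait is destroyed here and not run.
  cbgs.clear();

  int executed = 0;
  for (const auto & w_ptr : callback_groups) {
    // Only groups that the user still owns are executed.
    if (auto group = w_ptr.lock()) {
      ++group->executed;
      ++executed;
    }
  }
  return executed;
}
```

For example, if the user resets their only `shared_ptr` to one of two groups from inside `wait_fn`, that group stays alive until `cbgs.clear()`, and `wait_then_execute` then executes only the remaining group.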

wjwwood (Member) left a comment

LGTM with style fixups.

wjwwood (Member) commented Nov 22, 2024

I think we should push to get this fix in, but it would be nice to have a regression test for this case if we can manage it. I know it's hard with races like this, but still it would be good.
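One way to test a race like this is to force the interleaving deterministically instead of relying on timing. The sketch below is a hypothetical, self-contained model using plain `std::thread` and `std::promise` with no rclcpp; `Group`, `RaceOutcome`, and `run_delete_during_wait` are illustrative names, and this is not the test that was added to the PR. The "executor" thread pins the group and blocks on a future that stands in for `rmw_wait`; the main thread deletes the group while the executor is blocked, and only then lets the wait return.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <memory>
#include <thread>

// Hypothetical callback group that records its own destruction.
struct Group
{
  explicit Group(std::atomic<bool> & destroyed_flag)
  : destroyed(destroyed_flag) {}
  ~Group() {destroyed.store(true);}
  std::atomic<bool> & destroyed;
};

struct RaceOutcome
{
  bool alive_during_wait;       // group survived the user's reset while pinned
  bool dropped_group_executed;  // executor ran a group the user deleted
  bool destroyed_afterwards;    // group was freed once the pin was dropped
};

// Forces "user deletes the group while the executor waits" using promises
// instead of sleeps, so the outcome does not depend on thread scheduling.
RaceOutcome run_delete_during_wait()
{
  std::atomic<bool> destroyed{false};
  auto user_group = std::make_shared<Group>(destroyed);
  std::weak_ptr<Group> registered = user_group;  // what the executor stores

  std::promise<void> entered_wait;  // executor -> test: "now blocked in wait"
  std::promise<void> wake_up;       // test -> executor: "the wait returns"
  std::future<void> entered_wait_future = entered_wait.get_future();
  std::future<void> wake_up_future = wake_up.get_future();
  bool executed = false;

  std::thread executor([&]() {
      std::shared_ptr<Group> pinned = registered.lock();  // pin before waiting
      entered_wait.set_value();
      wake_up_future.wait();  // stand-in for the blocking rmw_wait call
      pinned.reset();         // drop the pin before executing
      if (auto group = registered.lock()) {
        executed = true;      // the group's callbacks would run here
      }
    });

  entered_wait_future.wait();
  user_group.reset();  // userland deletes the group during the wait
  const bool alive = !destroyed.load();
  wake_up.set_value();
  executor.join();  // join() makes `executed` safe to read below
  return {alive, executed, destroyed.load()};
}
```

Removing the `pinned` reference from the executor thread makes `alive_during_wait` false, which models the original bug: the group is freed while the wait is still in progress.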

wjwwood (Member) commented Nov 22, 2024

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

fujitatomoya (Collaborator) left a comment

@jmachowinski @wjwwood

Maybe we can work on #2683 first and then backport the fix to jazzy, including unit tests? The bug has been there for a while already, so I think we can follow the normal procedure.

wjwwood (Member) commented Nov 22, 2024

That's also fine with me, I'm not sure about the urgency.

jmachowinski (Contributor Author)


I was not aware of this bug until yesterday. To me, this is the worst kind of bug from a user's perspective: you do an allowed operation, and your framework just segfaults on you. Therefore I want to get a fix merged ASAP. Sadly, we just missed the jazzy sync for the binaries by 2 days...

msplr commented Nov 23, 2024


I confirm that it fixes the segfault in my sample code, thanks!

fujitatomoya (Collaborator)

@jmachowinski that is true, and we just missed it... so probably what we do here is,

@msplr thanks!

jmachowinski (Contributor Author)

Added a test and fixed the same bug in the StaticSingleThreadedExecutor.

jmachowinski (Contributor Author)

* and backport [fix(Executor): Fix segfault if callback group is deleted during rmw_wait #2683](https://github.com/ros2/rclcpp/pull/2683) to jazzy.

This PR is the backport of #2683 to Jazzy...


6 participants