Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The solution proposal for panic in the program termination #324

Open
evshary opened this issue Nov 26, 2024 · 2 comments
Open

The solution proposal for panic in the program termination #324

evshary opened this issue Nov 26, 2024 · 2 comments
Labels
backlog Long-term improvement or addition

Comments

@evshary
Copy link
Contributor

evshary commented Nov 26, 2024

I would like to raise a discussion about the possible solution for https://github.com/ros2/rmw_zenoh/tree/8306a63313cd9e21883344deff9abc782636a464?tab=readme-ov-file#crash-when-program-terminates
I thought we had reached an agreement on this in the meeting before. Nevertheless, it might be worth writing it down here to make it clearer.

I know we expect users to shutdown the ROS Context explicitly, but it would be good if there is a way to detect this, or we might need to add shutdown to all the packages without it. For example, in ros_gz used by nav2
ZettaScaleLabs#26 (comment)

The proposal:

The Zenoh session can't be closed while terminating the program due to the reason described in README. However, while calling rmw_shutdown, there is no available way to know if we can close the Zenoh session. Therefore, we come up with a mechanism to detect the termination of the program. We can register the atexit function, and while the entering the termination stage, it will toggle a flag. If the flag is set, we shouldn't touch the Zenoh session to avoid panic.
For more information, please check the PR here: https://github.com/ZettaScaleLabs/rmw_zenoh/pull/37/files

Some questions:

  1. Is this logic only necessary with 1.0.0?
    • Not really. I can do the porting to rolling if we all agree with the changes.
  2. How do we ensure two or more threads do not access is_exiting concurrently?
    • I thought it didn't cause any issue since this is a one-way flag, which means we don't toggle it back to false. However, I can add atomic here to avoid possible concurrency issues.
  3. When the program terminates, static ordering is unknown so there is no guarantee that is_exiting is valid.
    • I'm not sure whether I misunderstood the spec or not. According to the spec here, the destructor will be called in a reverse order. I think that means the global variable static bool is_exiting should be valid before the Context is shutdown. Feel free to correct me if I'm wrong.

Let me know your thoughts. Thank you!

Also cc @JEnoch here

@evshary
Copy link
Contributor Author

evshary commented Nov 28, 2024

After the PR, we might not need this workaround
cc @yellowhatter here

It should be related to the PR eclipse-zenoh/zenoh#1632

@YuanYuYuan
Copy link
Contributor

YuanYuYuan commented Dec 5, 2024

@Yadunund Yadunund added the backlog Long-term improvement or addition label Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog Long-term improvement or addition
Projects
None yet
Development

No branches or pull requests

3 participants