-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bug: unsoundness of rmw_shutdown
#170
Comments
@YuanYuYuan could you provide steps to reproduce the error? With the patch above, are you suggesting that it is ok to not close the zenoh session? Wouldn't that lead to memory leaks? @clalancette should we try moving this block back into |
We need to take a close look before making changes here again. I've moved it twice already for 2 different bugs, so we need to go back, look at the contexts these are being called in (for both rclcpp and rclpy), and see what the previous bugs were. |
|
The issue disappeared after the patch. If you want to use the scripts, please note that you need to update the repository here. The issue occurs no matter running intra-process or inter-process test with rmw_zenoh. |
Same here. I'm trying to operate ApexAI/performance_test, and encounted the same issue. But I'd like to report that applying the patch suggested in this post is working fine in my environment. Question just in case: I cannot access evshary/benchmark_rmw_zenoh that was mentioned in this thread. I'm wondering the method (scripts?) of dealing with this issue is the same as suggested in this post? |
Hi @takasehideki! The script has been moved here. And the fix is still the same. |
@YuanYuYuan i'm looking into this issue more closely now and could benefit from your insights. Summarizing some basic information first:
Above you mentioned
This would imply that anytime I terminate a ROS executable (written with Or could it be that we see this unsoundness when I did a quick search for |
Hi @Yadunund, thanks for spending time on this issue. I think the panic depends on some race conditions I'm still investigating. Here is one that can steadily reproduce. eclipse-zenoh/zenoh-cpp#198. |
Hi @YuanYuYuan
Thank you so much for pointing this out. But unfortunately I cannot still access this link (maybe due to the repository is set to private). But it is not a problem for me because I can now catch up that the fix is the same. |
My bad! I'm not even aware that the repo has been closed 😅. Yup, the fix should be the same. |
@YuanYuYuan I'm not sure if this is a race condition. I have a much simpler reproducible scenario that confirms my hypothesis above. See https://github.com/Yadunund/rmw_zenoh_shutdown_test. Basically, if we let the Therefore we need to ensure that the This also explains why we see the panic all the time if we close the Zenoh session in Therefore until the tokio behavior is fixed upstream, I propose the following workarounds
|
With ros2/rclcpp#2633, the shutdown issue has disappeared for all tests in |
We notice that ApexAI performance test with rmw_zenoh keep failing with the error.
This happens since we call
z_close
within__run_exit_handlers
. We had concluded that termination of tokio runtime within theatexit
handler is unsoundand leads to undefined behavior. See this and this. Basically, the termination during exit stage is beyond what tokio runtime covering. Rust is designed not to drop any static variable until the end of the program. Calling destruction in exit handlers is also error-prone even in C/C++.
On the other hand, rmw_cyclonedds seems not to use any explicit termination during the exit stage. https://github.com/ros2/rmw_cyclonedds/blob/c6dbe24b2f2be87cf8e4750d89501657ab83566f/rmw_cyclonedds_cpp/src/rmw_node.cpp#L1629. I've ran the same test with the following patch. The error disappears and the rmw_zenoh seems fine.
In conclusion, we have a few solutions to it,
exit
phase.TBH, I'm not sure if 2. and 3. are feasible in ROS RMW. Any comment is welcome! 🙂
The text was updated successfully, but these errors were encountered: