You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
expected behavior what the degisner wish should be like
costmap_ros_->deactivate() works during planner/controller->on_deactivate(), as well as costmap_ros_->cleanup() works during planner/controller->on_cleanup()
As an individual LifecycleNode, costmap_ros_ may react to rcl_preshutdown randomly, and always be closed much earlier than planner/controller->deactivate()
it would lead to many unexpected crashes like bug mentioned in #3958 :
during shutdown_period, costmap_ros_ may react to rcl_preshutdown earlier, so that planner/controller may try to access the costmap_ros_ though it has been a nullptr. As a result it would lead to a crash during shutdown period.
such crashes may lead to bad situation that :
program shut down in an unplanned manner and result in residual processes.
the bug happening in ISSUE: #3958 is caused by this.
because during shutdown_period, costmap_ros_ may react rcl_preshutdown earlier, so that planner/controller may try to access the costmap_ros_ though it has been a nullptr.
so can we make it do nothing when reacting to Ctrl+C ?
doing so, its lifecycle would be completely bond to planner/controller's lifecycle, and also going to death in order.
//in file `costmap_2d_ros.hpp`/** * @brief override on_rcl_preshutdown() as empty * [reason] costmap only could be created by its parents like planner/controller_server * [reason] costmap may react to ctrl+C earlier than its parents to cause nullptr-accessed * so the costmap's reaction must be later than its parent to avoid nullptr-accessed * * thus, it's a more perfect way to override on_rcl_preshutdown() as empty function * and its deactivate must be joint in planner/controller_server->on_deactivate() * and its cleanup must be joint in planner/controller_server->on_cleanup() * * !!! a mention !!! * if any other place use this class (Costmap2DROS) * it's necessary to let costmap->deactivate() joint in parent->deactivate() * and let costmap->cleanup() joint in parent->cleanup() */voidon_rcl_preshutdown() override{
//do nothingreturn ;
}
because the costmp_ros_ is just a thread created by planner/controller,
whether program shutdown correctly or not, or even dead unexceptionably, the child-thread could be closed all the time since parent-LifecycleNode sub-process dead.
Thus there's no need to worry whether it would be still alive after whole program exit finally.
in addition
if our solution being adopted, many checks could be removed :
these double checks are also designed to face to bugs in situation that costmap_ros_ may be closed earlier than planner/controller. After our code modification, these checks are useless anymore.
what's more, our solution performs perfectly during our local tests, much more than our past solutions mentioned in #3958
Further Suggestion
We specially name nodes like costmap_ros_ as child-LifecycleNode, which is born from their parent-LifecycleNode like planner_server and controller_server.
our suggestion is:
Could we set up a mechanism specially for such situations like child-LifecycleNode, to make sure they're completely controlled by their parent?
these child-LifecycleNode should have some basic specialist like:
they must be created by a parent-LifecycleNode
they must be created as a thread but not a sub-process
such mechanism could be like:
child's->activate should be joint into its their parent's on_configure
child->deactivate()must be joint into its parent's on_deactivate
child->cleanup()must be joint into its parent's on_cleanup
and so on
Perhaps such mechanism could be used for any program in ROS2 ?
The text was updated successfully, but these errors were encountered:
Bug report
Required Info:
Steps to reproduce issue
Expected behavior
expected behavior what the degisner wish should be like
costmap_ros_->deactivate()
works duringplanner/controller->on_deactivate()
, as well ascostmap_ros_->cleanup()
works duringplanner/controller->on_cleanup()
.../nav2_planner/src/planner_server.cpp#L224 and L257, and similar code in
nav2_controller
Actual behavior
As an individual LifecycleNode,
costmap_ros_
may react to rcl_preshutdown randomly, and always be closed much earlier than planner/controller->deactivate()
it would lead to many unexpected crashes like bug mentioned in #3958 :
during shutdown_period,
costmap_ros_
may react to rcl_preshutdown earlier, so that planner/controller may try to access thecostmap_ros_
though it has been a nullptr. As a result it would lead to a crash during shutdown period.such crashes may lead to bad situation that :
program shut down in an unplanned manner and result in residual processes.
Additional information
costmap is set as a Nav2::util LifecycleNode : ../nav2_costmap_2d/include/nav2_costmap_2d/costmap_2d_ros.hpp#L 73
all LifecycleNode would react to Ctrl+C ../nav2_util/src/lifecycle_node.cpp#L 90
all LifecycleNode would react to rcl_preshutdown randomly, and similar issue has been discussed before : ../ros2/rclcpp/issues/2096
this explanation should have been clear enough: ../nav2_planner/src/planner_server.cpp #L 214-L225
the bug happening in ISSUE: #3958 is caused by this.
because during shutdown_period,
costmap_ros_
may react rcl_preshutdown earlier, so that planner/controller may try to access thecostmap_ros_
though it has been a nullptr.our provided solution
it seems that
costmap
is only used as a child-thread of planner/controller, ../nav2_planner/src/planner_server.cpp#L 89so can we make it do nothing when reacting to Ctrl+C ?
doing so, its lifecycle would be completely bond to planner/controller's lifecycle, and also going to death in order.
because the
costmp_ros_
is just a thread created by planner/controller,whether program shutdown correctly or not, or even dead unexceptionably, the child-thread could be closed all the time since parent-LifecycleNode sub-process dead.
Thus there's no need to worry whether it would be still alive after whole program exit finally.
in addition
if our solution being adopted, many checks could be removed :
double check in planner_server->on_deactivate()
double check in planner_server->on_cleanup()
double check in controller_server->on_deactivate()
double check in controller_server->on_cleanup()
these double checks are also designed to face to bugs in situation that
costmap_ros_
may be closed earlier than planner/controller. After our code modification, these checks are useless anymore.what's more, our solution performs perfectly during our local tests, much more than our past solutions mentioned in #3958
Further Suggestion
We specially name nodes like
costmap_ros_
as child-LifecycleNode, which is born from their parent-LifecycleNode likeplanner_server
andcontroller_server
.our suggestion is:
Could we set up a mechanism specially for such situations like child-LifecycleNode, to make sure they're completely controlled by their parent?
these child-LifecycleNode should have some basic specialist like:
such mechanism could be like:
child's->activate
should be joint into its their parent'son_configure
child->deactivate()
must be joint into its parent'son_deactivate
child->cleanup()
must be joint into its parent'son_cleanup
Perhaps such mechanism could be used for any program in ROS2 ?
The text was updated successfully, but these errors were encountered: