-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get rid of service client nodes #816
Comments
@mjeronimo is this still an issue? |
It is still an issue. Having too many nodes impacts startup times and seems to overwhelm the current DDS implementations. Possible fixes to this issue are:
|
@crdelsey can you provide a few examples in the stack where we use "client nodes", other than costmap spinning, I had a hard time finding any |
I think the big one is here in ServiceClient. https://github.com/ros-planning/navigation2/blob/7652518af52f257d097269309986943771d98683/nav2_util/include/nav2_util/service_client.hpp#L38 Then, there's another use in LifecycleManagerClient (https://github.com/ros-planning/navigation2/blob/a095038da9f4e3729679e5d20b39e1d95582fb16/nav2_lifecycle_manager/src/lifecycle_manager_client.cpp#L30). There may be other places. |
Got it, I didn't see that hidden in there. "Fix DDS" I feel like is a never ending battle, I'd say 2 or 3 seem good general long term fixes but 4 seems more practically in our control. |
Any objection to using multithreaded executors? It seems that way we can have these all going at once with one node. Do you think there are applications here sensitive enough (or otherwise not well formatted) for multithreaded processing? Most places have mutex where we might want to use a multithreaded spinner. I wouldn't think it would be too much of a hassle to update the few places required. |
@mjeronimo Do you have any updates on this? Let me know even if you have any unfinished work on this. |
@mjeronimo Shivang is taking a look at this topic. Carl mentioned that you had a branch and started to work on it that either may be done or had some good progress. Can you give us a run down on where that branch is and what still needs to be completed to support? |
@shivaang12 @SteveMacenski Sorry, guys I haven't been on github recently (we're using gitlab internally for my current project). Anyways, @orduno did the most exploration on this topic. He prototyped at least one alternative approach that, I believe, used multi-threaded executors. I've also brought up the topic directly to folks at Open Robotics and they agree that this is an issue (but I don't think there are any plans to allow re-entrant spinning on the current node anytime soon). In general, I think it's a good idea to not proliferate nodes like this because of the discovery traffic introduced and the possibility of dropping messages. |
Can confirm on the TSC side, doesn't seem likely for Foxy (or G-turtle... bigger fish to fry). On my wish list of things for OR people to work on that effect navigation users, this isn't high at the moment. Awesome, hopefully Carlos can chime in. Ignoring Foxy, I'd like 1 server to 1 node mapping in the next LTS distribution. Multithreaded executors seems to be the lowest-effort clearest way to handle this, though that does create issues with the components API. The question then becomes: do we value using components or minimizing nodes more? Currently, I think that answer is nodes from how inefficient it is to have multiple. However, that's improving by the week, so I think long term that answer is components. |
I am currently making a PR to use callback groups for service clients instead of nodes #2216. On the Rolling distribution, there are callback groups that can be attach to a node and added to executors. This allows to use of It can be used to execute synchronous service/action call in timer/subscriber callback. |
I think this comment gives a good summary of what should be done for each case. |
Unfortunately, from deeper looking, each new executor creates a new context (unless one given on construction) therefore , with the new design https://design.ros2.org/articles/Node_to_Participant_mapping.html each context now maps to a DDS participant so this is really no different than just having multiple client nodes. open question then: if we gave the I think this probably deserves a dedicated redesign session to think about the executor and threading options we have available and go through our server list to see what makes the most sense in order to reduce the number of DDS participants and nodes. We should also analyze what actions+services+subscribers exist in each server and add them to separate callback groups if necessary so spinning has a deterministic output to processing the "right" information in each context (not-ROS-context, software one). Right now everywhere are just automatically added to one. Especially for planner / controller / waypoint servers, it may make sense to separate these callback groups and all processing in parallel or spinning a particular one. The right answer is probably a mix of multi-threaded executors and multiple single-threaded executors depending on the server's needs. Also need to determine the context question above whether we can share that across multiple executor objects to run in parallel but not add additional participants on the network. |
I spoke with Chris from OR @clalancette and he gave me the key to my understanding on this -- while a new context is constructed on a new executor, when none is given as an input, it uses the default context which stores a static variable to use https://github.com/ros2/rclcpp/blob/master/rclcpp/src/rclcpp/contexts/default_context.cpp#L23. Therefore, any new executor, node, etc will use that same context regardless, unless you give a context to the object on construction. Thusly, we can continue this work but the question remains how necessary this is. Essentially, this is a refactoring effort now. We're working on turning It would be good to do this to refactor and make the best use of ROS2, but it is no longer a critical requirement. I believe we now have sufficient information to make decisions and execute on this task when someone has time to work on it. |
I see several nodes that can still be removed.
|
I have searched all internal nodes which need be removed,and i will do this work in this summer. Any suggestions are welcome. Remove
Other independent cases
|
@gezp I saw another of your PRs was merged, is this now good to move forward on the AMCL / costmap PRs? |
@SteveMacenski the upstream PR ros2/geometry2#447 is still blocked. |
Final PR imminent to merge |
There are several nodes in the system that are created and used solely for interfacing to services. These "client nodes" are used because spin_until_future_complete cannot accept a node that is already on an executor. Instead, we can use future.wait_until instead of spin_until_future_complete. This will allow us to remove several of these client nodes in the system, which will hopefully ease discovery congestion.
The text was updated successfully, but these errors were encountered: