One or more controllers fail to start up when starting all of them simultaneously #1934
Hello @bijoua29! We will need more debug information or logs for this particular case; it is very hard to pinpoint the issue you are describing. Why don't you spawn all of the above controllers together in a single spawner? This is exactly why PR #1805 was implemented: you can pass multiple controllers and multiple param files to the spawner, and it should be able to handle everything. You can also use this part of the code: ros2_control/controller_manager/controller_manager/launch_utils.py, lines 103 to 139 in bb087e2.
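For illustration, a minimal launch sketch of this single-spawner suggestion might look like the following; the controller names and parameter-file path are placeholders, and the exact spawner arguments depend on the ros2_control version in use.

```python
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    # One spawner process loads, configures, and activates all listed controllers,
    # instead of launching one spawner per controller.
    single_spawner = Node(
        package="controller_manager",
        executable="spawner",
        arguments=[
            "joint_state_broadcaster",   # placeholder controller names
            "arm_controller",
            "gripper_controller",
            "--controller-manager", "/controller_manager",
            "--param-file", "/path/to/controllers.yaml",  # placeholder path
        ],
        output="screen",
    )
    return LaunchDescription([single_spawner])
```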
@saikishor Yes, I understand it is hard to pinpoint this issue. I am willing to generate more debug or log data, but I will need to know exactly what. Meanwhile, I will try the single spawner. Do you have an example of its usage in a launch file that I can look at? One question out of curiosity: does the single spawner generate a single service call to the controller manager, or still multiple service calls? If multiple, then I fear I might still see the issue.
We use OpaqueFunction for our LaunchDescription:

```python
return LaunchDescription(
    [SetEnvironmentVariable("ROS_LOG_DIR", launch_config.log_dir)]
    + declared_arguments
    + [OpaqueFunction(function=launch_setup)]
)
```

so I don't think I can use the launch utilities in launch_utils.py, but I can do the same thing as done in generate_controllers_spawner_launch_description_from_dict() in launch_setup.
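For illustration, a minimal sketch of building that single spawner inside an OpaqueFunction-based launch_setup, mirroring what generate_controllers_spawner_launch_description_from_dict() does, could look like this; the controller names and parameter-file path are placeholders, not taken from this issue.

```python
from launch import LaunchDescription
from launch.actions import OpaqueFunction
from launch_ros.actions import Node


def launch_setup(context, *args, **kwargs):
    # Build one spawner node that handles all controllers in the list.
    controller_names = ["arm_controller", "gripper_controller"]  # placeholder names
    spawner = Node(
        package="controller_manager",
        executable="spawner",
        arguments=controller_names
        + ["--param-file", "/path/to/controllers.yaml"],  # placeholder path
        output="screen",
    )
    return [spawner]


def generate_launch_description():
    return LaunchDescription([OpaqueFunction(function=launch_setup)])
```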
Ok, I took a look at the spawner code and it does generate a separate service call for each controller. However, since it is a single spawner node, I think it will be more robust, as only DDS discovery for a single node is required. With multiple spawners, each one had to have its service client discovered by the controller manager, which led to problems under load. So I am hoping this will lead to a more robust startup for us.
Awesome, glad to hear that the single spawner does help. If you want things to be done in a single service call, there is an option you can pass to the spawner as well. Please let us know if this works for you.
@saikishor Thanks for the response. It's nice that you can do it in a single service call. The spawner has changed (for the better) quite a bit since I last looked at it. There are a few things for me to try. Since I have to test this in the target environment and make changes manually there, the testing will take quite some time. I will let you know my results. I'm really hopeful this will work for me.
@bijoua29 We thought of these kinds of use cases in the first place and then started improving it. It would be really great to see that it helps someone, haha!
@saikishor So some of our controllers are meant to be loaded and started as inactive, while the rest should be started as active. How would I start some of the controllers as inactive when using a single spawner?
You need separate spawners, one per group of controllers with the same desired initial state.
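For illustration, a minimal sketch of that grouping, assuming the spawner's --inactive flag and placeholder controller names:

```python
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    # Controllers that should end up active.
    active_spawner = Node(
        package="controller_manager",
        executable="spawner",
        arguments=["joint_state_broadcaster", "arm_controller"],  # placeholder names
        output="screen",
    )
    # Controllers that should only be loaded and configured, i.e. left inactive.
    inactive_spawner = Node(
        package="controller_manager",
        executable="spawner",
        arguments=["diagnostics_controller", "--inactive"],  # placeholder name
        output="screen",
    )
    return LaunchDescription([active_spawner, inactive_spawner])
```

The inactive group can then be activated later when needed.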
That's what I thought. I started doing it that way.
So I converted the controller startup to two spawners, one for the controllers that should start active and another for those that should start inactive. Unfortunately, I still see failures on startup. The success rate is either the same as before or maybe even slightly worse. There was one benefit to the change to a single spawner: the startup time was slightly lower. There seem to be different failure modes. Here are examples of them:
At this point, I am looking for guidance on how to further instrument the controller startup to determine where the problem is.
Describe the bug
When starting several controllers simultaneously using the spawner, one or more of the controllers fail to start up
To Reproduce
Steps to reproduce the behavior:
Expected behavior
All controllers come up without error
Screenshots
N/A
Environment (please complete the following information):
Additional context
The number of controllers that don't load is random. Out of our 16 controllers, we have 7 instances of a particular type of custom controller. Anecdotally, it is usually one or more of these controllers that fail to load.
I can't really produce a minimal example. In fact, we only see this on our target hardware and believe it is related to CPU load.
I can't reproduce this on my laptop, as it is much more powerful than the target hardware. I have also tried to preload my laptop with additional load using 'stress', but I still couldn't reproduce the issue, so it may be something else, e.g. disk I/O.
I don't understand the error message as it indicates the controller is already loaded but it isn't.
Note: we religiously update our software to the latest Rolling sync every month.
Anecdotally, there was some improvement when we went from ros2_control version 4.18 (~10% success rate) to 4.20 (~40-50% success rate).
This issue seems to have cropped up only in the last couple of months; a few months ago, failures were fairly rare. The above error message is also fairly new to us; previously, the occasional failures only showed the controller "process has died" message. I realize there has been a fair amount of work in the spawner area recently.
This is a very serious issue for us as it takes several tries for our SQA to start our software stack when they are testing.