Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for the seg-fault that occurred when RPP was tested at high speed #2474

Closed

Conversation

padhupradheep
Copy link
Member

@padhupradheep padhupradheep commented Jul 23, 2021


Basic Info

Info Please fill out this column
Ticket(s) this addresses #2332
Primary OS tested on Ubuntu
Robotic platform tested on mp-400 from Neobotix

Description of contribution in a few bullet points

  • Fixed the segfault issue that occurred when the robot was set to drive at high speed using RPP.

Description of documentation updates required from your changes

NA


Future work that may be required in bullet points

NA

For Maintainers:

  • Check that any new parameters added are updated in navigation.ros.org
  • Check that any significant change is added to the migration guide
  • Check that any new functions have Doxygen added
  • Check that any new features have test coverage
  • Check that any new plugins is added to the plugins page
  • If BT Node, Additionally: add to BT's XML index of nodes for groot, BT package's readme table, and BT library lists

Copy link
Member

@SteveMacenski SteveMacenski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good first stab, only a couple of simple tweaks and this can be merged

@@ -450,7 +456,13 @@ bool RegulatedPurePursuitController::inCollision(const double & x, const double
double RegulatedPurePursuitController::costAtPose(const double & x, const double & y)
{
unsigned int mx, my;
costmap_->worldToMap(x, y, mx, my);
auto convertWorldToMap = costmap_->worldToMap(x, y, mx, my);
if (!convertWorldToMap) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is only used when the x,y represents our robot pose, this should never be out of bounds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think these changes in this function can be reverted?

Copy link
Member Author

@padhupradheep padhupradheep Jul 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not a good idea. If you remember the second issue that I brought up in #2332 , these changes actually solves that issue.

This function is only used when the x,y represents our robot pose, this should never be out of bounds.

Explaining it further: With the parameters set in that issue, the current position of the robot can sometimes be outside of the local costmap. Then again, we have a seg fault. In order to avoid it, we check for the bounds.

Copy link
Member

@SteveMacenski SteveMacenski Jul 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give me an example flow of data that this would prevent? What I see is:

The point you bring up about the lookahead distance being off the costmap is a good point, but where specifically does that apply here? I decoupled in the code the lookahead distance / carrot from the collision checking. We never actually check the validity of the carrot or lookahead distance, only the output velocity command for the time simulation you have covered in the other function

Copy link
Member Author

@padhupradheep padhupradheep Jul 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that pose represents the robot's current pose -- so there's no look ahead applied and my assumption is that at all times the robot should be on the costmap.

Okay, I totally agree with the data flow you have mentioned above this point. But, the actual behavior is other way round to the actual assumption you are defining. It is always not the case, there are instance where the current position of the robot can be outside the local costmap. Even before conducting this experiment, I was also under the same assumption, but by setting the robot go at high speed, the rolling costmap will take sometime to catch up with the robot (maybe because of the update rate). Example:

seg_fault

Note that, this works only with the fix I have made above. Without the fix, it ends up in a segfault as follows:
wiseg_fault

Maybe, I'm observing it in totally different way, that might be not correct. It is a bit tricky but an interesting one!

Copy link
Member

@SteveMacenski SteveMacenski Jul 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems.... bad. I know you set it to 1x1 to prove a point, but what if this was a more normal size like 5x5 or a more reasonable update rate like 5hz? We should cover the crash, you bring up a good point, but those are some extraordinarily poor configurations 😆. I think we just need to change the logging error message for this context to be more specific and clear about what this issue is: robot is off the costmap due to poor configuration. This shouldn't be throttled like before, we should complain every single iteration that happens as a critical issue since there is no possible collision avoidance at all in this situation

Edit: Actually, we should outright fail. That's a massive safety issue, log a FATAL message and throw an exception. The other issue is less severe if we can only collision check as far in the future as the costmap's size, this is an outright non-stop-robot-collision maker.

@SteveMacenski
Copy link
Member

Any update?

@padhupradheep
Copy link
Member Author

Any update?

Expect a commit for the above reviews, on this friday.

@SteveMacenski
Copy link
Member

Pull in master or rebase to get CI turning over, has to do with a stale branch from changes we made to sync with Rolling

Copy link
Member

@SteveMacenski SteveMacenski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just wording changes to be more explicit

"The dimensions of the costmap is smaller, the controller failed and so the robot cannot proceed further");
throw nav2_core::PlannerException(
"RegulatedPurePursuitController detected poor configuration of local costmap!");
return std::numeric_limits<double>::max();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing after a throw is executed, so you can remove this return line.

costmap_->worldToMap(x, y, mx, my);

if (!costmap_->worldToMap(x, y, mx, my)) {
RCLCPP_FATAL_ONCE(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not make this a ONCE, this should be a FATAL warning every single time it gets triggered

@padhupradheep
Copy link
Member Author

created new PR #2526

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants