Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Global Costmap Issue on map change #3267

Closed
mrmara opened this issue Nov 3, 2022 · 7 comments
Closed

Global Costmap Issue on map change #3267

mrmara opened this issue Nov 3, 2022 · 7 comments

Comments

@mrmara
Copy link
Contributor

mrmara commented Nov 3, 2022

Bug report

Required Info:

  • Operating System:
    Distributor ID: Ubuntu
    Description: Ubuntu 20.04.4 LTS
    Release: 20.04
    Codename: focal
  • ROS2 Version:
    • ROS2 galactic from source
  • Version or commit hash:
  • 04031ba
  • DDS implementation:
    • Happens both on Cyclone and Fast

Steps to reproduce issue

With map_server load a new map after already having loaded another one before. Global costmap will fail roughly 70% of times.

Expected behavior

The new global costmap is correctly resized and system is ready to accept new navigation goals

Actual behavior

The following bug never happens at start-up.
When a new map is loaded with map server, global costmap often does not resize and blocks the system from accepting new navigation goal.
Moreover global costmap does not respond anymore to services(clear_entirely_global_costmap, get_parameters, etc...)

Similar issue: ros-planning/navigation#959. Unfortunately the getting rid of !layered_costmap_->isSizeLocked() inside the if condition does not fix.
https://user-images.githubusercontent.com/48493979/199747297-636d8e02-6621-4479-923c-09304faa5230.mp4

@SteveMacenski
Copy link
Member

SteveMacenski commented Nov 3, 2022

The map is constantly being updated in SLAM so I would find it hard to believe its an issue with the static layer in receiving map updates unless there's something particularly unique about your maps (extraordinarily large or out of resources on your machine to allocate the memory).

If you re-run the map server loading several times, does it work? When the costmap is unable to get the map, are you able to visualize it in rviz (e.g. is anything getting the map?).

By the way, Galactic is EOL as of 3 days ago. I'd highly recommend you upgrade.

@mrmara
Copy link
Contributor Author

mrmara commented Nov 3, 2022

The map is getting correctly published by the map_server, I can see it on Rviz and the global costmap receives it infact I can see following logs from StaticLayer::processMap(const nav_msgs::msg::OccupancyGrid & new_map):

[planner_server-1] [INFO] [global_costmap.global_costmap]: StaticLayer: Received a 565 X 1603 map at 0.050000 m/pix (processMap() at navigation2/nav2_costmap_2d/plugins/static_layer.cpp:175)
[planner_server-1] [INFO] [global_costmap.global_costmap]: StaticLayer: Resizing costmap to 565 X 1603 at 0.050000 m/pix (processMap() at navigation2/nav2_costmap_2d/plugins/static_layer.cpp:191)

Map is not oddly large and I am running a fairly powerful machine (16Gb ram, i7 11th gen).

Moving to Humble is definitely the next move, but in this moment It would be preferable for my application to have it working on galactic. I am currently checking the differences in Humble but I do not see anything that could fix it. I am investigating the issue and I will try to reproduce the same issue also on humble (if present).

@SteveMacenski
Copy link
Member

SteveMacenski commented Nov 3, 2022

Please do investigate, when you know more let me know! I'm happy to help, but so far there's not much for me to go on unfortunately ☹️

So is the server hanging or just not completing the job? If its hanging, that would be good to know where.

I agree with that map size / computer specs its not that. I didn't know if you were running this on an RPi or something.

@mrmara
Copy link
Contributor Author

mrmara commented Nov 3, 2022

Following your advice, I compared what's being done differently in Humble wrt Galactic. I think I found something and in the last 20-30 map changes no faults happened. I will continue to test and if the issue comes again I will let you know.
Coming back to the fix, in Galactic branch the callback of the subscriber of map topic is:

void StaticLayer::incomingMap(const nav_msgs::msg::OccupancyGrid::SharedPtr new_map)
{
  std::lock_guard<Costmap2D::mutex_t> guard(*getMutex());
  if (!map_received_) {
    map_received_ = true;
    processMap(*new_map);
  }
  if (update_in_progress_.load()) {
    map_buffer_ = new_map;
  } else {
    processMap(*new_map);
    map_buffer_ = nullptr;
  }
}

On the other hand in the main branch is:

void
StaticLayer::incomingMap(const nav_msgs::msg::OccupancyGrid::SharedPtr new_map)
{
  if (!map_received_) {
    processMap(*new_map);
    map_received_ = true;
    return;
  }
  std::lock_guard<Costmap2D::mutex_t> guard(*getMutex());
  map_buffer_ = new_map;
}

And finally in Humble branch is:

void
StaticLayer::incomingMap(const nav_msgs::msg::OccupancyGrid::SharedPtr new_map)
{
  std::lock_guard<Costmap2D::mutex_t> guard(*getMutex());
  if (!map_received_) {
    map_received_ = true;
    processMap(*new_map);
  }
  if (update_in_progress_.load()) {
    map_buffer_ = new_map;
  } else {
    processMap(*new_map);
    map_buffer_ = nullptr;
  }
}

I have just updated this method to its main branch version and the problem seems to be fixed.
It is unclear to me why, but apparently directly calling processMap from the callback interferes with the execution of updateBounds. A mutex correctly protects them so I cannot understand where the problem is. On the contrary, if map_buffer is updated with the new map the method processMap is called by updateBounds and the global costmap does not become unresponsive anymore. Do you have any hints?
As already mentioned, the fix seems working, but I will keep the discussion updated if the bug comes back.

Regarding humble branch, I have not directly tested it but It may be buggy as well since it shares the same code of Galactic branch. Do you want me to update both branches with the fixed ``incomingMap``` method?

@SteveMacenski
Copy link
Member

SteveMacenski commented Nov 3, 2022

Oh. I think you ran into the same deadlock issue as #3109 describes and the PR linked in it resolves. That was merged mere days after the last sync so its not in Humble yet, but will do one at the end of next week (I skipped a cycle due to ROSCon)

Does this patch resolve your issue? #3145

@mrmara
Copy link
Contributor Author

mrmara commented Nov 3, 2022

Yes it does, just checked. Thank you!

@SteveMacenski
Copy link
Member

OK - I'm going to close this since we have a known solution and it will be backported in the coming week or two to the supported distributions.

Thanks for the debugging effort!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants