Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make navigation map spatial queries thread-safe #79577

Merged
merged 1 commit into from
Feb 12, 2024

Conversation

smix8
Copy link
Contributor

@smix8 smix8 commented Jul 17, 2023

Makes navigation map spatial queries thread-safe by adding a readers–writer lock for the NavMap.

This avoids subtle bugs and sometimes even crashes related to the navigation map synchronization while users used threads to do spatial queries on the navigation map.

Tested with spawning endless number of agents each having its own thread doing those spatial queries e.g. map_get_path() or map_get_closest_point(), in a while loop while an AnimationPlayer did change the navigation map every single frame. While Godot managed to eventually collapse my entire system at around 1000+ agents from having spawned so many threads the navigation map and pathfinding stood strong till the end.

Production edit: closes godotengine/godot-roadmap#55

@smix8 smix8 added bug topic:navigation cherrypick:4.1 Considered for cherry-picking into a future 4.1.x release labels Jul 17, 2023
@smix8 smix8 added this to the 4.2 milestone Jul 17, 2023
@smix8 smix8 force-pushed the navmap_rwlock_4.x branch from 6cec5c6 to 166bd26 Compare July 17, 2023 10:52
@YuriSizov YuriSizov requested a review from RandomShaper July 17, 2023 15:18
Copy link
Member

@RandomShaper RandomShaper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

My only concern is with map_update_id, which is checked in ERR_FAIL_COND when the read lock is still not held.

@smix8
Copy link
Contributor Author

smix8 commented Jul 19, 2023

My only concern is with map_update_id, which is checked in ERR_FAIL_COND when the read lock is still not held.

@RandomShaper
I think I covered all user exposed functions so the ERR_FAIL_CON for map_update_id == 0 is checked before any function sets a lock and can run further. I did not place it inside the locked part to avoid having to add another check for unlocking in case the error is true.

I don't think map_update_id needs a lock as even if this would be checked while another thread syncs and increases this counter I don't see it causing any real problems. It will also always only the main thread that will increase this id counter. Should I wrap the map_update_id into a SafeNumeric?

@smix8 smix8 force-pushed the navmap_rwlock_4.x branch from 166bd26 to 0a0b725 Compare July 19, 2023 12:08
@YuriSizov YuriSizov requested a review from RandomShaper July 19, 2023 13:48
@RandomShaper
Copy link
Member

My concern is with the following situation: just after the main thread increases map_update_id, another thread that hasn't had a chance to see the new value yet tries to run some query, and errors. That said, I think it's very unlikely in practice that such a thing can happen, but I can't see it being impossible.

Making it a SafeNumeric would add more "sync work," so I'd suggest either always dealing with map_update_id while the lock is held or just leaving it as it is now and hope for the best, maybe bearing in mind that some corner case issue might arise due to this.

On a side note, zero seems to be a valid value for map_update_id, when it rotates back to zero due to the mod. Should the sentinel value be -1 instead?

@smix8
Copy link
Contributor Author

smix8 commented Jul 27, 2023

Included the map_update_id in the locks.

On a side note, zero seems to be a valid value for map_update_id, when it rotates back to zero due to the mod. Should the sentinel value be -1 instead?

Now that I think about it, I think it should be changed so that in case of a wrap it does not trigger all the query error checks that fire when map_update_id == 0.

E.g. from map_update_id = (map_update_id + 1) % 9999999; to map_update_id = (map_update_id % 9999998) + 1;

@smix8 smix8 force-pushed the navmap_rwlock_4.x branch from 0a0b725 to 218aae4 Compare July 29, 2023 01:27
@smix8 smix8 force-pushed the navmap_rwlock_4.x branch 2 times, most recently from efafdae to 7e3a246 Compare August 4, 2023 12:32
@smix8 smix8 removed the cherrypick:4.1 Considered for cherry-picking into a future 4.1.x release label Oct 21, 2023
@smix8 smix8 modified the milestones: 4.2, 4.x Oct 21, 2023
@YuriSizov YuriSizov modified the milestones: 4.x, 4.3 Jan 30, 2024
@AThousandShips
Copy link
Member

I'd suggest using RAII locking, safer, and none of the locking here is very complex so would be better either way, like so:

diff --git a/modules/navigation/nav_map.cpp b/modules/navigation/nav_map.cpp
index fa591b6a44..78aba52854 100644
--- a/modules/navigation/nav_map.cpp
+++ b/modules/navigation/nav_map.cpp
@@ -116,9 +116,8 @@ gd::PointKey NavMap::get_point_key(const Vector3 &p_pos) const {
 }
 
 Vector<Vector3> NavMap::get_path(Vector3 p_origin, Vector3 p_destination, bool p_optimize, uint32_t p_navigation_layers, Vector<int32_t> *r_path_types, TypedArray<RID> *r_path_rids, Vector<int64_t> *r_path_owners) const {
-	map_rwlock.read_lock();
+	RWLockRead read_lock(map_rwlock);
 	if (map_update_id == 0) {
-		map_rwlock.read_unlock();
 		ERR_FAIL_V_MSG(Vector<Vector3>(), "NavigationServer map query failed because it was made before first map synchronization.");
 	}
 
@@ -171,7 +170,6 @@ Vector<Vector3> NavMap::get_path(Vector3 p_origin, Vector3 p_destination, bool p
 
 	// Check for trivial cases
 	if (!begin_poly || !end_poly) {
-		map_rwlock.read_unlock();
 		return Vector<Vector3>();
 	}
 	if (begin_poly == end_poly) {
@@ -198,8 +196,6 @@ Vector<Vector3> NavMap::get_path(Vector3 p_origin, Vector3 p_destination, bool p
 		path.write[0] = begin_point;
 		path.write[1] = end_point;
 
-		map_rwlock.read_unlock();
-
 		return path;
 	}
 
@@ -348,8 +344,6 @@ Vector<Vector3> NavMap::get_path(Vector3 p_origin, Vector3 p_destination, bool p
 				path.write[0] = begin_point;
 				path.write[1] = end_point;
 
-				map_rwlock.read_unlock();
-
 				return path;
 			}
 
@@ -436,8 +430,6 @@ Vector<Vector3> NavMap::get_path(Vector3 p_origin, Vector3 p_destination, bool p
 		path.write[0] = begin_point;
 		path.write[1] = end_point;
 
-		map_rwlock.read_unlock();
-
 		return path;
 	}
 
@@ -582,8 +574,6 @@ Vector<Vector3> NavMap::get_path(Vector3 p_origin, Vector3 p_destination, bool p
 		}
 	}
 
-	map_rwlock.read_unlock();
-
 	// Ensure post conditions (path arrays MUST match in size).
 	CRASH_COND(r_path_types && path.size() != r_path_types->size());
 	CRASH_COND(r_path_rids && path.size() != r_path_rids->size());
@@ -593,9 +583,8 @@ Vector<Vector3> NavMap::get_path(Vector3 p_origin, Vector3 p_destination, bool p
 }
 
 Vector3 NavMap::get_closest_point_to_segment(const Vector3 &p_from, const Vector3 &p_to, const bool p_use_collision) const {
-	map_rwlock.read_lock();
+	RWLockRead read_lock(map_rwlock);
 	if (map_update_id == 0) {
-		map_rwlock.read_unlock();
 		ERR_FAIL_V_MSG(Vector3(), "NavigationServer map query failed because it was made before first map synchronization.");
 	}
 
@@ -642,46 +631,38 @@ Vector3 NavMap::get_closest_point_to_segment(const Vector3 &p_from, const Vector
 		}
 	}
 
-	map_rwlock.read_unlock();
-
 	return closest_point;
 }
 
 Vector3 NavMap::get_closest_point(const Vector3 &p_point) const {
-	map_rwlock.read_lock();
+	RWLockRead read_lock(map_rwlock);
 	if (map_update_id == 0) {
-		map_rwlock.read_unlock();
 		ERR_FAIL_V_MSG(Vector3(), "NavigationServer map query failed because it was made before first map synchronization.");
 	}
 	gd::ClosestPointQueryResult cp = get_closest_point_info(p_point);
-	map_rwlock.read_unlock();
 	return cp.point;
 }
 
 Vector3 NavMap::get_closest_point_normal(const Vector3 &p_point) const {
-	map_rwlock.read_lock();
+	RWLockRead read_lock(map_rwlock);
 	if (map_update_id == 0) {
-		map_rwlock.read_unlock();
 		ERR_FAIL_V_MSG(Vector3(), "NavigationServer map query failed because it was made before first map synchronization.");
 	}
 	gd::ClosestPointQueryResult cp = get_closest_point_info(p_point);
-	map_rwlock.read_unlock();
 	return cp.normal;
 }
 
 RID NavMap::get_closest_point_owner(const Vector3 &p_point) const {
-	map_rwlock.read_lock();
+	RWLockRead read_lock(map_rwlock);
 	if (map_update_id == 0) {
-		map_rwlock.read_unlock();
 		ERR_FAIL_V_MSG(RID(), "NavigationServer map query failed because it was made before first map synchronization.");
 	}
 	gd::ClosestPointQueryResult cp = get_closest_point_info(p_point);
-	map_rwlock.read_unlock();
 	return cp.owner;
 }
 
 gd::ClosestPointQueryResult NavMap::get_closest_point_info(const Vector3 &p_point) const {
-	map_rwlock.read_lock();
+	RWLockRead read_lock(map_rwlock);
 
 	gd::ClosestPointQueryResult result;
 	real_t closest_point_ds = FLT_MAX;
@@ -701,8 +682,6 @@ gd::ClosestPointQueryResult NavMap::get_closest_point_info(const Vector3 &p_poin
 		}
 	}
 
-	map_rwlock.read_unlock();
-
 	return result;
 }
 
@@ -813,7 +792,7 @@ void NavMap::remove_agent_as_controlled(NavAgent *agent) {
 }
 
 void NavMap::sync() {
-	map_rwlock.write_lock();
+	RWLockWrite write_lock(map_rwlock);
 
 	// Performance Monitor
 	int _new_pm_region_count = regions.size();
@@ -1141,8 +1120,6 @@ void NavMap::sync() {
 	pm_edge_merge_count = _new_pm_edge_merge_count;
 	pm_edge_connection_count = _new_pm_edge_connection_count;
 	pm_edge_free_count = _new_pm_edge_free_count;
-
-	map_rwlock.write_unlock();
 }
 
 void NavMap::_update_rvo_obstacles_tree_2d() {

Makes navigation map spatial queries thread-safe by adding a readers–writer lock.
@smix8 smix8 force-pushed the navmap_rwlock_4.x branch from 7e3a246 to 4cc8748 Compare February 6, 2024 18:48
@akien-mga
Copy link
Member

@RandomShaper Could you check the new version?

@akien-mga akien-mga merged commit 42c3a38 into godotengine:master Feb 12, 2024
16 checks passed
@akien-mga
Copy link
Member

Thanks!

@smix8 smix8 deleted the navmap_rwlock_4.x branch February 16, 2024 10:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants