Storing RIDs scene side can lead to dangling RIDs if a RID is freed elsewhere.
RIDReference allows scene side code to reference a RID indirectly, which will automatically be set to NULL when the object is freed, and thus help prevent dangling RIDs.
Fixes #74732
Discussion
While #74732 may seem like an innocuous issue, it is the symptom of a major safety problem in the 3.x codebase, and so some care should be taken over the solution, rather than attempting a quick fix.
Keeping a RID longterm in the scene side code is a very dangerous pattern (outside of outright ownership). The reason is that RID (in regular, non RID handles builds) is a glorified pointer, and deleting the RID object from elsewhere can lead to a dangling RID, which, like a dangling pointer, can lead to crashes.
The physics was using this dangerous pattern to keep a RID to on_floor_body, used in the move_and_slide() function. Deletion of on_floor_body leads to a crash the next time move_and_slide() is called.
How to solve
There are a number of ways of addressing this problem. Some of the main ones I considered:
An example is a pending delete in the message queue to the server that will be executed before our call using the RID, but after we make this call (because of the order in the queue).
Use an ObjectID instead of a RID where persistent storage of an object is required. (This method appears partly used in Godot 4 for CharacterBody3D, although master still stores a RID and looks like it will crash in similar circumstances to #74732, "Heap-use-after-free in move_and_slide if the floor body is freed".)
This PR
This PR is quite a simple solution - it is a move towards a paradigm of disallowing client code from retaining direct references to RIDs (outside outright ownership) and instead holding RIDRefs, which are indirect, and can be NULLed as objects are deleted.
This works for the linked issue but there is a caveat - it assumes a single threaded pattern. If another thread were to jump in and free a RID between the check for validity and the call to the server, then you could still theoretically get a dangling RID.
However, in 3.x at least, I am hoping this client physics code will only be called from a single thread, in which case this race condition shouldn't happen.
This is something I'll need to double check with @reduz and confirm whether this is the best approach to this class of problem (at least in Godot 3.x).
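To make the indirection concrete, here is a minimal standalone sketch of the idea, not the code in this PR. All names (FakeRID, FakeRefRegistry, FakeRIDRef, floor_still_exists) are hypothetical, a plain integer stands in for a RID with 0 meaning NULL, and it assumes the single threaded pattern described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a server-side resource ID (0 means NULL).
using FakeRID = std::uint64_t;

class FakeRIDRef;

// Hypothetical registry of live references, owned by the "server" side.
class FakeRefRegistry {
	std::vector<FakeRIDRef *> refs;

public:
	void add(FakeRIDRef *r) { refs.push_back(r); }
	void remove(FakeRIDRef *r) {
		refs.erase(std::remove(refs.begin(), refs.end(), r), refs.end());
	}
	// Called when the server frees a resource: NULL every reference to it.
	void notify_freed(FakeRID rid);
};

// Indirect reference held by scene side code instead of a raw ID.
class FakeRIDRef {
	FakeRefRegistry &registry;
	FakeRID rid = 0;

public:
	explicit FakeRIDRef(FakeRefRegistry &reg) : registry(reg) { registry.add(this); }
	~FakeRIDRef() { registry.remove(this); }
	// Non-copyable, so the registry never holds a stale address.
	FakeRIDRef(const FakeRIDRef &) = delete;
	FakeRIDRef &operator=(const FakeRIDRef &) = delete;

	void set(FakeRID r) { rid = r; }
	FakeRID get() const { return rid; }
	bool is_valid() const { return rid != 0; }
};

inline void FakeRefRegistry::notify_freed(FakeRID rid) {
	for (FakeRIDRef *r : refs) {
		if (r->get() == rid) {
			r->set(0);
		}
	}
}

// Scene side check before using the ID. This is only safe if no other
// thread can free the resource between this check and the server call.
inline bool floor_still_exists(const FakeRIDRef &on_floor_body) {
	return on_floor_body.is_valid();
}
```

In this sketch the scene side code calls floor_still_exists() before passing the ID to the server, and simply skips the floor logic if the reference has been NULLed.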
Whatever approach we use here can probably be extended easily to cover the whole class of these bugs. For instance if we have a look at all the places RIDs are stored scene side, aside from ownership, we can convert these to use RIDRefs for extra safety. At the moment I've just added SERVER_PHYSICS but it's trivial to support the other servers by adding to the enum.
Runtime performance
There is also a small cost: when a RID is deleted, the list of references has to be searched in order to NULL out any that would otherwise dangle. This is made more efficient by separating these lists per server rather than having them all lumped together. If a physics RID is being deleted, there's no need to search through e.g. Render RID references.
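As a rough illustration of the per server separation (a sketch under assumptions, not the PR's implementation): SERVER_PHYSICS is the enum value mentioned above, while SERVER_RENDER, SERVER_MAX, TrackedRef and PerServerRefLists are hypothetical names invented for this example. Integers stand in for RIDs, with 0 meaning NULL.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// SERVER_PHYSICS comes from the description above; the other entries are
// hypothetical and only show how the enum could be extended.
enum ServerType { SERVER_PHYSICS, SERVER_RENDER, SERVER_MAX };

// Hypothetical tracked reference; 0 stands in for a NULL RID.
struct TrackedRef {
	std::uint64_t rid = 0;
};

class PerServerRefLists {
	// One simple list per server, so a deletion only scans its own server.
	std::array<std::vector<TrackedRef *>, SERVER_MAX> lists;

public:
	void track(ServerType server, TrackedRef *ref) { lists[server].push_back(ref); }

	// NULLs matching references and returns how many entries were examined.
	std::size_t notify_freed(ServerType server, std::uint64_t rid) {
		std::size_t examined = 0;
		for (TrackedRef *ref : lists[server]) {
			++examined;
			if (ref->rid == rid) {
				ref->rid = 0;
			}
		}
		return examined;
	}
};
```

The returned count is there only to make the cost visible: freeing a physics ID never touches the render list, however long that list is.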
The lists are fairly simple to start with, so there is the possibility of being slow in extreme situations like benchmarks with thousands of references and lots of object "churn". If necessary things can be optimized further (with e.g. hash table for looking up the RIDs on deletion), but it is usually better to use a simple solution to start with, unless profiling shows more complexity is needed. (Also I want to discuss whether this is the best way to go before proceeding further.)
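If profiling ever did show the linear scan to be a problem, the hash table idea could look something like the sketch below. This is purely illustrative and not part of the PR: IndexedRef and HashedRefIndex are hypothetical names, integers stand in for RIDs (0 meaning NULL), and it assumes a reference does not change which ID it points at after being tracked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical tracked reference; 0 stands in for a NULL RID.
struct IndexedRef {
	std::uint64_t rid = 0;
};

class HashedRefIndex {
	// Maps an ID to every reference currently pointing at it.
	std::unordered_multimap<std::uint64_t, IndexedRef *> by_rid;

public:
	// Assumes ref->rid is already set and stays fixed while tracked.
	void track(IndexedRef *ref) { by_rid.emplace(ref->rid, ref); }

	// NULLs only the references to this ID and returns how many were cleared.
	std::size_t notify_freed(std::uint64_t rid) {
		auto range = by_rid.equal_range(rid);
		std::size_t cleared = 0;
		for (auto it = range.first; it != range.second; ++it) {
			it->second->rid = 0;
			++cleared;
		}
		by_rid.erase(range.first, range.second);
		return cleared;
	}
};
```

The trade-off is extra bookkeeping on every track and untrack, which is why starting with the simple lists and measuring first seems reasonable.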