Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Add RIDReferences #77616

Closed
wants to merge 1 commit into from
Closed

Conversation

lawnjelly
Copy link
Member

@lawnjelly lawnjelly commented May 29, 2023

Storing RIDs scene side can lead to dangling RIDs if a RID is freed elsewhere.

RIDReference allows scene side code to reference a RID indirectly, which will automatically be set to NULL when the object is freed, and thus help prevent dangling RIDs.

Fixes #74732

Discussion

While #74732 may seem like an innocuous issue, it is the symptom of a major safety problem in the 3.x codebase, and so some care should be taken over the solution, rather than attempting a quick fix.

Keeping a RID longterm in the scene side code is a very dangerous pattern (outside of outright ownership). The reason is that RID (in regular, non RID handles builds) is a glorified pointer, and deleting the RID object from elsewhere can lead to a dangling RID, which, like a dangling pointer, can lead to crashes.

The physics was using this dangerous pattern to keep a RID to on_floor_body, used in the move_and_slide() function. Deletion of on_floor_body leads to a crash the next time move_and_slide() is called.

How to solve

There are number of ways of addressing this problem, some of the main ones I considered:

  1. Maintain an indirect (instead of direct) reference to the RID, and NULL any of these references as objects are freed, to prevent dangling references. (This is the approach in this PR)
  2. Maintain a two way reference to the RID object, so that the object in the server knows which objects maintain a reference, so these can be cleared up on free. There are a number of problems with this approach as it crosses server boundaries.
    An example is a pending delete in the message queue to the server that will be executed before our call using the RID, but after we make this call (because of the order in the queue).
  3. Total safety - change to using the RID handles build as standard, and allow detection of dangling RIDs. This may worth doing at some point as it is probably the most thread safe, but it delays the response until server side, and ideally we would want to detect the dangling RID before the server functions were called. We could query the RID database client side, but it is still possible the RID could be freed from another thread between the client and server function calls (as there is a message queue between them).
  4. Store an ObjectID instead of a RID where persistent storage of an object is required. (This method appears partly used in Godot 4 for CharacterBody3D, although master still stores a RID and looks like it will crash with similar circumstances to Heap-use-after-free in move_and_slide if the floor body is freed #74732.)

This PR

This PR is quite a simple solution - it is a move towards a paradigm of disallowing client code from retaining direct references to RIDs (outside outright ownership) and instead holding RIDRefs, which are indirect, and can be NULLed as objects are deleted.

This works for the linked issue but there is a caveat - it assumes a single threaded pattern. If another thread were to jump in and free a RID between the check for validity and the call to the server, then you could still theoretically get a dangling RID.

However, in 3.x at least, this client physics code I am hoping will be called from the same thread, so this race condition shouldn't happen.

This is something I'll need to double check with @reduz and confirm whether this is the best approach to this class of problem (at least in Godot 3.x).

Whatever approach we use here can probably be extended easily to cover the whole class of these bugs. For instance if we have a look at all the places RIDs are stored scene side, aside from ownership, we can convert these to use RIDRefs for extra safety. At the moment I've just added SERVER_PHYSICS but it's trivial to support the other servers by adding to the enum.

Runtime performance

There is also a small cost, when deleting RIDs, it needs to run through the list of references to NULL out any dangling references. This is made more efficient by separating these lists per server rather than having them all lumped together. If a physics RID is being deleted, there's no need to search through e.g. Render RID references.

The lists are fairly simple to start with, so there is the possibility of being slow in extreme situations like benchmarks with thousands of references and lots of object "churn". If necessary things can be optimized further (with e.g. hash table for looking up the RIDs on deletion), but it is usually better to use a simple solution to start with, unless profiling shows more complexity is needed. (Also I want to discuss whether this is the best way to go before proceeding further.)

@lawnjelly lawnjelly requested a review from a team as a code owner May 29, 2023 14:35
@lawnjelly lawnjelly added this to the 3.6 milestone May 29, 2023
Storing RIDs scene side can lead to dangling RIDs if a RID is freed elsewhere.

RIDReference allows scene side code to reference a RID indirectly, which will automatically be set to NULL when the object is freed, and thus help prevent dangling RIDs.
@lawnjelly lawnjelly requested a review from reduz May 29, 2023 14:51
@akien-mga akien-mga changed the title Add RIDReferences [3.x] Add RIDReferences Jun 7, 2023
@lawnjelly
Copy link
Member Author

Closing as we have merged #88946, which is a simpler way of fixing bug.

This PR is still an alternative should we get performance regressions from the above.

@lawnjelly
Copy link
Member Author

Due to the issue in #97315, it may be worth resurrecting this approach.

It turns out some of the RIDs stored in the physics are to non Object derived objects, therefore they are not in the ObjectDB, therefore querying the ObjectDB is not possible to determine their lifetime.

Replacing ObjectDB checks with RIDReferences may be a good solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants