Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object: Let debug lock handle callee destruction within call chain gracefully #96856

Merged
merged 2 commits into from
Sep 16, 2024

Conversation

RandomShaper
Copy link
Member

Fixes #75658.

@RandomShaper RandomShaper added bug topic:core cherrypick:4.3 Considered for cherry-picking into a future 4.3.x release labels Sep 11, 2024
@RandomShaper RandomShaper added this to the 4.4 milestone Sep 11, 2024
@RandomShaper RandomShaper requested a review from a team as a code owner September 11, 2024 12:55
@RandomShaper RandomShaper force-pushed the selfdestruct_correctness branch from b0ec68a to 05e3c71 Compare September 11, 2024 12:58
@akien-mga akien-mga added cherrypick:3.x Considered for cherry-picking into a future 3.x release cherrypick:3.6 Considered for cherry-picking into a future 3.6.x release labels Sep 11, 2024
@AThousandShips
Copy link
Member

@RandomShaper
Copy link
Member Author

How does this relate to:

And this fix:

Wow, I had totally forgotten I was involved there.

Now I understand the issue better, I can say that GDScript already ensures the lifetime of a RefCounted base object, because it stores it on the stack, so it will survive at least up to the end of the call. Therefore, I'd say there's no need for the extra temporaries added by the compiler in #86969.

@RandomShaper
Copy link
Member Author

Of course, using Variant has the side effect of deferring the destruction of the object until after the call, so I've modified the approach in a way that the refcount is not artificially incremented, while still having safety if the object is released prior to returning.

@RandomShaper RandomShaper force-pushed the selfdestruct_correctness branch from 05e3c71 to b70bbba Compare September 11, 2024 13:29
@lawnjelly
Copy link
Member

lawnjelly commented Sep 11, 2024

This looks pretty good and identical to my approach in the issue (great minds think alike and all that 😁 ), so am almost there to approve.

The only thing I wasn't sure about is how often this gets called in practice with every call to an object, and how expensive the ObjectDB lookup is (hashtable?) and whether we should be looking to avoid it, even in debug. i.e. does it appear on a profile. I suspect not, but would be nice to check.

An alternative would be to optionally store e.g. a pointer to a bool on each Object, which can inform the calling code when an object has been destroyed. This is O(1) but has the disadvantage of storing an extra pointer on each object.

If the Object::call() isn't used that often it isn't worth micro-optimizing though.

UPDATE:
Actually nearly every alternative I can think up either has potential bugs (due to e.g. the chain of calls containing more than one reference to the object), or may not be O(1) themselves, so maybe we should just go for this and work further if it introduces any performance issues.

@akien-mga akien-mga removed cherrypick:3.x Considered for cherry-picking into a future 3.x release cherrypick:3.6 Considered for cherry-picking into a future 3.6.x release labels Sep 11, 2024
Copy link
Member

@AThousandShips AThousandShips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense and looks good to me

@AThousandShips
Copy link
Member

Would be good to add some tests if we can reliably reproduce these issues

@RandomShaper RandomShaper force-pushed the selfdestruct_correctness branch from b70bbba to 10e2318 Compare September 12, 2024 06:51
@RandomShaper
Copy link
Member Author

Regarding performance, I wouldn't worry much since ObjectDB::get_instance() is very fast, and O(1), as it's based on generational ids. On the other hand, I think there's an opportunity to revamp this debug lock feature and maybe make it work only if the debugger is attached (like certain Variant object validity checks do) and/or make a very different implementation. After all, the only goal of the debug lock is prevent free() from being called if the object is at that point involved in a call chain.

I've force-pushed with an added likely(), and also (due to the striking coincidence) including @lawnjelly as a co-author (that's the least a gentleman can do 😀).

I'm adding some tests and will push again.

@lawnjelly
Copy link
Member

I've force-pushed with an added likely(), and also (due to the striking coincidence) including @lawnjelly as a co-author (that's the least a gentleman can do 😀).

Ah yes I'll do that on mine too. 👍

@RandomShaper RandomShaper requested a review from a team as a code owner September 12, 2024 08:52
@RandomShaper
Copy link
Member Author

Pushed some tests, as "black-boxey" as possible. There are other cases of OBJ_DEBUG_LOCK used in object.cpp (emit_signalp for the emitting object and set_script()), but in those it wouldn't be safe that the object destroyed itself.

@lawnjelly
Copy link
Member

I don't know the first thing about tests, but given the MRP only crashes with asan and debug build, does the test fail with the old code (with the normal tests build)?

@RandomShaper
Copy link
Member Author

I don't know the first thing about tests, but given the MRP only crashes with asan and debug build, does the test fail with the old code (with the normal tests build)?

It will only fail with asan enabled, but luckily CI runs jobs that compile with sanitizers and runs tests on such builds.

@RandomShaper RandomShaper force-pushed the selfdestruct_correctness branch from 2736124 to a9ff30d Compare September 12, 2024 10:58
@RandomShaper
Copy link
Member Author

I've been able to come up with a way of letting a crash happen despite asan not enabled. Not as guaranteed, but makes the test crash with the prior implementation of _ObjectDebugLock.

@RandomShaper RandomShaper force-pushed the selfdestruct_correctness branch from a9ff30d to 271729d Compare September 12, 2024 11:01
@RandomShaper RandomShaper force-pushed the selfdestruct_correctness branch 6 times, most recently from 3218a35 to e4e0087 Compare September 13, 2024 08:19
@RandomShaper RandomShaper requested a review from a team as a code owner September 13, 2024 08:19
@RandomShaper
Copy link
Member Author

Wow, getting the non-asan/tsan version of the crash induction to work on every platform has proven challenging.

Also, I've added tests for GDScript as its script instance and VM implementation have to guarantee the behavior. Those will only fail with asan/tsan, though.

I'm fairly happy with this now.

@RandomShaper RandomShaper force-pushed the selfdestruct_correctness branch from e4e0087 to bb77520 Compare September 16, 2024 07:58
@akien-mga akien-mga merged commit f7daa0f into godotengine:master Sep 16, 2024
20 checks passed
@akien-mga
Copy link
Member

Thanks!

@akien-mga
Copy link
Member

Cherry-picked for 4.3.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dereferencing a funcref from the function it calls leads to a heap-use-after-free
4 participants