Fix race condition involving wrapper lookup #865
base: master
Conversation
```cpp
static bool nb_try_inc_ref(PyObject *obj) noexcept {
#if 0 && defined(Py_GIL_DISABLED) && PY_VERSION_HEX >= 0x030E00A5
```
Is the `0` intentional here?
```diff
@@ -1766,16 +1834,18 @@ PyObject *nb_type_put(const std::type_info *cpp_type,
     PyTypeObject *tp = Py_TYPE(seq.inst);

     if (nb_type_data(tp)->type == cpp_type) {
-        Py_INCREF(seq.inst);
-        return seq.inst;
+        if (nb_try_inc_ref(seq.inst)) {
```
I think that if `nb_try_inc_ref` fails, what we actually want to do is to `break` out of the loop that considers the different Python objects associated with that pointer address. We tried the only possible match, and it cannot be returned.
Ditto below.
There are also two other places where this `nb_try_inc_ref` will be needed, in `nb_type_put_p` below (which mirrors the function here, but for polymorphic C++ objects).
I think breaking out here would create more wrappers than necessary. The case I'm thinking about is where there are multiple lookups before the to-be-deallocated wrapper is removed from `inst_c2p`:

T1: starts to deallocate the wrapper object, but hasn't removed it from `inst_c2p` yet.
T2: looks up the wrapper and sees a zero refcount. The entry is converted to a linked list and the new wrapper is inserted at the end.
...
T3: looks up the wrapper. The first entry is still the zero-refcount wrapper. The second entry is the wrapper we want. If we break out early, we'll create a third wrapper, which doesn't seem ideal.
In nanobind (and pybind11, for that matter), a single pointer can be associated with multiple Python instances.
For example, you could have

```cpp
struct A {
    B b;
    /* ... */
};
```

with both an `A*` pointer and a `B*` pointer being simultaneously alive through Python wrappers. These can be created and GCed independently from each other.
The `nb_type_put()` function converts a C++ pointer into an associated Python object (if one exists), or it creates a new one. It walks through a chain of objects that are associated with a given pointer address until it finds one of the right type. There can only be one of the right type.
Your commit adds a failure mechanism for the case where it turns out that one wrapper is in the process of being deleted, and we need to create a new one. In that case, it (AFAIK) doesn't make sense to try the other types associated with the current pointer, since we already found the one. Hence the suggestion to `break` and immediately create a new wrapper. Arguably this is pretty minor and unlikely to have a big performance impact.
src/nb_type.cpp
Outdated
```cpp
#if 0 && defined(Py_GIL_DISABLED) && PY_VERSION_HEX >= 0x030E00A5
    PyUnstable_EnableTryIncRef(obj);
#elif defined(Py_GIL_DISABLED)
    // TODO: Replace with PyUnstable_Object_EnableTryIncRef when available.
```
What's the difference between `PyUnstable_Object_EnableTryIncRef` and `PyUnstable_EnableTryIncRef`?
No difference, just the name changed. I'll update the comment.
```diff
@@ -40,6 +40,72 @@ static PyObject **nb_weaklist_ptr(PyObject *self) {
     return weaklistoffset ? (PyObject **) ((uint8_t *) self + weaklistoffset) : nullptr;
 }

+static void nb_enable_try_inc_ref(PyObject *obj) noexcept {
+#if 0 && defined(Py_GIL_DISABLED) && PY_VERSION_HEX >= 0x030E00A5
```
Is the 0 intentional here?
The APIs haven't been added to CPython yet, but I wanted to get this PR ready while the issue was still fresh in my mind. We can either:
- Delete this code path for now
- Wait until the APIs are added
- Or anything else you prefer
I think it's OK to commit the code as is; we can remove the `#if 0` part once the change to CPython goes in.
Do I understand correctly that the version below now has an optimization (specific to object construction) that won't be in the official API?
Hi Sam, thank you for tracking this down and making a patch. I left some comments inside. This added requirement to invoke
I'll measure the overhead
I confirmed that this PR fixes a race that @vfdev-5 and I were trying to track down in the JAX test suite when run under free-threading (reduced example: https://gist.github.com/vfdev-5/d46bcdc73231bd1e2b2f85c40de9f890)
I benchmarked some simple wrapping code using the current PR:

```cpp
struct Foo {};
nb::class_<Foo>(m, "Foo");
m.def("new_foo", []() -> Foo * { return new Foo(); });
```

And then calling `new_foo` in a loop. On my M1 mac, the difference is basically in the noise margin: each iteration takes ~80 ns. The extra overhead seems to be about maybe 1-2 ns, but there's similar variation between runs. We can optimize
We only call `nb_enable_try_inc_ref` during object construction when we have the only reference to the object. We can safely overwrite `ob_ref_shared` without using an atomic compare-exchange. This speeds up creating Python object wrappers by ~5 ns on my Intel machine. It doesn't seem to matter as much on Apple's M1 chip.
Thank you @colesbury, I left 2 more comments. By the way, in
See #864