-
Notifications
You must be signed in to change notification settings - Fork 782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove deferred reference count increments and make the global reference pool optional #4095
Conversation
43d791e
to
fcc7253
Compare
CodSpeed Performance ReportMerging #4095 will improve performances by 10.33%Comparing Summary
Benchmarks breakdown
|
Why print vs. panic in drop? |
It is quite hard to avoid dropping without the GIL, especially in embedding situations. And in contrast to |
That said, I would also be fine with panicking in |
I quite like the idea of this feature. I wonder, maybe instead of leaking, without the pool enabled the drop & clone code can just unconditionally call That way the program is correct whether or not the feature is enabled, but users get to control this knob according to the performance characteristics of their program. |
(and hopefully with nogil in the long term the reference pool feature might just then become deprecated for better options) |
Isn't this much too prone to deadlocks? |
Ungh, yes that's true. With nogil what I suggest would be viable. 😭 |
Could we provide the |
The problem I see is that you might want to still clone |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it's extremely handy to be able to #[derive(Clone)]
in particular, which would require Py<T>: Clone
.
Also the direction seems to be that panic-on-drop is not allowed (e.g. rust-lang/rfcs#3288) so +1 for not panicking. But I would be open to aborting instead of printing :)
Pending a decision on what to do with Drop I think the idea here gets a 👍 from me as a lever for users to pull when they know their application better than us (i.e. it never clones or drops Python values without the GIL held).
We should document this in the features and performance sections of the guide before merging, though.
fcc7253
to
67d8a8d
Compare
Added a section in the performance chapter of the guide. @inducer @matthiasdiener As this was originally reported by you, what do you think about the feature and its explanation in the guide? |
d7dbf11
to
698b312
Compare
Hmm, I can definitely see that point, but in that case I feel like having this as opt-out of "safe" behavior, rather than opt-in to "i-know-what-im-doing" is not ideal, since this has "spooky" side effects. Starting with |
I share that sentiment which is why I included
in the cover letter. Do note one downside of this though: Rust's features are additive so there is a different kind of spooky side effect. One crate in dependency graph of an extension can enable affecting everything else. It would be bad style to enable it in a non-leaf crate, but it is possible. So if we really want to make this hard to enable accidentally as possible, we should probably go for a raw But I think we should first hear from the original reporters whether they actually want this. Just dropping the PR because it does not really help them is also an option. |
Following up from #4105 (comment) It seems that we may need to migrate to a strategy where For Maybe we can move the overhead away from all PyO3 functions, and instead if there is a deferred drop we signal up a worker thread to wake, which can attempt to acquire the GIL and drop all currently pending deferred drops? Moving the overheads off every function call may be a good enough solution until GIL-less Python can give us an alternative option. |
(I know I already suggested this reference-counting-thread idea in #3827 (comment) and @adamreichold correctly pointed out that deferring the work is dangerous. With what we are now seeing particularly about deferred |
But isn't this mainly gaming the benchmarks? Our call overhead will reduce, but the actual work to be done will increase (at least there is an additional acquisition of the GIL, and on another thread to which all these objects may be cache-cold). This could improve throughput at the cost of increase CPU usage, but only if the other threads actually release the GIL for reasonable amounts of time. And if that worker thread never actually gets the GIL, we could see unbounded memory growth. While deferred drop is certainly not as dangerous a deferred clone, I fear we are inventing a garbage collector here. And if do decide to do that, I think we should go for established solutions, e.g. epoch-based reclamation or hazard pointers which at least have some support crates available. Personally, I don't think we should make this work. This is adding completely additional semantics that Python itself does not (yet) support which will always be difficult to get right as long as Python does not cooperate in these memory management tasks. So while I could live without the |
Very true. Ok, I hereby agree to drop (heh) the reference-counting-thread idea 👍 I think there's still some open question on whether we should be trying to keep the drop pool around. I feel like for sake of compatibility we can't migrate users immediately from the existing pool to leaking on drop without GIL, as silently adding memory leaks to their program may break confidence in PyO3. The abort-on-drop is similarly unappealing for migration but is at least a noisy failure mode. Whether to leak / abort, some prior art is that Given the concern about testing for leaks being hard, I'd be tempted to suggest that we add this as a I think we are now on a course to release 0.22 ASAP, given that there are various behavioural changes afoot here. So I propose the following for 0.22:
|
698b312
to
f9d8ff3
Compare
I added a new commit here which removes delayed refcnt increments and adds said feature.
Added another commit which renames the feature proposed here.
I would be glad if we did. Note that the tests currently pass only with |
f288495
to
29f89de
Compare
👍 I will do my best to find time to give this a full review tomorrow. Am pretty loaded with family responsibility at the moment so my opportunities to focus are a bit rare this month. |
992e295
to
6d46508
Compare
Sorry for the delay, I got delayed by a pile of family responsibility in April which left me quite burned out. I think a conditional compile flag is good and adding a second one for the downgrade of aborts to leaks seems ok to me too. I somewhat wonder if we should split the |
So explicitly, this means we do not want to tie this to
There are somewhat functionally intertwined, so I would glad if I could instead mitigate this by updating the cover letter here to retrace the steps and decisions. |
…viour becomes opt-in
Hmmm. I think in my mind leaking is the worst case (as it implies GC / memory overhead for the rest of program runtime, which may be hard for users to notice). So I'd prefer abort by default and the second flag changes from aborting to leaking. If we do move ahead with #4174 then having just one flag to get to aborting seems slightly better too.
👍 Yep that's reasonable to me! |
44276ee
to
d7901a1
Compare
There are now two cfg flags:
I have updated the cover letter. |
d7901a1
to
fb37270
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, thanks! 🎉
|
…nce count errors as long as these are available
…ble-reference-pool features
… conditional compilation flag Such a flag is harder to use and thereby also harder to abuse. This seems appropriate as this is purely a performance-oriented change which show only be enabled by leaf crates and brings with it additional highly implicit sources of process aborts.
…s when the global reference pool is disabled and the GIL is not held
fb37270
to
04f0f05
Compare
While this started out as an optimization to fully avoid the overhead of the global reference pool, #4105 lead us to the conclusion that deferring reference count increments is fundamentally unsound and so has to be removed from the global reference pool in any configuration. As a consequence, the pool will be limited to handling reference count decrements and the
Clone
implementation for typesPy
will panic if called without the GIL being held.Since the new behaviour of
Clone
is significant hazard to existing code but unavoidable for a sound API, we decided to hide it behind the newpy-clone
feature which needs to be explicitly enabled thereby giving downstream projects a chance to inspect their usage ofClone
before committing to the new possibilities of panics.Closes #4105
I don't think leaking references inDrop
is nice because it is quite easy for this to be called without the GIL being held, especially in embedding scenarios. But it should be safe nevertheless. It might call for an inverted feature to explicitly opt out of the reference pool by enabling e.g.disable-reference-pool
.