Delay dropping of DoraNode in Python until all event data is freed #601

Merged: 2 commits into main from delay-node-drop on Jul 23, 2024

phil-opp (Collaborator)

When a `DoraNode` is dropped, it waits for the remaining drop tokens. This only works if all dora events have already been dropped. With the Python GC, this is not guaranteed, as some events might still be live on the heap (the user might even use them later). In such cases, we waited until we hit a timeout, which resulted in very long exit times (see #598).

This commit fixes the issue by adding a reference-counted copy of the `DoraNode` and `EventStream` to every event given to Python. This ensures that the underlying `DoraNode` is only dropped after the last event reference has been freed.
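
The idea can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: `Node`, `Event`, and `drop_order` are hypothetical stand-ins, and the real `DoraNode`, `EventStream`, and Python bindings are more involved. It only shows that when each event holds an `Arc` to the node, the node's `Drop` runs after the last event is gone, even if the user's own handle is released first.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `DoraNode`; records when it is dropped.
struct Node {
    log: Arc<Mutex<Vec<String>>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        // In the real node, waiting for outstanding drop tokens would happen here.
        self.log.lock().unwrap().push("node dropped".to_string());
    }
}

/// Hypothetical event handed to the user; keeps the node alive through an `Arc`.
struct Event {
    id: u32,
    _node: Arc<Node>,
    log: Arc<Mutex<Vec<String>>>,
}

impl Drop for Event {
    fn drop(&mut self) {
        self.log.lock().unwrap().push(format!("event {} dropped", self.id));
    }
}

/// Runs the scenario and returns the order in which the drops happened.
fn drop_order() -> Vec<String> {
    let log = Arc::new(Mutex::new(Vec::new()));
    let node = Arc::new(Node { log: Arc::clone(&log) });

    let e1 = Event { id: 1, _node: Arc::clone(&node), log: Arc::clone(&log) };
    let e2 = Event { id: 2, _node: Arc::clone(&node), log: Arc::clone(&log) };

    // The user's handle goes away first, but both events still hold references,
    // so `Node::drop` does not run yet.
    drop(node);
    drop(e1);
    // Dropping the last event releases the final reference to the node.
    drop(e2);

    let result = log.lock().unwrap().clone();
    result
}

fn main() {
    for line in drop_order() {
        println!("{line}");
    }
}
```

In this sketch the node's cleanup never has to wait on events that are still alive, which is the property the change relies on to avoid the timeout.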

Fixes #598

@phil-opp requested a review from @haixuanTao on July 21, 2024 13:49
@haixuanTao (Collaborator) left a comment

That's excellent!

I really like that we're using the same abstraction layer for both the event reference counting and node reference counting.

Thanks a lot!

@haixuanTao merged commit 16ef29a into main on Jul 23, 2024
38 checks passed
@haixuanTao deleted the delay-node-drop branch on Jul 23, 2024 09:32
Linked issue: Python GIL: Drop token race condition