-
-
Notifications
You must be signed in to change notification settings - Fork 723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace AsyncProcess
exit handler by weakref.finalize
#4184
Replace AsyncProcess
exit handler by weakref.finalize
#4184
Conversation
This does resolve a long standing cleanup issue for UCX and we think this is a cleaner approach to halt workers. @jcrist if you have time can you look this over ? |
@fjetter your thoughts here would also be helpful |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few quick comments. No thoughts on whether this is the right approach over other methods - generally I'd hope we could explicitly manage closing these processes rather than requiring a finalizer to implicitly handle them.
I think the main problem we have today is that using exit handlers as globals as is the case in Distributed seems incorrect, as the order will not be respected. I haven't looked at all exit handlers but the few I looked at will all have the same problem, a couple examples: distributed/distributed/client.py Lines 4838 to 4850 in 18fff8b
distributed/distributed/deploy/spec.py Lines 636 to 641 in 18fff8b
I don't know whether this is the right approach either, I was hoping to suggest this and begin a discussion on potential pitfalls and alternatives. At the moment I don't see a much better approach, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provided tests pass, this looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jcrist and would prefer a world where we would not need to cleanup with finalizers but instead would be able to close everything intentionally.
However, as far as finalizers go I think this approach is slightly better than the one before since it deals with each object/process individually instead of in bulk. In particular, as already mentioned, it respects the ordering in the order the objects where created. Before the order was determined on module import time. Not sure if this makes any difference but this approach seems also simpler to reason about than the module level finalizer.
There are a lot of failures on travis/py3.6. Most seem unrelated but I'm not entirely sure. |
I'm also not sure whether the 3.6 failures are related to this, it seems that they aren't, but happy to look more into that if there's some suspicion this PR is the cause. |
I restarted the 3.6 CI job |
Seems like there's still one test failing with a timeout: |
The |
Thanks @fjetter , I hadn't seen that. In this case, I really think this PR seems safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pentschev! (and @jcrist @fjetter for reviewing)
(cherry picked from commit 8612473)
Thanks everyone for reviews and merging! :) |
This is an attempt at solving #4181 . I believe this is a more reliable approach, given we can ensure the process will not get destroyed before it's garbage collected.