Make Distributed.Worker threadsafe #37905
e774469 to c655786
Whoops, I lost a commit during rebase. While fixing that, I dropped the commit mentioned above. Sorry if I started two CI runs!
The test failure looks relevant:
From worker 2: ArgumentError: Active workers with lazy=false. Cannot set lazy=true
Thanks for looking into the logs, @vchuravy! Should be fixed now. As the tests remove all the workers they added, I just moved them above the topology tests.
I have no idea why the new tests are that flaky.
Edit: I suspected an error in locking the socket during the handshake, but I can't tell for sure. On Linux it fails in libuv
I have no idea how to fix this. Even though the failure is clearly triggered by the new tests, the cause seems unrelated to the changes proposed in this PR. Should I document the failed builds in a separate issue and comment out the tests?
Can you rebase onto master one more time?
so that it is guaranteed that lazy workers can be added. Also, the tests make sure to remove all workers that had been added, so that subsequent tests can determine the laziness of the cluster themselves.
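A minimal sketch of the add-then-clean-up pattern this commit message describes, not the tests from this PR. It assumes a session with spare capacity for two local workers; the worker count and the `procs_before` name are illustrative only.

```julia
using Distributed, Test

procs_before = nprocs()

# Add two local workers with lazy worker-to-worker connections.
# `lazy=true` is the documented default of `addprocs`; it is spelled out
# here because adding lazy workers fails if non-lazy workers are active.
pids = addprocs(2; lazy=true)
try
    # Each new worker should report its own id.
    for p in pids
        @test remotecall_fetch(myid, p) == p
    end
finally
    # Always remove the workers that were added, so that later tests can
    # choose the laziness of the cluster themselves.
    rmprocs(pids)
end

@test nprocs() == procs_before
```

Putting `rmprocs` in a `finally` block means the workers are removed even when an assertion inside the `try` throws.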
680833c to c17316b
@vchuravy some tests are still failing, but it looks like for different reasons. This time they failed to connect to the workers (within the Distributed test suite).
Thank you Jonas!
I don't know how to test this more specifically, but the included tests failed reliably (on my machine) prior to the fix. If you have suggestions, I am happy to adjust/replace the tests.
The included commit by Kristoffer is an artifact I needed for development. I forgot to replace the calls to `notify`, which caused the build to hang while creating some precompile statements (without printing any errors). Loading the changes to the stdlib using Revise didn't work on my machine; I don't know why. I can rebase and drop that commit if you like.
Fixes JuliaLang/Distributed.jl#73