Distributed hello world fails when using jemalloc #1190
Comments
@jreback no, they don't seem to be related here. Sadly I cannot reproduce this problem on my machines here (neither OSX nor Debian 7+). It seems that there is some relation to the old Ubuntu 12.04 image @bluenote10 is running on, which ships with jemalloc 2.2.x and glibc 2.13.
I cannot reproduce on Ubuntu 16.04 either. In any case, this doesn't seem to be a Distributed bug, so I'm inclined to close this issue unless you have significant reason to believe Distributed is involved.
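Since the failure only reproduces on an image shipping glibc 2.13 and jemalloc 2.2.x, reporting the C library the interpreter is linked against can help narrow things down. A minimal, illustrative check (not part of the original thread) using the standard library:

```python
import platform

# platform.libc_ver() inspects the running executable and returns a
# (library, version) pair, e.g. ("glibc", "2.13") on the affected
# Ubuntu 12.04 image, or ("", "") on platforms without glibc such as OSX.
lib, version = platform.libc_ver()
print(lib, version)
```

This only identifies glibc, not the preloaded allocator, but it quickly distinguishes old images like the one above from newer ones.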
I'm not sure. Similar to the other issue, it seems to affect the networking. Maybe you can make more sense of the full traceback:
The fact that networking may be affected doesn't imply that Distributed is the culprit. Distributed is pure Python code and does not depend in any particular way on the underlying C allocator. I might add: why are you using jemalloc? Did you get specific performance improvements using it?
Result of running Valgrind, leading up to the traceback:
I had fewer issues resulting from memory fragmentation with jemalloc, but I should be able to use glibc as well.
@pitrou The arrow issues were solely a build environment issue that surfaced while loading a library. Once the library is successfully loaded, this glibc bug is not triggered anymore. We're using jemalloc in general (note: I work at the same place as @bluenote10 ;) ) as it has less memory fragmentation than glibc and better multithreaded performance. In Apache Arrow (a different use case than in this issue), we also use it because it can provide aligned memory (re-)allocation, which enables us to use faster numeric CPU instructions.
Ok. Still, I don't know what to do with this issue. Using a different memory allocator shouldn't mess with network communications implemented in pure Python, unless there's something seriously wrong in low-level routines (I mean routines implemented in C, either inside Python itself or inside system libraries and/or third-party C libraries). As for the Valgrind output, the following seems fishy, but a Valgrind or glibc expert would have to dig in:
Sure, we can close it if there is nothing that can be done. Just wanted to keep you posted on the issue.
Observation made in relation to #1179.
Running a local distributed executor like in the "hello world" example crashes if the allocator is jemalloc.
Starting an IPython shell via
LD_PRELOAD=/usr/lib/libjemalloc.so.1 ipython
and running the following code crashes with
ValueError: Worker not started
The code works fine using glibc.
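One way to confirm that the LD_PRELOAD actually took effect is to check whether libjemalloc appears among the process's mapped libraries, which on Linux can be read from /proc/self/maps. A minimal sketch; the helper name and the sample maps lines are illustrative, not from the issue:

```python
def active_allocator(maps_text: str) -> str:
    """Classify the allocator from a /proc/<pid>/maps dump:
    'jemalloc' if libjemalloc is mapped, otherwise assume glibc."""
    return "jemalloc" if "libjemalloc" in maps_text else "glibc"

# On Linux the live mapping can be read directly:
#   with open("/proc/self/maps") as f:
#       print(active_allocator(f.read()))

# Example with captured (hypothetical) maps lines:
print(active_allocator("7f3a2c000000-7f3a2c200000 r-xp /usr/lib/libjemalloc.so.1"))  # jemalloc
print(active_allocator("7f9b10000000-7f9b101a0000 r-xp /lib/x86_64-linux-gnu/libc-2.13.so"))  # glibc
```

This distinguishes a genuinely preloaded jemalloc from a silently ignored LD_PRELOAD (e.g. a wrong path), which would otherwise make the crash appear intermittent.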