Series.values.compute() leads to "TypeError: can't concat buffer to bytearray" #1179
I noticed that I can compute
To be clear, `Executor` is just an alias to `Client`.
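A quick way to check the aliasing in a given environment (a minimal sketch; it assumes a distributed release from this era that still exports `Executor` for backwards compatibility):

```python
# Check whether distributed.Executor is just distributed.Client under
# another name (or a thin deprecated subclass) in the installed version.
from distributed import Client, Executor

print(Executor is Client or issubclass(Executor, Client))
```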
This runs fine for me. My first guess is that you have a version mismatch. Can you verify that the following does not err:
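The snippet itself was not preserved in this copy of the thread. As a stand-in, a version-consistency check along these lines (a sketch; the scheduler address is an assumption, adjust it to your deployment) would surface mismatched dask/distributed/tornado versions between client, scheduler, and workers:

```python
# Compare package versions reported by the client, the scheduler and all
# workers; check=True flags any mismatch between them.
from distributed import Client

client = Client("127.0.0.1:8786")
versions = client.get_versions(check=True)
print(versions)
```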
That's the output I get:
The installation should have up-to-date versions of both dask and distributed. Client, scheduler, and worker are all running from the same venv on the same host.
I tried reproducing this with an environment like the following and unfortunately everything worked.
Any recommendations on what to check that might differ between our environments?
Maybe it's related to tornado, since that's where the error is thrown? My tornado version is
Same
There was also a micro-release shortly after 1.17.1. This resolved some things with moving around memoryviews. I don't think that this is likely to affect you, but you might try updating and see if that has an effect.
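For reference, a quick way to confirm which versions are actually importable inside a given venv (a minimal sketch using the packages' standard version attributes):

```python
# Print the locally importable versions of the packages involved, to
# confirm whether the suggested micro-release is actually in use.
import dask
import distributed
import tornado

print("dask       :", dask.__version__)
print("distributed:", distributed.__version__)
print("tornado    :", tornado.version)
```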
Updated to 1.17.1, but the worker still throws the exception, with slightly modified behavior: I did get the computation result back in the client on the first attempt, although the worker raised the exception. Repeating the command a second time hangs in the client. I feel that the issue manifests itself in a somewhat erratic way. Right now I have no idea what could be wrong...
Let's wait for @pitrou to take a look at the error. He is more familiar with the networking stack than I am (I think he wrote the bytearray code in tornado) and may have thoughts. Unfortunately I think he's on vacation until Monday.
Adding a few more observations: Currently the
But maybe that is not the right place to fix the issue, and probably we still want to understand why the problem isn't easily reproducible.

Update: Maybe this isn't really a fix. It doesn't crash any more, but now my computation results differ from running with the single-machine schedulers.
If I recall correctly we recently started allowing
Ah, I think I found what is causing the issue: it might be related to allocators. I'm encountering the issue in a setting where the driver is using jemalloc while the scheduler/worker are using glibc. If I use glibc everywhere, the issue disappears.
Same as in the other issue: I cannot reproduce the failures here with a newer jemalloc version / on OSX. I rather suspect that there is some usage of uninitialised memory. This will probably not be a bug in
Dang, I'm getting the
@bluenote10, which exact Python version are you using?
Oops, I see, it should be
Unfortunately, the issue you're having (concatenating a buffer to a bytearray) seems tied to this rather old Python version.
For the record:
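The snippet that followed was not preserved here. A probe in the same spirit (a sketch, assuming Python 2, where `buffer` is a builtin) shows whether the interpreter at hand exhibits the concatenation problem from the issue title:

```python
# Python 2 only: probe whether this interpreter can concatenate a buffer
# object onto a bytearray. Old 2.7 releases such as 2.7.3 are reported to
# raise "TypeError: can't concat buffer to bytearray" here.
ba = bytearray(b"abc")
buf = buffer(b"def")  # `buffer` is a Python 2 builtin

try:
    ba += buf
    print("bytearray += buffer works on this Python")
except TypeError as exc:
    print("bytearray += buffer fails:", exc)
```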
So you are saying that from a networking perspective it does make sense that the value arrives as a buffer? Because if it is a valid value, it should be easy to fix this for Python 2.7.3 by converting it properly first.

It is not so much about fixing the Python version locally on my machine. We still have a need to run on Debian 7 servers as well, which also ship Python 2.7.3.
It can make sense, yes. It means in some circumstances we were able to avoid a memory copy and instead passed a view of some existing memory area (most likely the data of a Numpy array). Why this only seems to happen sporadically for you I'm not sure, though.
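A workaround on the receiving side would then amount to copying the view back into plain bytes before concatenating (a hypothetical sketch; the helper name and call site are made up for illustration, and the explicit copy re-introduces exactly the memory copy that passing a view avoids):

```python
# Hypothetical helper: normalize buffer-like chunks to bytes before
# appending them onto a bytearray, so interpreters that cannot concatenate
# a buffer/memoryview to a bytearray directly are still handled.
def append_chunk(write_buffer, chunk):
    if isinstance(chunk, memoryview):
        chunk = chunk.tobytes()            # explicit copy of the view
    elif not isinstance(chunk, (bytes, bytearray)):
        chunk = bytes(chunk)               # e.g. Python 2 buffer -> bytes copy
    write_buffer += chunk
    return write_buffer

buf = bytearray()
append_chunk(buf, b"abc")
append_chunk(buf, memoryview(b"def"))
print(buf)  # contains b'abcdef'
```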
It is initially easy, but then needs to be maintained. We are unlikely to run any CI builds with Python 2.7.3, so it may get broken unexpectedly again. Besides, without wanting to spread FUD, there are many other issues that were fixed in recent 2.7 releases and they may pop up with 2.7.3 (basically anything above this line in the Python 2.7 changelog). Therefore I'm reluctant to add workarounds for such an old Python version, except if it's part of a commercial support contract.
What is the status of this issue? Should it be closed?
The issue still exists, but we can close it for now. Possible workarounds are not using jemalloc with Python 2.7.3 or using the patch posted above. If need be we can think about a PR either for Dask or maybe Tornado.
Original report: Running a local `dask-scheduler` + `dask-worker` pair, a `Series.values.compute()` call leads to a crash of the worker with `TypeError: can't concat buffer to bytearray`. Other computations like `ddf["A"].compute()` or even `ddf.values.compute()` work fine though.
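A reproduction along these lines (an illustrative sketch, not the reporter's original snippet; the scheduler address and dataframe contents are assumptions) matches the described behaviour:

```python
# Illustrative reproduction sketch. Assumes a dask-scheduler and a
# dask-worker already running locally on the default port 8786.
import pandas as pd
import dask.dataframe as dd
from distributed import Client

client = Client("127.0.0.1:8786")

pdf = pd.DataFrame({"A": range(1000), "B": range(1000)})
ddf = dd.from_pandas(pdf, npartitions=4)

print(ddf["A"].compute().head())      # reported to work
print(ddf.values.compute()[:5])       # reported to work
print(ddf["A"].values.compute()[:5])  # reported to crash the worker
```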