Leaks memory when input is not a numpy array #201
Comments
Is there a memory leak if instead of pd.Series(x) you use x.tolist()?
Yes. In fact, it leaks twice as much memory for some reason.
The memory leak is a big find. Thank you!
Yeah, I'm very happy I finally worked out why my Jupyter sessions have required a weekly restart for the last 6 months :-)
@batterseapower could you try the leak_201 branch? In it I think I fixed the memory leak (but only for reduce functions). Your code doesn't run for me (maybe because I am on py2.7?). If you could check both for numpy array and non-numpy arrays that would be great. My quick checks (watching htop) looked good.
@shoyer thanks for the suggestion. Did I implement it correctly? This change will touch every function so I am wondering if I should make it right before a release. A second set of eyes will give me confidence.
You might not be able to run the code if you aren't on Windows: I tried the sample on my Mac and it looks like the Anyway, your fix seems to have worked: neither Pandas nor numpy objects leak (and neither do Python lists from
@shoyer what if the input is a numpy array and an error occurs after the @batterseapower thanks for the checks. I'm on linux so will try the
OK, so here is the proposed fix for the memory leak: https://github.com/kwgoodman/bottleneck/compare/leak_201 Comments welcome. If all looks good then I will apply the fix to the other functions (nonreduce, moving window, etc).
@kwgoodman could you kindly open a pull request so I can comment inline?
Yes, you should call A common style you'll see in NumPy is to use
PyObject *result = NULL;
PyObject *x;
x = something_else();
if (x == NULL) {
    result = NULL;
    goto cleanup;
}
result = other_stuff(x);
cleanup:
    Py_XDECREF(x);
    return result;
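The single-cleanup-path idea in the C snippet above can be modelled in Python with try/finally. This is only a sketch: TrackedBuffer and reduce_sum are hypothetical names invented for the illustration and are not part of Bottleneck or NumPy. The counter stands in for a reference count so that a missed release would be visible.

```python
class TrackedBuffer:
    """Hypothetical stand-in for a temporary array that must be released."""

    live = 0  # buffers acquired but not yet released

    def __init__(self, values):
        self.values = list(values)  # raises TypeError if values is not iterable
        TrackedBuffer.live += 1

    def release(self):
        TrackedBuffer.live -= 1


def reduce_sum(values):
    """Convert the input, reduce it, and release the temporary on every exit path."""
    buf = None
    try:
        buf = TrackedBuffer(values)  # may fail, like something_else() returning NULL
        return sum(buf.values)       # may fail, like other_stuff(x) returning NULL
    finally:
        # Runs on success and on error, like the cleanup label in the C code.
        if buf is not None:          # mirrors Py_XDECREF tolerating a NULL pointer
            buf.release()


total = reduce_sum([1, 2, 3])
try:
    reduce_sum([1, "a"])             # the reduction itself raises TypeError
except TypeError:
    pass
print(total, TrackedBuffer.live)
```

Whichever path is taken, the temporary is released exactly once, which is the property the error-handling question in this thread is about.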
I had forgotten about Thank you both for the review. I'll make the changes to the rest of the functions. If anyone thinks I shouldn't include these changes right before a release, let me know.
I would suggest opening the pull request from your branch first. Then I can review the changes for one function before you do it for everything :)
OK, I merged the memory leak fix into master.
Sorry, it's actually fine. In case someone stumbles over this again, I'll add this here:
If you run the following program you see that nansum leaks all the memory it is given when passed a Pandas object. If it is passed the ndarray underlying the Pandas object instead then there is no leak:
This affects not just nansum, but apparently all the reduction functions (with or without axis specified), and at least some other functions like move_max.
I'm not completely sure why this happens, but maybe it's because PyArray_FROM_O is allocating a new array in this case, and the ref count of that is not being decremented by anyone? https://github.com/kwgoodman/bottleneck/blob/master/bottleneck/src/reduce_template.c#L1237
I'm using Bottleneck 1.2.1 with Pandas 0.23.1. sys.version is 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)].
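As a rough illustration of how a leak like the one reported here can be detected, the sketch below measures how much memory is still allocated after repeated calls, using the standard-library tracemalloc module. traced_growth, leaky_sum and clean_sum are hypothetical helpers written for this example, not Bottleneck functions. leaky_sum imitates a converted copy of the input whose reference is never released, and clean_sum imitates the fixed behaviour.

```python
import gc
import tracemalloc


def traced_growth(func, arg, repeats=50):
    """Return the net bytes still allocated after calling func(arg) `repeats` times."""
    func(arg)                # warm-up, so one-time setup is not counted
    gc.collect()
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(repeats):
        func(arg)
    gc.collect()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


_never_released = []         # plays the role of the references that are never dropped


def leaky_sum(values):
    """Stand-in for a reduction that converts its input and forgets to release the copy."""
    converted = list(values)             # like a new array made from a non-ndarray input
    _never_released.append(converted)    # the "missing decref"
    return sum(converted)


def clean_sum(values):
    """Stand-in for the fixed reduction: the temporary copy dies when the call returns."""
    converted = list(values)
    return sum(converted)


data = list(range(10000))
print("leaky growth in bytes:", traced_growth(leaky_sum, data))
print("clean growth in bytes:", traced_growth(clean_sum, data))
```

Under the same assumptions, calling traced_growth with bn.nansum on a pd.Series and then on its underlying ndarray should show a similar asymmetry, provided the allocations involved are visible to tracemalloc. Watching process memory in a tool such as htop, as was done in this thread, is an alternative check.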