Memory leak when reading a stream via callback #98
I was able to confirm this. Really strange... even wrapping the code in a function (everything before the while loop) didn't make a difference.
Just an update: I'm still looking into this. There's some really weird behavior going on where garbage collection isn't occurring. I'm having a hard time nailing down exactly where the references are being held.
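A sketch of the kind of inspection that can reveal where references are being held, assuming CPython's gc module (the thread doesn't say what tooling was actually used):

```python
import gc

# Save everything the collector finds into gc.garbage instead of freeing
# it, so any surviving reference cycles can be inspected by type.
gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()

for obj in gc.garbage:
    print(type(obj))
```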
Fixed on master and pushed to PyPI as v1.06. The cause was some nasty cyclical references that were preventing garbage collection.
Hmm, I'm still able to reproduce this with the test case I posted. I double-checked on two different installs and also checked that sh.__version__ is 1.06, just to make sure I wasn't running the old version. I tested with Python 2.6, 2.7, and 3.3. Here is a screenshot of htop: http://i.imgur.com/p6L1x.png
@chrippa What I found with the fix that I added was that Python's garbage collector is really, really lazy; I would have to run gc.collect() explicitly before the memory was actually reclaimed. When I get home tonight, I'll post the test case I was using... it was similar to yours.
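A minimal illustration of the lazy-collector point (not from the thread, just a sketch): objects in a reference cycle are never freed by reference counting alone; they sit around until the cycle collector runs, which gc.collect() forces.

```python
import gc

class Node(object):
    pass

a, b = Node(), Node()
a.other, b.other = b, a  # reference cycle: refcounts can never reach zero
del a, b                 # the two Nodes are now unreachable but not freed

print(gc.collect())      # forcing a collection reports objects reclaimed
```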
I tried adding gc.collect() to the while loop and also in the callback of the test case, but it did not make any difference. I'm using sh in my project to read a live video stream from a subprocess (rtmpdump), so I need to read chunks of data rather than all of the data at once, as in your cat example. I used to read directly from the Popen stdout object in pbs, but if I understand correctly, I need to use callbacks to do the same in sh.
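For contrast, here is a sketch of the pbs-era approach being described: reading fixed-size chunks straight from the subprocess's stdout pipe. The rtmpdump arguments and the chunk handler are hypothetical.

```python
import subprocess

def handle_chunk(chunk):
    print(len(chunk))  # hypothetical consumer of each chunk

# hypothetical rtmpdump invocation writing the stream to stdout
p = subprocess.Popen(["rtmpdump", "-r", "rtmp://example.com/live", "-o", "-"],
                     stdout=subprocess.PIPE)

while True:
    chunk = p.stdout.read(8192)
    if not chunk:
        break
    handle_chunk(chunk)
```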
Gotcha. So one thing that may be misleading is that sh commands do buffer all the data internally. So while your callback is being called with each chunk, all the chunks are getting aggregated internally. So if you do something like this:

```python
process = curl(url, _out=read_callback)
print(process.stdout)
```

it would print all of the chunks concatenated together. So I might be misunderstanding what you are looking for... are you saying that the entire process object (in the above example) is not being garbage collected when it goes out of scope? Or are you saying that the memory should not be growing as your callback is being called? If it's the second one, we can probably disable stdout getting aggregated internally if a callback is being used. But I want to be sure that the issue you're seeing isn't the garbage collection issue (of the process object) that was already fixed.
Ah, this makes more sense now. I was expecting that once the callback was called, the data would be gone with it. A way to disable the aggregating would solve the problem.
I have a fix on the dev branch right now. If you could, go ahead and download it and drop it into your PYTHONPATH to test it: https://raw.github.com/amoffat/sh/dev/sh.py. The new special keyword arguments you'll want to use are _no_out and _no_pipe:

```python
def read_callback(data):
    print("data", len(data))

curl(url, _out=read_callback, _no_out=True, _no_pipe=True)
```

I'm wondering, though, if we should automatically disable aggregating if a callback is used, since a callback will probably only be used to process a large amount of data, and you wouldn't want to automatically store a large amount of data (which is your use case). Anyways, when you get a chance, confirm for me that the dev file works for you, and I can roll that up for the 1.07 release.
Thanks for the fix. It's working fine here!
Fixed on master and in v1.07.
When using sh to read a network stream I noticed a memory leak. This is a simple test case:
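(The snippet itself was lost in this copy of the issue. Below is a plausible reconstruction based on the discussion above; the callback body, the use of /dev/urandom as a stand-in for the network stream, and the idle while loop are all assumptions.)

```python
import time
import sh

def read_callback(data):
    pass  # each chunk is discarded immediately

# _bg runs the command asynchronously; /dev/urandom stands in for an
# endless network stream.
p = sh.cat("/dev/urandom", _out=read_callback, _bg=True)

while True:
    time.sleep(1)  # idle so memory usage can be watched in top
```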
When you run this, you can watch top and see that the data is never freed, even though there is no reference keeping it around.
This has been tested with:
- sh 1.05 and git
- Python 2.7 and 3.2