
Memory leak when reading a stream via callback #98

Closed
chrippa opened this issue Oct 31, 2012 · 11 comments

chrippa commented Oct 31, 2012

When using sh to read a network stream I noticed a memory leak. This is a simple test case:

from sh import curl
from time import sleep

url = "ftp://ftp.port80.se/1000M"

def read_callback(data):
    print("data", len(data))

curl(url, _out=read_callback)

while True:
    sleep(1)

When you run this you can watch top and see that the memory is never freed, even though there is no reference keeping the data around.

This has been tested with:
sh 1.05 and git
Python 2.7 and 3.2

amoffat (Owner) commented Oct 31, 2012

I was able to confirm this. Really strange... even wrapping the code in a function (everything before the while True), the memory still grows. There is a fork-exec happening internally, maybe this is somehow related...

amoffat (Owner) commented Nov 8, 2012

Just an update, I'm still looking into this. There's some really weird behavior going on where garbage collection isn't occurring. I'm having a hard time nailing down exactly where the references are being held.

amoffat (Owner) commented Nov 10, 2012

Fixed on master and pushed to PyPI as v1.06. The cause was some nasty cyclical references that were preventing garbage collection.

amoffat closed this as completed Nov 10, 2012
chrippa (Author) commented Nov 13, 2012

Hmm, I'm still able to reproduce this with the test case I posted. I double checked on two different installs and also checked that sh.__version__ is 1.06, just to make sure I wasn't running the old version. I tested with Python 2.6, 2.7 and 3.3.

Here is a screenshot of htop: http://i.imgur.com/p6L1x.png

amoffat (Owner) commented Nov 13, 2012

@chrippa What I found with the fix I added was that Python's garbage collector is really, really lazy; I would have to run gc.collect() to get the objects collected in a timely manner. Give this a shot and let me know if you see a change in the memory usage.

When I get home tonight, I'll post the test case I was using... it was similar to yours: a while loop running sh.cat(largefile) over and over. Before the fix, memory grew indefinitely; after the fix, it stayed constant. Maybe your test case is different enough, though (because it uses a callback), that there is still uncollected garbage... I'll need to confirm this.
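
To make that concrete, here is a minimal sketch of the gc.collect() experiment applied to the original test case (the URL and callback are the ones from the first comment; the explicit collect call in the loop is only there for the experiment):

import gc
from time import sleep

from sh import curl

url = "ftp://ftp.port80.se/1000M"

def read_callback(data):
    print("data", len(data))

curl(url, _out=read_callback)

while True:
    # force a collection each pass to see whether memory stops growing
    gc.collect()
    sleep(1)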

amoffat reopened this Nov 13, 2012
chrippa (Author) commented Nov 13, 2012

I tried adding gc.collect() to the while loop and also in the callback of the test case but it did not make any difference.

I'm using sh in my project to read a live video stream from a subprocess (rtmpdump), so I need to read chunks of data rather than all the data at once as in your cat example. I used to read directly from the Popen stdout object in pbs, but if I understand correctly I need to use callbacks to do the same in sh.
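
For comparison, here is a rough sketch of the kind of direct reading I mean, using plain subprocess (the rtmpdump arguments are only placeholders for my real ones):

import subprocess

proc = subprocess.Popen(
    ["rtmpdump", "--rtmp", "rtmp://example.com/live", "--flv", "-"],
    stdout=subprocess.PIPE,
)

while True:
    # read one fixed-size chunk of the live stream at a time
    chunk = proc.stdout.read(8192)
    if not chunk:
        break
    print("chunk", len(chunk))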

amoffat (Owner) commented Nov 13, 2012

Gotcha. One thing that may be misleading is that sh commands do buffer all the data internally. So while your callback is being called with each chunk, all the chunks are also being aggregated internally. If you do something like this:

process = curl(url, _out=read_callback)
print(process.stdout)

It would print all of the chunks concatenated together. So I might be misunderstanding what you are looking for... are you saying that the entire process object (in the above example) is not being garbage collected when it goes out of scope? Or are you saying that memory should not be growing as your callback is being called?

If it's the second one, we can probably disable stdout being aggregated internally when a callback is used. But I want to be sure that the issue you're seeing isn't the garbage collection issue (the process object not being collected, and its resources only being freed if you call del on it).
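
To illustrate the difference, here is a minimal sketch against the curl test case from above, assuming the current (1.06) behavior where the aggregated output is kept on the process object:

import gc

from sh import curl

url = "ftp://ftp.port80.se/1000M"

def read_callback(data):
    print("data", len(data))

process = curl(url, _out=read_callback)

# the chunks passed to the callback are also aggregated here
print(len(process.stdout))

# this is the first case: dropping the only reference and collecting,
# after which the aggregated data should be freeable
del process
gc.collect()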

chrippa (Author) commented Nov 13, 2012

Ah, this makes more sense now. I was expecting that once the callback had been called, the data would be gone with it. A way to disable the aggregating would solve the problem.

amoffat (Owner) commented Nov 14, 2012

I have a fix on the dev branch right now. If you could, go ahead and download it and drop it into your PYTHONPATH to test it: https://raw.github.com/amoffat/sh/dev/sh.py. The new special keyword arguments you'll want to use are _no_out and _no_pipe. What these do is explicitly disable the aggregating of those internal buffers:

def read_callback(data):
    print("data", len(data))

curl(url, _out=read_callback, _no_out=True, _no_pipe=True)

I'm wondering though if we should automatically disable aggregating when a callback is used, since a callback will probably only be used to process a large amount of data, and you wouldn't want to automatically store a large amount of data (which is your use case).

Anyways, when you get a chance, confirm for me that the dev file works for you, and I can roll that up for the 1.07 release.
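
For reference, here is the original test case adapted to the dev file, with the new keyword arguments added (URL and callback taken from the first comment):

from time import sleep

from sh import curl

url = "ftp://ftp.port80.se/1000M"

def read_callback(data):
    print("data", len(data))

# _no_out and _no_pipe disable the internal aggregation described above
curl(url, _out=read_callback, _no_out=True, _no_pipe=True)

while True:
    sleep(1)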

chrippa (Author) commented Nov 14, 2012

Thanks for the fix. It's working fine here!

amoffat (Owner) commented Nov 21, 2012

Fixed on master and in v1.07.

amoffat closed this as completed Nov 21, 2012