
Memory leak using sh #99

Closed

hltbra opened this issue Nov 5, 2012 · 7 comments

hltbra commented Nov 5, 2012

Hello,

I experienced some memory leaks after running hlsclient (a project I am working on) for a few days. The Python process was consuming 16 GB of memory.

My team investigated and found that sh is the problem. Our Python process runs forever, and from time to time we use sh to run FFmpeg.

I reproduced the problem:

import time
import sh

while True:
    sh.ls()
    time.sleep(1)

Output using sh:

$ for i in $(jot 10) ; do ps -o rss= -p $PID; sleep 1; done
  7368
  7396
  7428
  7460
  7488
  7512
  7544
  7580
  7616
  7660

Script without sh:

import time

while True:
    print("")
    time.sleep(1)

Output without sh:

$ for i in $(jot 10) ; do ps -o rss= -p $PID; sleep 1; done
  4024
  4024
  4024
  4024
  4024
  4024
  4024
  4024
  4024
  4024

I replaced sh.ls() with sh.echo() and sh.pwd() and got the same behavior. When I tried sh.which('ls'), it did not leak.

Has anyone else seen these leaks? We replaced sh with plain old subprocess and everything works.
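A minimal sketch of the subprocess replacement (the actual FFmpeg invocation is elided; plain ls stands in for it):

```python
import subprocess

# the same periodic call, using the stdlib subprocess module instead of sh
def run_ls():
    # check_output waits for the child and reaps it, so nothing
    # accumulates between calls
    return subprocess.check_output(["ls"]).decode()

listing = run_ls()
```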

amoffat (Owner) commented Nov 5, 2012

Thanks Hugo, there is another open issue regarding a memory leak, #98. There seems to be a problem with the stdout/stdin reader threads being join()ed but never finishing. I need to spend a little more time investigating, but I should be able to push out a fixed version tonight.

flavioribeiro added a commit to globocom/hlsclient that referenced this issue Nov 5, 2012
amoffat (Owner) commented Nov 10, 2012

Fixed on master and pushed to PyPI as v1.06. The cause was some nasty cyclic references that were preventing garbage collection.
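The mechanism behind such leaks is easy to demonstrate: objects in a reference cycle outlive their last external reference until the cycle detector runs. A toy sketch (Node is invented for the demo, not an sh class):

```python
import gc

class Node:
    def __init__(self):
        self.peer = None

a, b = Node(), Node()
a.peer, b.peer = b, a  # a <-> b form a reference cycle
del a, b               # refcounts never hit zero: each keeps the other alive

# only the cyclic garbage collector can reclaim them now
collected = gc.collect()
```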

amoffat closed this as completed Nov 10, 2012
NotSqrt (Contributor) commented Nov 19, 2012

It appears that it is still not completely fixed, even on the dev branch, commit 3d6c1b3.

The following code displays a leak

import sh
print sh.__version__
import objgraph  # installable from pypi
import logging


def show_leak():
    objgraph.show_growth(limit=10)  # start counting
    sh.ls()
    objgraph.show_growth(limit=10)  # big increase, normal

    print '\nNow, it matters\n\n'
    sh.ls()
    objgraph.show_growth(limit=10)
# list             +8
# Logger           +4
# dict             +4

    print '\nanother iteration, just to check\n'
    sh.ls()
    objgraph.show_growth(limit=10)
# list             +8
# Logger           +4
# dict             +4

    # objgraph.show_backrefs(objgraph.by_type('Logger'))  # => are referenced by an object of class 'Manager'

    print "\nfind cycles : cycle between an instance of logging.Manager and logging.Logger"
    print objgraph.find_backref_chain(objgraph.by_type('Manager')[0], lambda x: isinstance(x, logging.Logger))


show_leak()

Every time sh.ls is called, four logging.Logger instances are created, and there appears to be a cycle of references between them and an instance of logging.Manager...

It doesn't seem to make a difference whether the call to sh.ls is in a function or followed by a gc.collect() call.
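The Manager's cache can be inspected directly to see this accumulation; a small sketch (the logger names are invented for the demo):

```python
import logging

before = len(logging.Logger.manager.loggerDict)
for i in range(3):
    logging.getLogger("leak_demo_%d" % i)  # each unique name is cached
after = len(logging.Logger.manager.loggerDict)

# after - before == 3: the Manager keeps a strong reference to every
# logger ever created, so uniquely-named loggers accumulate forever
growth = after - before
```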

amoffat (Owner) commented Nov 20, 2012

Can you confirm whether this happens without sh? Try creating an instance of a class that does a self.log = logging.getLogger("test") in its constructor, and see if you get garbage that doesn't get collected.

amoffat reopened this Nov 20, 2012
NotSqrt (Contributor) commented Nov 20, 2012

Very good idea indeed, and you're right!

See this updated code: each time it is called, I override the member self.log with a new logger with a different name (otherwise nothing new is created):

#!/usr/bin/python2.7

import objgraph  # installable from pypi
import logging


class Testleak(object):
    """docstring for Testleak"""
    def __init__(self):
        self.log = logging.getLogger("test")
        self.incr = 0

    def __call__(self):
        self.log = logging.getLogger("test_%i" % self.incr)
        self.incr += 1


def show_leak():
    a = Testleak()
    objgraph.show_growth(limit=10)  # start counting
    a()
    objgraph.show_growth(limit=10)  # big increase, normal

# list             +2
# Logger           +1
# dict             +1

    print '\nNow, it matters\n\n'
    a()
    objgraph.show_growth(limit=10)
    print '\nanother iteration, just to check\n'
    a()
    objgraph.show_growth(limit=10)


show_leak()

I think the following thread from the Python mailing list is relevant (apart from the multithreading part, it is about creating several loggers): http://grokbase.com/t/python/python-list/116jhbaqsb/improper-creating-of-logger-instances-or-a-memory-leak#20110618wgx7onkmaqvvrqnirprxgpuvky

The important part is that Loggers don't have a __del__ method, but it is possible to break the cycle manually when overriding the logger, by de-registering the previous logger from the dict of references that the manager keeps:

del logging.Logger.manager.loggerDict[self.log.name]

I tried this solution on the code above, and there are no more leaks.
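A sketch of that manual fix in isolation (the logger name is invented), showing that once the Manager's entry is removed the logger really is reclaimed:

```python
import logging
import weakref

log = logging.getLogger("doomed_logger")
ref = weakref.ref(log)  # observe the logger without keeping it alive

# break the Manager's strong reference, as suggested above, then drop ours
del logging.Logger.manager.loggerDict[log.name]
del log

reclaimed = ref() is None  # True: nothing keeps the logger alive anymore
```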

Hope this helps fix it in sh (maybe by de-registering the previous loggers when a Command is called again?)!

amoffat (Owner) commented Nov 20, 2012

Good work on this @NotSqrt. From that Python list you linked, Vinay Sajip (the author of logging IIRC) linked to this: http://docs.python.org/2/howto/logging-cookbook.html#adding-contextual-information-to-your-logging-output

Sometimes you want logging output to contain contextual information in addition to the parameters passed to the logging call. For example, in a networked application, it may be desirable to log client-specific information in the log (e.g. remote client’s username, or IP address). Although you could use the extra parameter to achieve this, it’s not always convenient to pass the information in this way. While it might be tempting to create Logger instances on a per-connection basis, this is not a good idea because these instances are not garbage collected. While this is not a problem in practice, when the number of Logger instances is dependent on the level of granularity you want to use in logging an application, it could be hard to manage if the number of Logger instances becomes effectively unbounded.

So it looks like I'm guilty of that :) I'll probably go with the logging adapter described on that page, instead of trying to work out a garbage-cleanup mechanism.
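For illustration, a sketch of the LoggerAdapter approach (CommandAdapter and the "command" context key are invented here, not sh's actual implementation):

```python
import logging

# one shared, module-level Logger instead of one Logger per command call
logger = logging.getLogger("sh")

class CommandAdapter(logging.LoggerAdapter):
    """Carry per-command context without creating new Logger objects."""
    def process(self, msg, kwargs):
        # prefix each message with the command this adapter wraps
        return "[%s] %s" % (self.extra["command"], msg), kwargs

adapter = CommandAdapter(logger, {"command": "ls -l"})
msg, _ = adapter.process("process started", {})
# msg == "[ls -l] process started"
```

Adapters are cheap throwaway wrappers around one shared Logger, so the Manager's cache never grows.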

amoffat (Owner) commented Nov 21, 2012

Fixed on master and in v1.07.
