
After migrating to 1.3.3 there is 100% CPU usage and memory leak after some time #1667

Closed
misiek08 opened this issue Feb 21, 2017 · 15 comments

@misiek08

Long story short

After upgrading from 1.2.0 to 1.3.3 there is 100% CPU usage and a memory leak on Linux (Ubuntu 16.04). It works fine on OS X 10.12.3 (Sierra), or the bug takes much longer to show. I tried the same script on OS X and Ubuntu; Ubuntu starts leaking after 15-30 seconds.

Expected behaviour

No 100% CPU usage and no memory leaks over the whole run time.

Actual behaviour

100% CPU usage and memory leaks (up to 1 MB/s with many clients).

Steps to reproduce

"""Try to find leaks and bugs in aiohttp which occur on Linux."""
import asyncio
import aiohttp
import json
import urllib.parse


@asyncio.coroutine
def _get(loop, url, params=None):
    # Note: a new connector is created for every single request here.
    _connector = aiohttp.TCPConnector(loop=loop, verify_ssl=False)

    uri = 'http://consul' + url
    if params:
        uri = '%s?%s' % (uri, urllib.parse.urlencode(params))

    resp = yield from aiohttp.request(
        'GET', uri, connector=_connector, loop=loop)

    body = yield from resp.text(encoding='utf-8')

    data = json.loads(body)
    return resp.headers['X-Consul-Index'], data


@asyncio.coroutine
def _sync_service(loop, service):
    index = 0
    while True:
        try:
            index, data = yield from _get(
                loop,
                '/v1/catalog/service/{}'.format(service),
                params={'index': index}
            )
        except Exception:
            pass
        yield from asyncio.sleep(0.1)


@asyncio.coroutine
def _leak(loop, services):
    _, services = yield from _get(loop, '/v1/catalog/services')

    for i in range(50):
        for service, _ in services.items():
            loop.create_task(_sync_service(loop, service))


if __name__ == '__main__':
    services = {}
    loop = asyncio.get_event_loop()
    loop.create_task(_leak(loop, services))
    loop.run_forever()

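As a side note, the descriptor growth can be watched while the script above runs with a small Linux-only helper (it counts entries under /proc/self/fd; this helper is illustrative and not part of the original report):

```python
import os


def open_fd_count():
    """Number of open file descriptors for this process (Linux-only)."""
    return len(os.listdir('/proc/self/fd'))


if __name__ == '__main__':
    before = open_fd_count()
    leaked = open('/dev/null')   # simulate a descriptor that is never closed
    assert open_fd_count() == before + 1
    leaked.close()
    assert open_fd_count() == before
```

Calling open_fd_count() periodically from the event loop shows whether descriptors are being released or accumulating.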
Your environment

Ubuntu 16.04

@fafhrd91
Member

Could you try the 1.3 branch? I made some changes.

@misiek08
Author

I reproduced it on OS X too, so it's probably not OS dependent.

We tried ccdab64 on both OS X and Linux and it still leaks. The script works fine for a while, then CPU usage goes to 100% and memory leaks very quickly.

The range(50) in my code is just there to speed up the leak; if you don't have enough memory, change the value to 2 and it behaves the same.

@fafhrd91
Member

@misiek08 I cannot reproduce the leak. Could you modify _get and add resp.close() before the return?

@fafhrd91
Member

I can't really test your script because it uses an internal service (http://consul).

@kxepal
Member

kxepal commented Feb 23, 2017

@misiek08
Try removing the except Exception in your sync function and you'll see the reason why.

A side note: python-consul has asyncio support based on aiohttp.

@kxepal
Member

kxepal commented Feb 23, 2017

I replaced your _get with the consul client: no CPU burner, no fd leak, everything works fine:

import asyncio
import consul.aio


@asyncio.coroutine
def _sync_service(loop, client, service):
    index = None
    while True:
        # Exceptions are left to propagate instead of being swallowed.
        index, data = yield from client.catalog.service(service, index=index)
        yield from asyncio.sleep(0.1)


@asyncio.coroutine
def _leak(loop, services):
    client = consul.aio.Consul('consul', 8500)
    _, services = yield from client.catalog.services()
    for i in range(50):
        for service, _ in services.items():
            loop.create_task(_sync_service(loop, client, service))


if __name__ == '__main__':
    services = {}
    loop = asyncio.get_event_loop()
    loop.create_task(_leak(loop, services))
    loop.run_forever()


@kxepal
Member

kxepal commented Feb 23, 2017

To fix your issue you should either reuse the TCPConnector across all requests, close it properly once it is no longer needed, or perhaps disable the keep-alive feature.
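As an editorial aside, the reuse-versus-per-request distinction above can be sketched with a toy stand-in for the connector (the Pool class below is hypothetical, not aiohttp's API); it only shows where ownership of the pool should live:

```python
class Pool:
    """Toy stand-in for a connection pool such as aiohttp's TCPConnector."""
    open_pools = 0  # pools created and not yet closed

    def __init__(self):
        Pool.open_pools += 1
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            Pool.open_pools -= 1


def leaky_get(url):
    Pool()        # fresh pool per request, never closed: this is the leak
    return url


def fixed_get(pool, url):
    return url    # caller owns one long-lived pool


for _ in range(5):
    leaky_get('/v1/catalog/services')
print(Pool.open_pools)  # 5 pools abandoned

shared = Pool()
for _ in range(5):
    fixed_get(shared, '/v1/catalog/services')
shared.close()
print(Pool.open_pools)  # still 5: only the abandoned ones remain
```

Closing the connector in a finally block inside _get, or sharing one connector for the whole process, is the aiohttp equivalent of fixed_get here.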

@misiek08
Author

misiek08 commented Feb 24, 2017

@kxepal I use python-consul and its aio part, and I see leaks there too. I can't reuse a single TCPConnector because I want to keep long-polling connections for 50-60 services.

This internal service is just Consul, so you can set up a standard Consul cluster and make some changes there.

I don't see any problem with the except: the client should be collected because I don't keep any reference. I use exactly the same code with 1.2.0 and 1.3.3, so this doesn't look like a problem on my side; we have used similar code in production for a few days and with 1.2.0 there is no problem. After upgrading to 1.3.3 the problem appears after a few seconds. Of course, the more tasks and connections you open, the bigger and faster the leak gets.

@kxepal
Member

kxepal commented Feb 24, 2017

I use python-consul and its aio part, and I see leaks there too.

I wasn't able to reproduce that, but I could reproduce it with your example from the initial post here.

I can't reuse a single TCPConnector because I want to keep long-polling connections for 50-60 services.

Well, if you use the consul client, it shares the same TCPConnector across all requests, so you won't hit that problem. And that is what I observe. I see no problem here.

Also, what kind of problem are you trying to solve this way? Just curious.

I don't see any problem with except

It hides the real issue from you. I ended up with:

Traceback (most recent call last):
  File "/usr/lib64/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "test.py", line 32, in _sync_service
    params={'index': index}
  File "test.py", line 16, in _get
    'GET', uri, connector=_connector, loop=loop)
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/client.py", line 629, in __iter__
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/client.py", line 215, in _request
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 360, in connect
    .format(key, exc.strerror)) from exc
aiohttp.errors.ClientOSError: [Errno 24] Cannot connect to host localhost:8500 ssl:False [Can not connect to localhost:8500 [Too many open files]]
Task exception was never retrieved
future: <Task finished coro=<_sync_service() done, defined at test.py:24> exception=ClientOSError(24, 'Cannot connect to host localhost:8500 ssl:False [Can not connect to localhost:8500 [Too many open files]]')>
Traceback (most recent call last):
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 666, in _create_direct_connection
    local_addr=self._local_addr)
  File "/usr/lib64/python3.5/asyncio/base_events.py", line 695, in create_connection
    raise exceptions[0]
  File "/usr/lib64/python3.5/asyncio/base_events.py", line 662, in create_connection
    sock = socket.socket(family=family, type=type, proto=proto)
  File "/usr/lib64/python3.5/socket.py", line 134, in __init__
OSError: [Errno 24] Too many open files

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 350, in connect
    yield from self._create_connection(req)
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 643, in _create_connection
    transport, proto = yield from self._create_direct_connection(req)
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 689, in _create_direct_connection
    (req.host, req.port, exc.strerror)) from exc
aiohttp.errors.ClientOSError: [Errno 24] Can not connect to localhost:8500 [Too many open files]

client should be collected, because I don't keep any reference

You didn't, but internally the TCPConnector keeps sockets open for the keep-alive timeframe unless it is explicitly closed, and it won't be collected until all of its connections are closed. Your usage of it is simply wrong here.
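The point that a descriptor lives until an explicit close can be seen with a bare stdlib socket (a minimal illustration, not aiohttp code): garbage collection may eventually close it, but not promptly, and the connector here holds references to its sockets anyway, so collection never happens.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
assert s.fileno() >= 0   # a live OS file descriptor is held

s.close()
assert s.fileno() == -1  # the descriptor is released only on explicit close
```

Multiply this by a never-closed connector per request, 50 tasks per service, and a 0.1 s polling loop, and EMFILE follows quickly.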

Try closing the connector in your _get coroutine and tell me whether you still observe the leak.

@misiek08
Author

misiek08 commented Mar 1, 2017

Also, what kind of problem are you trying to solve by such kind of action? Just curious.

To watch changes in consul for different services.

You didn't, but internally the TCPConnector keeps sockets open for the keep-alive timeframe unless it is explicitly closed, and it won't be collected until all of its connections are closed. Your usage of it is simply wrong here.

I understand that some objects are not being freed, but I'm curious why the same code works without any problems on aiohttp==1.2.0 and leaks very badly on 1.3.3 (I tried only 1.2.0 and 1.3.3, no versions in between).

@fafhrd91
Member

Could you try https://github.com/aio-libs/aiohttp/? It has much better connection pooling.

@mkurek

mkurek commented Mar 13, 2017

@fafhrd91 which aiohttp is the official one now, then?

@fafhrd91
Member

fafhrd91 commented Mar 13, 2017 via email

@fafhrd91
Member

2.0 should fix this problem

@lock

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

If you feel there are important points made in this discussion, please include those excerpts in the new issue.

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019