
After migrating to 1.3.3 there is 100% CPU usage and memory leak after some time #1667

Closed
misiek08 opened this issue Feb 21, 2017 · 15 comments

@misiek08

Long story short

After upgrading from 1.2.0 to 1.3.3 there is 100% CPU usage and a memory leak on Linux (Ubuntu 16.04). It works fine on OS X 10.12.3 (Sierra), or the bug takes much longer to show. I tried the same script on OS X and Ubuntu; Ubuntu starts leaking after 15-30 seconds.

Expected behaviour

No 100% CPU usage and no memory leaks over the whole run time.

Actual behaviour

100% CPU usage and memory leaks (up to 1 MB/s with many clients).

Steps to reproduce

"""Try to find leaks and bugs in aiohttp which occur on Linux."""
import asyncio
import aiohttp
import json
import urllib.parse


@asyncio.coroutine
def _get(loop, url, params=None):
    # Note: a new connector is created for every single request here.
    _connector = aiohttp.TCPConnector(loop=loop, verify_ssl=False)

    uri = 'http://consul' + url
    if params:
        uri = '%s?%s' % (uri, urllib.parse.urlencode(params))

    resp = yield from aiohttp.request(
        'GET', uri, connector=_connector, loop=loop)

    body = yield from resp.text(encoding='utf-8')

    data = json.loads(body)
    return resp.headers['X-Consul-Index'], data


@asyncio.coroutine
def _sync_service(loop, service):
    index = 0
    while True:
        try:
            index, data = yield from _get(
                loop,
                '/v1/catalog/service/{}'.format(service),
                params={'index': index}
            )
        except Exception:
            pass
        yield from asyncio.sleep(0.1)


@asyncio.coroutine
def _leak(loop, services):
    _, services = yield from _get(loop, '/v1/catalog/services')

    for i in range(50):
        for service, _ in services.items():
            loop.create_task(_sync_service(loop, service))


if __name__ == '__main__':
    services = {}
    loop = asyncio.get_event_loop()
    loop.create_task(_leak(loop, services))
    loop.run_forever()

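As a side note, the descriptor growth can be watched while the script above runs with a small Linux-only helper (it counts entries under /proc/self/fd; this helper is illustrative and not part of the original report):

```python
import os


def open_fd_count():
    """Number of open file descriptors for this process (Linux-only)."""
    return len(os.listdir('/proc/self/fd'))


if __name__ == '__main__':
    before = open_fd_count()
    leaked = open('/dev/null')   # simulate a descriptor that is never closed
    assert open_fd_count() == before + 1
    leaked.close()
    assert open_fd_count() == before
```

Calling open_fd_count() periodically from the event loop shows whether descriptors are being released or accumulating.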
Your environment

Ubuntu 16.04

@fafhrd91
Member

Could you try the 1.3 branch? I made some changes.

@misiek08
Author

I reproduced it on OS X too, so it's probably not OS dependent.

We tried ccdab64 on both OS X and Linux and it still leaks. The script works fine for a while, then CPU usage goes to 100% and memory leaks very quickly.

The range(50) in my code is just there to speed up the leak; if you don't have enough memory, change the value to 2 and it behaves the same.

@fafhrd91
Member

@misiek08 I cannot reproduce the leak. Could you modify _get and add resp.close() before the return?

@fafhrd91
Member

I can't really test your script because it uses an internal service (http://consul).

@kxepal
Member

kxepal commented Feb 23, 2017

@misiek08
Try removing the except Exception in your sync function and you'll see the reason why.

A side note: python-consul has asyncio support based on aiohttp.

@kxepal
Member

kxepal commented Feb 23, 2017

I replaced your _get with the consul client: no CPU burner, no fd leak, everything works fine:

import asyncio
import consul.aio


@asyncio.coroutine
def _sync_service(loop, client, service):
    index = None
    while True:
        # Exceptions are left to propagate instead of being swallowed.
        index, data = yield from client.catalog.service(service, index=index)
        yield from asyncio.sleep(0.1)


@asyncio.coroutine
def _leak(loop, services):
    client = consul.aio.Consul('consul', 8500)
    _, services = yield from client.catalog.services()
    for i in range(50):
        for service, _ in services.items():
            loop.create_task(_sync_service(loop, client, service))


if __name__ == '__main__':
    services = {}
    loop = asyncio.get_event_loop()
    loop.create_task(_leak(loop, services))
    loop.run_forever()


@kxepal
Member

kxepal commented Feb 23, 2017

To fix your issue you should either reuse the TCPConnector across all requests, close it properly once it is no longer needed, or perhaps disable the keep-alive feature.
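As an editorial aside, the reuse-versus-per-request distinction above can be sketched with a toy stand-in for the connector (the Pool class below is hypothetical, not aiohttp's API); it only shows where ownership of the pool should live:

```python
class Pool:
    """Toy stand-in for a connection pool such as aiohttp's TCPConnector."""
    open_pools = 0  # pools created and not yet closed

    def __init__(self):
        Pool.open_pools += 1
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            Pool.open_pools -= 1


def leaky_get(url):
    Pool()        # fresh pool per request, never closed: this is the leak
    return url


def fixed_get(pool, url):
    return url    # caller owns one long-lived pool


for _ in range(5):
    leaky_get('/v1/catalog/services')
print(Pool.open_pools)  # 5 pools abandoned

shared = Pool()
for _ in range(5):
    fixed_get(shared, '/v1/catalog/services')
shared.close()
print(Pool.open_pools)  # still 5: only the abandoned ones remain
```

Closing the connector in a finally block inside _get, or sharing one connector for the whole process, is the aiohttp equivalent of fixed_get here.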

@misiek08
Author

misiek08 commented Feb 24, 2017

@kxepal I use python-consul and its aio part, and I see leaks there too. I can't reuse a single TCPConnector because I want to keep long-polling connections for 50-60 services.

This internal service is just Consul, so you can set up a standard Consul cluster and make some changes there.

I don't see any problem with the except: the client should be collected because I don't keep any reference. I use exactly the same code with 1.2.0 and 1.3.3, so this doesn't look like a problem on my side; we have used similar code in production for a few days and with 1.2.0 there is no problem. After upgrading to 1.3.3 the problem appears after a few seconds. Of course, the more tasks and connections you open, the bigger and faster the leak gets.

@kxepal
Member

kxepal commented Feb 24, 2017

I use python-consul and its aio part, and I see leaks there too.

I wasn't able to reproduce that, but I could reproduce it with your example from the initial post here.

I can't reuse a single TCPConnector because I want to keep long-polling connections for 50-60 services.

Well, if you use the consul client, it shares the same TCPConnector across all requests, so you won't hit that problem. And that is what I observe. I see no problem here.

Also, what kind of problem are you trying to solve this way? Just curious.

I don't see any problem with except

It hides the real issue from you. I ended up with:

Traceback (most recent call last):
  File "/usr/lib64/python3.5/asyncio/tasks.py", line 239, in _step
    result = coro.send(None)
  File "test.py", line 32, in _sync_service
    params={'index': index}
  File "test.py", line 16, in _get
    'GET', uri, connector=_connector, loop=loop)
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/client.py", line 629, in __iter__
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/client.py", line 215, in _request
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 360, in connect
    .format(key, exc.strerror)) from exc
aiohttp.errors.ClientOSError: [Errno 24] Cannot connect to host localhost:8500 ssl:False [Can not connect to localhost:8500 [Too many open files]]
Task exception was never retrieved
future: <Task finished coro=<_sync_service() done, defined at test.py:24> exception=ClientOSError(24, 'Cannot connect to host localhost:8500 ssl:False [Can not connect to localhost:8500 [Too many open files]]')>
Traceback (most recent call last):
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 666, in _create_direct_connection
    local_addr=self._local_addr)
  File "/usr/lib64/python3.5/asyncio/base_events.py", line 695, in create_connection
    raise exceptions[0]
  File "/usr/lib64/python3.5/asyncio/base_events.py", line 662, in create_connection
    sock = socket.socket(family=family, type=type, proto=proto)
  File "/usr/lib64/python3.5/socket.py", line 134, in __init__
OSError: [Errno 24] Too many open files

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 350, in connect
    yield from self._create_connection(req)
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 643, in _create_connection
    transport, proto = yield from self._create_direct_connection(req)
  File "/home/kxepal/temp/venv/lib/python3.5/site-packages/aiohttp/connector.py", line 689, in _create_direct_connection
    (req.host, req.port, exc.strerror)) from exc
aiohttp.errors.ClientOSError: [Errno 24] Can not connect to localhost:8500 [Too many open files]

client should be collected, because I don't keep any reference

You didn't, but internally the TCPConnector keeps sockets open for the keep-alive timeframe unless it is explicitly closed, and it won't be collected until all of its connections are closed. Your usage of it is simply wrong here.
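The point that a descriptor lives until an explicit close can be seen with a bare stdlib socket (a minimal illustration, not aiohttp code): garbage collection may eventually close it, but not promptly, and the connector here holds references to its sockets anyway, so collection never happens.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
assert s.fileno() >= 0   # a live OS file descriptor is held

s.close()
assert s.fileno() == -1  # the descriptor is released only on explicit close
```

Multiply this by a never-closed connector per request, 50 tasks per service, and a 0.1 s polling loop, and EMFILE follows quickly.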

Try closing the connector in your _get coroutine and tell me whether you still observe the leak.

@misiek08
Author

misiek08 commented Mar 1, 2017

Also, what kind of problem are you trying to solve by such kind of action? Just curious.

To watch changes in consul for different services.

You didn't, but internally the TCPConnector keeps sockets open for the keep-alive timeframe unless it is explicitly closed, and it won't be collected until all of its connections are closed. Your usage of it is simply wrong here.

I understand that some objects are not being freed, but I'm curious why the same code works without any problems on aiohttp==1.2.0 and leaks very badly on 1.3.3 (I tried only 1.2.0 and 1.3.3, no versions in between).

@fafhrd91
Member

Could you try https://github.com/aio-libs/aiohttp/? It has much better connection pooling.

@mkurek

mkurek commented Mar 13, 2017

@fafhrd91 which aiohttp is the official one now, then?

@fafhrd91
Member

fafhrd91 commented Mar 13, 2017 via email

@fafhrd91
Member

2.0 should fix this problem

@lock

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

If you feel there are important points made in this discussion, please include those excerpts in the new issue.

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019