Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increasing memory usage in update_current_time #1500

Closed
FUSAKLA opened this issue Feb 28, 2019 · 20 comments
Closed

Increasing memory usage in update_current_time #1500

FUSAKLA opened this issue Feb 28, 2019 · 20 comments

Comments

@FUSAKLA
Copy link

FUSAKLA commented Feb 28, 2019

Describe the bug
I see constantly increasing memory usage of my running Sanic instance.
After using tracemalloc to identify cause of this I found top 5 memory allocations

/usr/local/lib/python3.5/dist-packages/sanic/server.py:547: size=16.5 MiB, count=113860, average=152 B
<frozen importlib._bootstrap_external>:484: size=1075 KiB, count=17138, average=64 B
/usr/lib/python3.5/copy.py:240: size=882 KiB, count=14116, average=64 B
/usr/local/lib/python3.5/dist-packages/sanic/server.py:328: size=540 KiB, count=4940, average=112 B
/usr/local/lib/python3.5/dist-packages/sanic/server.py:157: size=434 KiB, count=3967, average=112 B

If I look at the /usr/local/lib/python3.5/dist-packages/sanic/server.py:547 which has size=16.5 MiB it leads to
https://github.com/huge-success/sanic/blob/52deebaf65ab48d5fbfa92607f22d96ee1bdb7a7/sanic/server.py#L607

Memory consumption graph (gaps are OOM kills due to memory limits in Kubernetes)
image

Could there be an issue with this part of coude in terms of leaking memory?

Environment (please complete the following information):

  • Debian stretch running in Kubernetes (Docker)
  • Version sanic-18.12.0-py3

Additional context
Sorry I'm not a Python expert, maybe I'm using the tracemalloc the wrong way or the issue s somewhere else. In that case I'm sorry and would be glad for any Ideas where to look or how to profile it.

Thanks in advance

@kbcasagrande
Copy link

kbcasagrande commented Feb 28, 2019

I was about to open an issue about it. I've also just identified where the possible leak comes from;
I checked the memory usage using mem_top;

After few requests, the output is:

refs:
1210    <class 'dict'> {140519667199136: <weakref at 0x7fcd3b70d638; to 'type' at 0x7fcd48d8eca0 (type)>, 140519667196960: 
1122    <class 'list'> ['import logging\n', 'import logging.config\n', 'import os\n', 'import re\n', 'import warnings\n', '
910     <class 'dict'> {'application/octet-stream': ['.a', '.bin', '.dll', '.exe', '.o', '.obj', '.so', '.lha', '.lzh', '.c
444     <class 'dict'> {'builtins': <module 'builtins' (built-in)>, 'sys': <module 'sys' (built-in)>, '_frozen_importlib': 
366     <class 'dict'> {'__name__': 'socket', '__doc__': "This module provides socket operations and some related functions
339     <class 'dict'> {'__name__': 'os', '__doc__': "OS routines for NT or Posix depending on what system we're on.\n\nThi
337     <class 'dict'> {'__name__': '_socket', '__doc__': 'Implementation module for socket operations.\n\nSee the socket m
337     <class 'dict'> {'__name__': '_socket', '__doc__': 'Implementation module for socket operations.\n\nSee the socket m
336     <class 'list'> ['fromfd', 'getfqdn', 'create_connection', 'AddressFamily', 'SocketKind', 'AF_ALG', 'AF_APPLETALK', 
310     <class 'list'> ['altsep', 'curdir', 'pardir', 'sep', 'pathsep', 'linesep', 'defpath', 'name', 'path', 'devnull', 'S

bytes:
256544   <uvloop.Loop running=True closed=False debug=False>
36968    {'application/octet-stream': ['.a', '.bin', '.dll', '.exe', '.o', '.obj', '.so', '.lha', '.lzh', '.c
18528    {140519667199136: <weakref at 0x7fcd3b70d638; to 'type' at 0x7fcd48d8eca0 (type)>, 140519667196960: 
18528    {'builtins': <module 'builtins' (built-in)>, 'sys': <module 'sys' (built-in)>, '_frozen_importlib': 
18528    {'__name__': 'os', '__doc__': "OS routines for NT or Posix depending on what system we're on.\n\nThi
18528    {'__name__': 'socket', '__doc__': "This module provides socket operations and some related functions
9656     ['import logging\n', 'import logging.config\n', 'import os\n', 'import re\n', 'import warnings\n', '
9320     {'__name__': 'posix', '__doc__': 'This module provides access to operating system functionality that
9320     {'__name__': 'posix', '__doc__': 'This module provides access to operating system functionality that
9320     {'__name__': '_socket', '__doc__': 'Implementation module for socket operations.\n\nSee the socket m

types:
8104     <class 'function'>
4781     <class 'tuple'>
4560     <class 'dict'>
2488     <class 'weakref'>
2018     <class 'list'>
1416     <class 'method_descriptor'>
1361     <class 'wrapper_descriptor'>
1241     <class 'getset_descriptor'>
1090     <class 'type'>
1045     <class 'set'>

After many requests for few minutes it turns to:

refs:
1210    <class 'dict'> {139867876654240: <weakref at 0x7f3579b26638; to 'type' at 0x7f35871a7ca0 (type)>, 139867876652064: 
1122    <class 'list'> ['import logging\n', 'import logging.config\n', 'import os\n', 'import re\n', 'import warnings\n', '
1079    <class 'set'> {<TimerHandle functools.partial(<function update_current_time at 0x7f3578b510d0>, <uvloop.Loop runni
910     <class 'dict'> {'application/octet-stream': ['.a', '.bin', '.dll', '.exe', '.o', '.obj', '.so', '.lha', '.lzh', '.c
444     <class 'dict'> {'builtins': <module 'builtins' (built-in)>, 'sys': <module 'sys' (built-in)>, '_frozen_importlib': 
366     <class 'dict'> {'__name__': 'socket', '__doc__': "This module provides socket operations and some related functions
339     <class 'dict'> {'__name__': 'os', '__doc__': "OS routines for NT or Posix depending on what system we're on.\n\nThi
337     <class 'dict'> {'__name__': '_socket', '__doc__': 'Implementation module for socket operations.\n\nSee the socket m
337     <class 'dict'> {'__name__': '_socket', '__doc__': 'Implementation module for socket operations.\n\nSee the socket m
336     <class 'list'> ['fromfd', 'getfqdn', 'create_connection', 'AddressFamily', 'SocketKind', 'AF_ALG', 'AF_APPLETALK', 

bytes:
256544   <uvloop.Loop running=True closed=False debug=False>
36968    {'application/octet-stream': ['.a', '.bin', '.dll', '.exe', '.o', '.obj', '.so', '.lha', '.lzh', '.c
32992    {<TimerHandle functools.partial(<function update_current_time at 0x7f3578b510d0>, <uvloop.Loop runni
18528    {139867876654240: <weakref at 0x7f3579b26638; to 'type' at 0x7f35871a7ca0 (type)>, 139867876652064: 
18528    {'builtins': <module 'builtins' (built-in)>, 'sys': <module 'sys' (built-in)>, '_frozen_importlib': 
18528    {'__name__': 'os', '__doc__': "OS routines for NT or Posix depending on what system we're on.\n\nThi
18528    {'__name__': 'socket', '__doc__': "This module provides socket operations and some related functions
9656     ['import logging\n', 'import logging.config\n', 'import os\n', 'import re\n', 'import warnings\n', '
9320     {'__name__': 'posix', '__doc__': 'This module provides access to operating system functionality that
9320     {'__name__': 'posix', '__doc__': 'This module provides access to operating system functionality that

types:
8104     <class 'function'>
5849     <class 'tuple'>
4556     <class 'dict'>
2486     <class 'weakref'>
2018     <class 'list'>
1416     <class 'method_descriptor'>
1361     <class 'wrapper_descriptor'>
1241     <class 'getset_descriptor'>
1090     <class 'type'>
1084     <class 'functools.partial'>

The following image show the memory usage evolution of the same application running in the Kubernetes cluster:

image

During the time where the memory usage grew, the application basically received health check requests, which is a simple get that returns a simple json saying that the application is alive;

Environment (please complete the following information):

  • Alpine 3.8 running in Kubernetes (Docker)
  • Version sanic-18.12.0
  • Python version 3.6.6

@vltr
Copy link
Member

vltr commented Feb 28, 2019

I would take a wild guess that Python's GC is not collecting the hanging futures inside each function call to itself - but I might be wrong.

IMO, we don't need to update the clock every second; we just need to call time() where current_time is required.

@FUSAKLA
Copy link
Author

FUSAKLA commented Feb 28, 2019

Might be. According to the comment it was supposed to mitigate the possible overhead but I'm not sure how much it helps.

Cache the current time, since it is needed at the end of every
keep-alive request to update the request timeout time

@vltr
Copy link
Member

vltr commented Feb 28, 2019

Yep, but I think this is unnecessary since calling time() has an almost neglectable overhead:

import timeit

setup_stmt = """
from time import time


def call_time():
    assert time() is not None

"""
timed_stmt = {
    "call_time": 'call_time()',
}
times = 1_000_000

for name, stmt in timed_stmt.items():
    time_to_run = timeit.timeit(stmt, setup=setup_stmt, number=times)
    time_each = time_to_run / times
    print("%s: %d results in %f s (%f us per result, %f per sec)" %
          (name, times, time_to_run, time_each * times, 1. / time_each))
call_time: 1000000 results in 0.110726 s (0.110726 us per result, 9031334.549457 per sec)

@langma
Copy link
Contributor

langma commented Mar 1, 2019

I can confirm that removing the update_current_time logic keeps the pod's memory in check. I think the leak is happening only when there are requests? I'm not totally sure, but I agree with @vltr that the benefit of caching time() is probably really small, so I'm not gonna dig deeper for now.

Removing current_time variable also removes the only global variable in the codebase :)

I can make a PR if this sounds like the proper way to go forward?

@FUSAKLA
Copy link
Author

FUSAKLA commented Mar 4, 2019

Great, thanks for resolving it so quickly!
IIUC the fix will be in 18.13 and possibly 18.12.1, is there any estimation on the release date?

@ahopkins
Copy link
Member

ahopkins commented Mar 5, 2019

@FUSAKLA Look for two releases for 18.12.1 (a few fixes for the LTS) and 19.3 (the next releases in the schedule) sometime in the next couple weeks.

@FUSAKLA
Copy link
Author

FUSAKLA commented Mar 25, 2019

FYI memory usage after 19.3.1 upgrade, thanks works great!

image

@ahopkins
Copy link
Member

Thanks for sharing

@RifeWang
Copy link

I had the same issue with memory leak for 18.12.1 , and then I have updated to 19.3 ,but the memory is still increasing in a very slow speed.

memory usage in two days :
D7A92982-4421-42D7-BC36-49EFC44D1ED0

and I use multiple workers, the machine is 4 cores and 6GB memory:

workers = multiprocessing.cpu_count()
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000, workers=workers)

the result when I exec ps -aux | grep python

F9CE1A65-8FFD-426F-AE37-0DD6C5E99BE3

the another question is why so many processes , I think it should be 4 processes ? I use pm2 fork mode , its relation with pm2 ?

@ahopkins
Copy link
Member

@RifeWang I would look to pm2 first. I am not 100% familiar with it and how it handles python applications. Does it act as a proxy server? How and when does it decide to start a process? To continue this conversation, please start a new issue and link to this here.

@ladler0320
Copy link

I also experience the memory leak issue when using more than one worker on Sanic 19.6.0 with python 3.7.3

@sjsadowski
Copy link
Contributor

@ladler0320 19.6.0 is not out yet, do you mean 19.3?

@ladler0320
Copy link

@sjsadowski I'm not quite sure, but it's probably the 19.6.0.
I installed it using
pip install git+https://github.com/huge-success/sanic.git@a57c14c70bca3514d502db3ce53fd3262d1af149
because of #1549 fix and it was shown that the Sanic version is 19.6.0

@sjsadowski
Copy link
Contributor

Got it, you installed from git master. Can you share your code and process to determine memory leaks?

@ladler0320
Copy link

@sjsadowski I do apologize, but I'm not sure I'm allowed to share the code. However, I can confirm that memory leak occurs when I'm using more than one worker and only when the service receives requests.
Sure, I'll try to investigate that issue and I'll let you know if there will be any update.

@sjsadowski
Copy link
Contributor

@ladler0320 if you can do pseudocode, that would help too. It's hard to know what to fix if we can't identify what and where it's broken.

@ladler0320
Copy link

@sjsadowski Sure.

def init_app():
   app_local = Sanic(name="App", log_config=LOGGING_CONFIG)
   app_local.config.update(APP_CONFIG) # "RESPONSE_TIMEOUT": 0.5

   app_local.error_handler.add(exceptions.ServiceUnavailable, timeout_error_handler)
   app_local.add_route(handler, "/models/<model>", methods=["POST"])

   return app_local


async def handler(request, model):
   loop = asyncio.get_event_loop()
   run = loop.run_in_executor

   try:
       with async_timeout.timeout(0.5, loop=loop):
           data = request.json
           resp = await run(pool, function, model, data)
           status = 200
           resp = js.dumps(resp, ensure_ascii=False)

   except (concurrent.futures.CancelledError,
           concurrent.futures.TimeoutError,
           asyncio.TimeoutError):
       resp = ""
       status = 504

   except Exception as e:
       logger.info(e)
       resp = ""
       status = 500

   return response.text(resp, status=status)


def timeout_error_handler(request, exception):
   return response.text("", 504)


if __name__ == '__main__':
   parser = argparse.ArgumentParser()
   parser.add_argument("--host", default=os.getenv("HOST", "0.0.0.0"))
   parser.add_argument("--port", type=int, default=os.getenv("PORT", 1234))

   args = parser.parse_args()

   host = args.host
   port: int = args.port
   workers = int(os.getenv("WORKERS")) 
   threads = int(os.getenv("THREADS"))

   app = init_app()
   pool = concurrent.futures.ThreadPoolExecutor(max_workers=threads)
   app.run(host=host, port=port, workers=workers, debug=False)

@vankata-ashokkumar
Copy link

vankata-ashokkumar commented Jun 9, 2021

Facing a similar issue with 20.6.3, Any update on this?

@sjsadowski

@ahopkins
Copy link
Member

ahopkins commented Jun 9, 2021

Update on what? This is closed 2 years. If you are experiencing something, please open a new issue and include some relevant details and source.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants