Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple AsyncHTTPClient objects created for one asyncio event loop #3229

Open
hclivess opened this issue Feb 7, 2023 · 7 comments
Open

Comments

@hclivess
Copy link

hclivess commented Feb 7, 2023

It appears that in some cases that the mechanism that reuses AsyncHTTPClient objects is failing, allowing multiple such objects to be created and, for example, exceed max_clients limitations. The problem could be in the asyncio-to-IOLoop instance cache or the IOLoop-to-AsyncHTTPClient mapping.

Original message follows:

Greetings. I recently ran into a problem in my code where I have too many AsyncHTTPClient spawning. I attempted to resolve this problem by using a "with" statement so the object is automatically destroyed at the end. Unfortunately, such approach is not currently supported. Can you please add start and end so it can be used that way?

Minimal reproducible example:

import asyncio
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

from tornado.httpclient import AsyncHTTPClient

async def fetch():
    with AsyncHTTPClient() as http_client:
        url = f"http://googe.com"
        response = await http_client.fetch(url, request_timeout=5)
        return response

asyncio.run(fetch())
Traceback (most recent call last):
  File "C:\Users\x\PycharmProjects\nado\ptester.py", line 12, in <module>
    asyncio.run(fetch())
  File "C:\Program Files\Python310\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Program Files\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\Users\x\PycharmProjects\nado\ptester.py", line 7, in fetch
    with AsyncHTTPClient() as http_client:
AttributeError: __enter__

My temporary solution:

try:
    http_client=AsyncHTTPClient()
except Exception as e:
    raise e
finally:
    del http_client
@bdarnell
Copy link
Member

bdarnell commented Feb 7, 2023

The AsyncHTTPClient class is unusual: it secretly reuses a single client object instead of creating a new one each time. This means that in general you're not supposed to close it. (Yes, this is very weird and I'd do it differently if I were starting over, but it's difficult to change at this point. See #2049)

Most of the time, using AsyncHTTPClient just looks like

http = AsyncHTTPClient()  # Can save in attribute like self.http_client if you prefer
resp = await http.fetch(url)

The most common case where you need to actually close the AsyncHTTPClient is in test cases, where you might want to run AsyncHTTPClient().close() in between tests (when you can be sure that nothing else is using the same object).

Or, if you want more isolation, you can use the force_instance attribute when constructing the AsyncHTTPClient. In this case you do need to close it:

with contextlib.closing(AsyncHTTPClient(force_instance=True)) as http:
    r = await http.fetch(...)

This would be a little cleaner if we supported the with statement, but that would make it too easy to do the wrong thing and close the client when you're not supposed to. I'd like to evolve the API in this direction although we need to figure out the transition plan.

@hclivess
Copy link
Author

hclivess commented Feb 7, 2023

Thank you for the explanation and recommendations. Here are the details of what happened and why I wanted this feature, if you're interested:

I have tried the first approach, where I simply define AsyncHTTPClient before each call. This resulted in my application being overloaded and choking its network access, especially (but not only) on Windows. I am making a lot of calls to a lot of interfaces simultaneously. I suspect there were some leftovers somewhere in memory, although I have no idea how or why.

When I attempted to reuse the object passing it to my semaphored gather async fucntions from a central shared object defined only once, some calls would not execute and my application would stall, although this might be due to some unknowns in my code.

@bdarnell
Copy link
Member

bdarnell commented Feb 7, 2023

I have tried the first approach, where I simply define AsyncHTTPClient before each call. This resulted in my application being overloaded and choking its network access

That's not supposed to happen - that's the whole reason that we have all the magic to reuse a single instance. It should be limiting itself to max_clients connections at a time, which defaults to 10. Are you increasing this? Are you doing anything unusual with multiple event loops or threads?

Note that Windows is not a fully-supported platform for Tornado; scalability is limited in comparison to linux or macos.

@hclivess
Copy link
Author

hclivess commented Feb 7, 2023

I have not reconfigured max_clients or any other settings, but I suspect that could be the issue. I have been implementing asynchronous calls for the first time so I am not sure if my usage is correct, although I have followed documentation where possible. You can see for yourself (the project is named after Tornado).

Here are the gather calls:
https://github.com/hclivess/nado/blob/main/compounder.py

Most frequent usage elsewhere:
https://github.com/hclivess/nado/blob/main/ops/block_ops.py

@bdarnell
Copy link
Member

bdarnell commented Feb 7, 2023

Your code looks fine to me; I'm not sure what might be going on.

It could be a bug in Tornado. For example, we have an unresolved bug that can cause things to hang in some cases with a combination of redirects and timeouts (#3064).

@hclivess
Copy link
Author

hclivess commented Feb 7, 2023

Thanks for checking it out. I have switched to aiohttp now. I noticed that while I was using AsyncHTTPClient , the number of objects would grow rapidly when monitoring them with muppy.get_objects() from pympler. With standard aiohttp approach, the issue seems to go away. For anyone else reading this:

Before/after: hclivess/nado@77fc179

I remember that when pressing CTRL-C, the application would spam me with "cannot schedule new futures after interpreter shutdown" for minutes so I assume that was coming from all the instances that were left in memory.

@bdarnell
Copy link
Member

bdarnell commented Feb 8, 2023

Cool, I'm glad you found something that works for you.

It does appear that you've found a bug that is allowing multiple AsyncHTTPClient objects to be created instead of reusing them (it's possible that the bug is in the asyncio-to-IOLoop mapping instead of in AsyncHTTPClient), so I'm going to retitle this issue and keep it open.

FWIW, I see from your diff that you're using Tornado 6.1. There was a change in version 6.2 that might at least make this problem easier to diagnose: #3157.

@bdarnell bdarnell changed the title "with" statement not allowed for AsyncHTTPClient Multiple AsyncHTTPClient objects created for one asyncio event loop Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants