Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not waiting for connection_lost() when transport closed #1925

Closed
omnicognate opened this issue May 26, 2017 · 37 comments
Closed

Not waiting for connection_lost() when transport closed #1925

omnicognate opened this issue May 26, 2017 · 37 comments
Milestone

Comments

@omnicognate
Copy link

Long story short

aiohttp calls close() on the transport and then immediately discards it without waiting for connection_lost(). This is a problem for SSL transports, which have to shut down asynchronously, and it appears aiohttp only "works" with the standard library's SSL transport because of a bug in that transport.

Expected behaviour

aiohttp waiting for connection_lost() before discarding transport

Actual behaviour

It not doing so

Steps to reproduce

import aiohttp
import asyncio

async def main():
    conn = aiohttp.TCPConnector(verify_ssl=False)
    async with aiohttp.ClientSession(connector=conn) as session:
        async with session.get('https://httpbin.org/') as resp:
            resp.raise_for_status()

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()

Run this program and breakpoint connection_made() and connection_lost() in client_proto.py. The former is called, but not the latter.

The program appears to run successfully, but to convince you that something is severely amiss, add a __del__ method to the class _SelectorSocketTransport in asyncio's selector_events.py and breakpoint it and that transport's close() method. You'll see that close() is never called and __del__ gets called during shutdown, after the loop has been closed and the module finalised, because it still has a reader registered and is therefore leaked (while still waiting on I/O) in a reference cycle with the event loop.

If I'm understanding it right, the overall behaviour appears to be due to two bugs, one in asyncio and one in aiohttp:

  1. When close() is called on the SSLProtocolTransport in asyncio's ssl_proto.py it initiates an asynchronous shutdown process, which should ultimately result in it calling connection_lost(). However, this doesn't progress for reasons I haven't fully figured out, so connection_lost() is never called. The shutdown process is clearly broken because it never results in close() being called on the underlying transport, which is leaked as shown above.
  2. aiohttp doesn't wait for connection_lost(), so the hang that one would expect as a result of this never happens. By luck, the broken shutdown sequence never actually causes any errors.

I'm actually more interested in the latter, the aiohttp behaviour, because I've written my own SSL transport (based on pyopenssl), which does a correct asynchronous shutdown sequence. When aiohttp discards my transport without waiting for connection_lost(), it causes errors, which I don't think are my fault.

Your environment

Tested on Windows and Linux with python 3.5.1 and 3.5.2 and aiohttp 2.0.7 and 2.1.0.

@omnicognate
Copy link
Author

I think the waiting for the transport shutdown would have to happen within the aexit coroutine in ClientSession, which currently just calls an ordinary function to perform the close. I don't see any other stage in program execution where the wait could be done.

@fafhrd91
Copy link
Member

What is the reason to wait until close procedure completes in client code?

BaseConnection accepts enable_cleanup_closed parameter, it adds special wait queue
for ssl connections https://github.com/aio-libs/aiohttp/blob/master/aiohttp/connector.py#L151
but that is work around buggy server which never complete close procedure.

@asvetlov
Copy link
Member

Well, graceful closing all connections before exit from async with ClientSession() might be good idea.
But I see problems with implementation until ClientSession.close() is actually a function returning a future object.
This behavior should be supported until we drop Python 3.4 along with with ClientSession() etc.
So the issue implementation should be postponed.

@fafhrd91
Copy link
Member

connection_lost has nothing to do with client session. Also why should we care?

@asvetlov adding graicful close support to session is bad idea.

@asvetlov
Copy link
Member

Maybe graceful close is bad wording but I want to finish client.close() only after getting connection_lost notifications from all opened connections.

Otherwise people should put await sleep(0) but even this doesn't help for SSL connections -- they need more than one sleep for shut down.

@fafhrd91
Copy link
Member

So what is so important in close? Is session is alive we can wait for connection_close, but if developer dropped session why aiohttp should wait for connection_lost? That is expected behavior otherwise will get same as with old release method implemention. What explicit behavior then add something like release_all_connections. Another question how often you drop session in normal implication? Once per process run? Sure it makes live of transport developers harder, but who cares?

@omnicognate
Copy link
Author

Another problem with implementing this at the moment is that the default SSL transport never calls connection_lost() so you'll get a hang. I might get around to submitting an issue to python for that at some point, but I've had to move on from asyncio entirely as a result of this and numerous other bugs, and the fact that there appears to be little clarity on how to correctly implement and use the interfaces.

As to why proper shutdown behaviour is required, it's because without it it is not safe for the client to close the event loop after the session is closed, and the client has no reliable way to wait for a time at which it is safe to discard the loop.

SSL has a shutdown sequence that requires multiple steps, which in asyncio are naturally implemented asynchronously. Calling close() initiates that sequence, but the indication that it has finished is connection_lost(). If you don't wait for that sequence to complete before the aiohttp session's close has finished then the program will proceed without the SSL shutdown sequence having been completed. Worse, the program will proceed while the shutdown is ongoing, with tasks pending on the event loop. There is then no way for the client to wait for those pending tasks to be complete before destroying the event loop, and destroying the event loop without them completing will cause "Task was destroyed but it is pending" warnings and a host of other problems. As @asvetlov comments, even doing an await sleep(0) isn't sufficient in the client as there can be an arbitrary number of such awaits required before the queue is empty (2 in the particular case of SSL) and there is no way to explicitly check for the queue being empty.

The only reason that these problems are not currently occurring is that asyncio's default SSL transport isn't proceeding with its multi-step SSL shutdown sequence, so there are no tasks pending on the queue. This is a bug in that transport, and means that it never calls connection_lost().

If I'm understanding it correctly, this is a very nasty 2-way bug-bug dependency between aiohttp and asyncio. Fixing either of the two bugs will result in breakage: if you fix asyncio's SSL transport to do a correct SSL shutdown sequence without fixing aiohttp to wait for it to complete then existing clients who discard the event loop immediately after the aiohttp session is closed will be destroying the event loop with tasks pending; if you make aiohttp wait for the transport to do its shutdown without fixing the SSL transport to actually proceed with its shutdown, you will cause existing clients to hang.

It's a bit of a Gordian knot, if I'm understanding it correctly.

@omnicognate
Copy link
Author

Oops, I didn't mean to close the issue!

@omnicognate omnicognate reopened this Jun 27, 2017
@asvetlov asvetlov added this to the 3.0 milestone Jun 27, 2017
@fafhrd91
Copy link
Member

I dont think we should do anything until asyncio get fixed or changes. I still don't see why we should care about transports if developer dropped session.

@asvetlov asvetlov changed the title Not waiting for connection_lost() when transport closed [on hold] Not waiting for connection_lost() when transport closed Jun 27, 2017
@asvetlov
Copy link
Member

Ok, let just put the issue on hold until async fixes SSL transport.
But please don't close -- I want to keep tracking.

@fafhrd91
Copy link
Member

I still don't see reason why aiohttp needs to wait for connection_close call. lets just create issue in cpython repo and make reference.

@asvetlov
Copy link
Member

At least because without it user will get a warning about non-closed resources in debug mode IIRC.

@asvetlov
Copy link
Member

Or he need put a sleep after closing session but before program finishing.

@fafhrd91
Copy link
Member

this will complicates logic, and reason only debug message. doesn't sound like a good win for me.

@omnicognate
Copy link
Author

omnicognate commented Jun 28, 2017

See https://docs.python.org/3/library/asyncio-dev.html#pending-task-destroyed

The warnings are there for a reason. Closing the queue with tasks pending means that those tasks will never run. The implications of that depend on exactly what the tasks were supposed to do, but could include leaks, incorrect protocol behaviour, crashes, anything really. I only listed the warnings because they're a specific, predictable result, but the warnings are there to alert you to the less predictable and more serious potential consequences.

If you don't think the warnings are serious enough to warrant any effort to avoid them you could raise an issue suggesting they be removed, but I doubt it would get very far.

@omnicognate
Copy link
Author

omnicognate commented Jun 28, 2017

Oh, just one detail: These aren't "debug messages" (with the implication that they are a detail that can be ignored in production). They are warnings.

@fafhrd91
Copy link
Member

But close is not a task, it just bunch of callbacks, it is transpor's responsibility to execute all of them

@smurfix
Copy link

smurfix commented Jan 19, 2018

This boils down to programming contracts. When I use async with foo() as bar: I expect foo()'s _aenter__() code to set up the "bar" object so that I can use it, and when the block goes out of scope I expect its __aexit__(…) to return after everything is closed so that I can do whatever I need to do next – including shutting down the program – and without requiring some arcane asyncio.sleep() call to pass the time for connection_lost() and similar callbacks to run.

If that's not possible due to the way asyncio is designed, fine, document that – and urge people to ultimately find a better way of handling async code. Like, well, https://github.com/python-trio/trio for example.

NB: is there an actual issue which this bug is on hold for?

@asvetlov asvetlov modified the milestones: 3.0, 3.1 Feb 9, 2018
@asvetlov asvetlov modified the milestones: 3.1, 3.2 Mar 22, 2018
@asvetlov asvetlov modified the milestones: 3.2, 3.3 May 7, 2018
vmarkovtsev added a commit to athenianco/athenian-api that referenced this issue Jun 5, 2020
@timburke
Copy link

timburke commented Jun 9, 2020

Which are you using @timburke?

@vmarkovtsev A combination of pylint and mypy in SublimeLinter. I'm 95% sure this one was flagged by pylint...

@leszekhanusz
Copy link

leszekhanusz commented Oct 14, 2020

@timburke

I noticed that with your function, if we only have one connection which is not using ssl,
then the event is never set.
You need to check if the transports are 0 at the end like this:

def create_aiohttp_closed_event(session) -> asyncio.Event:                                                              
    """Work around aiohttp issue that doesn't properly close transports on exit.                                        
                                                                                                                        
    See https://github.com/aio-libs/aiohttp/issues/1925#issuecomment-639080209                                          
                                                                                                                        
    Returns:                                                                                                            
       An event that will be set once all transports have been properly closed.                                         
    """                                                                                                                 
                                                                                                                        
    transports = 0                                                                                                      
    all_is_lost = asyncio.Event()                                                                                       
                                                                                                                        
    def connection_lost(exc, orig_lost):                                                                                
        nonlocal transports                                                                                             
                                                                                                                        
        try:                                                                                                            
            orig_lost(exc)                                                                                              
        finally:                                                                                                        
            transports -= 1                                                                                             
            if transports == 0:                                                                                         
                all_is_lost.set()                                                                                       
                                                                                                                        
    def eof_received(orig_eof_received):                                                                                
        try:                                                                                                            
            orig_eof_received()                                                                                         
        except AttributeError:                                                                                          
            # It may happen that eof_received() is called after                                                         
            # _app_protocol and _transport are set to None.                                                             
            pass                                                                                                        
                                                                                                                        
    for conn in session.connector._conns.values():                                                                      
        for handler, _ in conn:                                                                                         
            proto = getattr(handler.transport, "_ssl_protocol", None)                                                   
            if proto is None:                                                                                           
                continue                                                                                                
                                                                                                                        
            transports += 1                                                                                             
            orig_lost = proto.connection_lost                                                                           
            orig_eof_received = proto.eof_received                                                                      
                                                                                                                        
            proto.connection_lost = functools.partial(                                                                  
                connection_lost, orig_lost=orig_lost                                                                    
            )                                                                                                           
            proto.eof_received = functools.partial(                                                                     
                eof_received, orig_eof_received=orig_eof_received                                                       
            )                                                                                                           
                                                                                                                        
    if transports == 0:                                                                                                 
        all_is_lost.set()                                                                                               
                                                                                                                        
    return all_is_lost

this function can be used like this:

closed_event = create_aiohttp_closed_event(session)                                               
await session.close()                                                                                  
await closed_event.wait()

@timburke
Copy link

You need to check if the transports are 0 at the end like this:

Agreed, we had to make that change internally as well and I forgot to update the comment. Thanks @leszekhanusz for posting it.

@asvetlov
Copy link
Member

Please check master

@leszekhanusz
Copy link

So to be clear, is this bug fixed in the release 3.7.0 and the workaround no longer necessary ?

@asvetlov
Copy link
Member

It will be fixed in aiohttp 4.0.0 which is not released yet

@the-vindicar
Copy link

Unfortunately, as of writing of this comment, most old workarounds for this issue had ceased to function.
I'm on Windows 10, using Python 3.10.4 and asyncio 3.8.4.

  • The create_aiohttp_closed_event() code by @leszekhanusz does not improve the situation, "event loop is closed" warning is still triggered.
  • await asyncio.sleep(0.25) does not help, no matter what delay I use (tried up to 5 seconds).
  • oneliner with asyncio.get_event_loop() relies on deprecated behaviour of it automatically creating a new loop if there isn't one.

As such, the only workaround that works fine for me (that is, produces no warnings and exits the program correctly) is a more verbose variant of get_event_loop():

loop = asyncio.get_event_loop_policy().get_event_loop()
loop.run_until_complete(main())

I suggest that it's added to the corresponding documentation page that is linking this very issue.

@jacob-coalition
Copy link

jacob-coalition commented Apr 1, 2024

Just to chime in again with another potential work around: For our usecase, we are running solely the aiohttp client within a docker microservice that has some custom tracing logic was keeping our TCP Connections open, so the fix for us was to force TCP connections to close:

aiohttp.ClientSession(connector=aiohttp.TCPConnector(force_close=True, enable_cleanup_closed=True))

and then all of our Unclosed Transport and Unclosed Socket connections went away.

Just sharing as another potential work around, though this issue may have been specific to our tracing configuration and not help others but wanted to share.

zavocc added a commit to zavocc/JakeyBot that referenced this issue Sep 25, 2024
aiohttp's ClientSession is now at the top level main.py which uses discord.Bot's running loop attribute
(still causes ssl.c errors, time to switch prod distros?)

Shutdown command: also close aiohttp Clientsession but its still buggy: aio-libs/aiohttp#1925

Other fixes
zavocc added a commit to zavocc/JakeyBot that referenced this issue Sep 25, 2024
aiohttp's ClientSession is now at the top level main.py which uses discord.Bot's running loop attribute
(still causes ssl.c errors, time to switch prod distros?)

Shutdown command: also close aiohttp Clientsession but its still buggy: see aio-libs/aiohttp/issues/1925

Other fixes
stefanor added a commit to stefanor/truststore that referenced this issue Oct 26, 2024
The sleep wasn't reliably long-enough, but requesting the client to
close the connection reliably avoids the issue.

As suggested here:
aio-libs/aiohttp#1925 (comment)
sethmlarson pushed a commit to sethmlarson/truststore that referenced this issue Oct 26, 2024
The sleep wasn't reliably long-enough, but requesting the client to
close the connection reliably avoids the issue.

As suggested here:
aio-libs/aiohttp#1925 (comment)
@BabakAmini
Copy link

In my case, this issue occurs exclusively when I use uvloop. The current workaround involves either adding a new ignore filter to warnings or incorporating a connector as shown below when creating the session:

connector = aiohttp.TCPConnector(force_close=True, enable_cleanup_closed=True)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests