Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket stream does not close when connection is abruptly cut #961

Closed
frederikaalund opened this issue Jul 8, 2016 · 17 comments
Closed
Labels

Comments

@frederikaalund
Copy link
Contributor

Long story short

Websocket stream does not close when connection is abruptly cut (e.g., pulling the network cable). This causes the connection to erroneously buffer the writes until the process runs out of memory.

Expected behaviour

When a connection is abruptly cut, I expect that either:

  • The WebSocketResponse's send_bytes and send_string methods raise an exception.
  • The async for msg in ws: pass loop returns.

Actual behaviour

The websocket stream is kept alive forever, never recognizing the connection failure.

Steps to reproduce

The following server program reproduces the error:

#!/usr/bin/env python
import aiohttp
import asyncio
from aiohttp import web

async def writer(ws):      
    try:                             
        print('opened')              
        while True:                       
            ws.send_bytes(bytes(100000)) 
            await asyncio.sleep(0.1)    
    except Exception as error:              
        print('closed:', type(error))

async def handler(request):              
    ws = web.WebSocketResponse()     
    await ws.prepare(request)        
    asyncio.ensure_future(writer(ws))
    async for msg in ws:             
        pass                     
    return ws                        

app = web.Application()                  
app.router.add_route('GET', '/', handler)
web.run_app(app)                                               

Pair it with the following HTML client:

<html>
    <script>
        ws = new WebSocket('ws://192.168.1.100:8080/');
        ws.onmessage = (message) => console.log(message);
        ws.onerror = (error) => console.error(error);
        ws.onclose = () => console.log('closed');
    </script>
</html>

Now, if you simply connect, you will start to see the messages in the console. The memory usage of the Python process is constant as expected. Refreshing the browser window (closing and reopening the connection) also works (the connection is logged as closed and opened on the server).

However, if you abruptly cut the connection (e.g., by removing the network cable or disabling the network interface) then the server does not register that the connection is closed. Not even within a reasonable timeout period (I had a process running for 15 mins). Instead, the writes are seemingly buffered until the Python process runs out of memory.

Your environment

Asyncio 0.21.6. Python 3.5.1

@dmarkey
Copy link

dmarkey commented Jul 16, 2016

I also see this.

@Insoleet
Copy link
Contributor

Same problem as #827

@asvetlov
Copy link
Member

There are two solutions for the problem:

  1. Expose ws.drain() coroutine. Technically ws has flow control deep inside, we just need to propagate it up to user level.
  2. Implement timeout. If no incoming events for timeout period -- connection should be terminated by server. Client should send ping messages periodically (websocket pings or custom ones, doesn't matter). Unfortunately modern browsers don't support native WS pings.

Maybe we need both solutions

@frederikaalund
Copy link
Contributor Author

Unfortunately modern browsers don't support native WS pings.

Seems to be working in Google Chrome (52.0.2743.116 m).

Ping/Ping Heartbeat

I tried implementing Insoleet's solution using ping/pong as a heartbeat.

It works somewhat. The heartbeat correctly detects that the connection is closed. However, even though await ws.close() is called, the async for msg in ws loop doesn't return immediately. 14 mins later, the async for msg in ws loop finally returns with some errors.

Here is the logs that I've gathered from the event. First, the confirmation that the heartbeat mechanism is closing the ws.

Sep 07 14:55:06 thunna-development-2 sensord:  [sensord.http_serve == MISSED 4 HBs: Closing

Then (14 mins later), an exception is logged. Seemingly, this exception originates from somewhere within asyncio itself.

Sep 07 15:09:19 thunna-development-2 sensord:  [asyncio] Fatal read error on socket transport
Sep 07 15:09:19 thunna-development-2 sensord:  handle_traceback: Handle created at (most recent call last):
Sep 07 15:09:19 thunna-development-2 sensord:    File "/opt/bin/sensord", line 2, in <module>
Sep 07 15:09:19 thunna-development-2 sensord:      import sensord.__main__
Sep 07 15:09:19 thunna-development-2 sensord:    File "<frozen importlib._bootstrap>", line 969, in _find_and_load
Sep 07 15:09:19 thunna-development-2 sensord:    File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
Sep 07 15:09:19 thunna-development-2 sensord:    File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
Sep 07 15:09:19 thunna-development-2 sensord:    File "<frozen importlib._bootstrap_external>", line 635, in exec_module
Sep 07 15:09:19 thunna-development-2 sensord:    File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
Sep 07 15:09:19 thunna-development-2 sensord:    File "/opt/lib/python3.5/site-packages/sensord/__main__.py", line 280, in <module>
Sep 07 15:09:19 thunna-development-2 sensord:      main()
Sep 07 15:09:19 thunna-development-2 sensord:    File "/opt/lib/python3.5/site-packages/sensord/__main__.py", line 259, in main
Sep 07 15:09:19 thunna-development-2 sensord:      loop.run_forever()
Sep 07 15:09:19 thunna-development-2 sensord:    File "/usr/lib/python3.5/asyncio/base_events.py", line 295, in run_forever
Sep 07 15:09:19 thunna-development-2 sensord:      self._run_once()
Sep 07 15:09:19 thunna-development-2 sensord:    File "/usr/lib/python3.5/asyncio/base_events.py", line 1246, in _run_once
Sep 07 15:09:19 thunna-development-2 sensord:      handle._run()
Sep 07 15:09:19 thunna-development-2 sensord:    File "/usr/lib/python3.5/asyncio/events.py", line 125, in _run
Sep 07 15:09:19 thunna-development-2 sensord:      self._callback(*self._args)
Sep 07 15:09:19 thunna-development-2 sensord:    File "/usr/lib/python3.5/asyncio/selector_events.py", line 232, in add_reader
Sep 07 15:09:19 thunna-development-2 sensord:      handle = events.Handle(callback, args, self)
Sep 07 15:09:19 thunna-development-2 sensord:  protocol: <RequestHandler GET:/samples/local connected>
Sep 07 15:09:19 thunna-development-2 sensord:  transport: <_SelectorSocketTransport fd=22 read=polling write=<idle, bufsize=0>>
Sep 07 15:09:19 thunna-development-2 sensord:  Traceback (most recent call last):
Sep 07 15:09:19 thunna-development-2 sensord:    File "/usr/lib/python3.5/asyncio/selector_events.py", line 664, in _read_ready
Sep 07 15:09:19 thunna-development-2 sensord:      data = self._sock.recv(self.max_size)
Sep 07 15:09:19 thunna-development-2 sensord:  OSError: [Errno 113] No route to host

Lastly, I get the following exception message from the try ... catch block that I have around the async for msg in ws loop:

Sep 07 15:09:19 thunna-development-2 sensord:  [sensord.http_serve [192.168.1.244:20107] Websocket "/samples/local" ASYNC_MSG_LOOP_EXC: 
Sep 07 15:09:19 thunna-development-2 sensord:  Traceback (most recent call last):
Sep 07 15:09:19 thunna-development-2 sensord:    File "/opt/lib/python3.5/site-packages/sensord/http_server/stream/websocket.py", line 86, in serve_until_closed
Sep 07 15:09:19 thunna-development-2 sensord:      async for msg in self.websocket:
Sep 07 15:09:19 thunna-development-2 sensord:    File "/usr/lib/python3.5/site-packages/aiohttp/web_ws.py", line 308, in __anext__
Sep 07 15:09:19 thunna-development-2 sensord:      msg = yield from self.receive()
Sep 07 15:09:19 thunna-development-2 sensord:    File "/usr/lib/python3.5/site-packages/aiohttp/web_ws.py", line 231, in receive
Sep 07 15:09:19 thunna-development-2 sensord:      msg = yield from self._reader.read()
Sep 07 15:09:19 thunna-development-2 sensord:    File "/usr/lib/python3.5/site-packages/aiohttp/streams.py", line 592, in read
Sep 07 15:09:19 thunna-development-2 sensord:      result = yield from super().read()
Sep 07 15:09:19 thunna-development-2 sensord:    File "/usr/lib/python3.5/site-packages/aiohttp/streams.py", line 447, in read
Sep 07 15:09:19 thunna-development-2 sensord:      yield from self._waiter
Sep 07 15:09:19 thunna-development-2 sensord:    File "/usr/lib/python3.5/asyncio/futures.py", line 358, in __iter__
Sep 07 15:09:19 thunna-development-2 sensord:      yield self  # This tells Task to wait for completion.
Sep 07 15:09:19 thunna-development-2 sensord:  concurrent.futures._base.CancelledError
Sep 07 15:09:19 thunna-development-2 sensord:  [sensord.http_serve [192.168.1.244:20107] Websocket "/samples/local" ************ loop over

Seemingly, some asyncio internals are acting up when the connection is abruptly cut which can't be handled in user code.

Python 3.5.1 and Aiohttp 0.22.5.

@asvetlov
Copy link
Member

asvetlov commented Sep 7, 2016

14 mins delay surprises me a lot.
Sorry, looks like I have no time to dig into the issue before 1.0 release.
But I definitely love to see it solved :)

@Insoleet
Copy link
Contributor

Insoleet commented Sep 7, 2016

Despite the heartbeat mechanism, I have some users of my application who complained that the websockets would freeze if they were in a bad wifi area network. I suppose that it is because ws.close() does not since a new network was opened and no exception would be thrown.

@frederikaalund : could you check if you reopen your connection before the exceptions are thrown if the ws ever close ?

Also, the problem with closing the websockets could be linked to the other problem I found : #754

Right now I'm forced to cancel websockets coroutines to close them when I stop my application.

@frederikaalund
Copy link
Contributor Author

14 mins delay surprises me a lot.

Me too! I just repeated the test. Same result:

16:01:19 ws is closed due to missed heartbeats.
16:16:07 asyncio "fatal read error on socket transport"

Approximately 15 mins delay this time.

Sorry, looks like I have no time to dig into the issue before 1.0 release.
But I definitely love to see it solved :)

It would be great to see a fix. Any approximative 1.0 release date? Like, is it one or two years from now?

@frederikaalund : could you check if you reopen your connection before the exceptions are thrown if the ws ever close ?

Do you mean to call await ws.prepare() just after await ws.close()?

@Insoleet
Copy link
Contributor

Insoleet commented Sep 7, 2016

No, I was thinking about the following test :

  • Open a websocket
  • Shutdown your interface
  • Wait 5 minutes until ws.close() is called, and enable your interface back
  • Does the websocket ever close ? Did it reconnect on its own ?

@asvetlov
Copy link
Member

asvetlov commented Sep 7, 2016

It would be great to see a fix. Any approximative 1.0 release date? Like, is it one or two years from now?

I hope to publish it in a few days actually. Just after fixing problems with client cookie jar.

Do you mean to call await ws.prepare() just after await ws.close()?
.prepare() can be called only once per response lifetime.

@frederikaalund
Copy link
Contributor Author

I hope to publish it in a few days actually. Just after fixing problems with client cookie jar.

Sounds great!

Does the websocket ever close ? Did it reconnect on its own ?

I tried but it wouldn't reconnect. The client (Google Chrome) treats the connection as closed. In any case, no more data is sent from the server after ws.close since the data-sending coroutine errors out at that point in time:

Sep 07 16:36:00 thunna-development-2 sensord:  [sensord.http_serv [192.168.1.244:20211] Websocket "/samples/local" Could not send data to websocket. Closing connection. Error was: websocket connection is closing
Sep 07 16:36:00 thunna-development-2 sensord:  Traceback (most recent call last):
Sep 07 16:36:00 thunna-development-2 sensord:    File "/opt/lib/python3.5/site-packages/sensord/http_server/stream/websocket.py", line 113, in _on_next
Sep 07 16:36:00 thunna-development-2 sensord:      self.websocket.send_str(data)
Sep 07 16:36:00 thunna-development-2 sensord:    File "/usr/lib/python3.5/site-packages/aiohttp/web_ws.py", line 142, in send_str
Sep 07 16:36:00 thunna-development-2 sensord:      raise RuntimeError('websocket connection is closing')
Sep 07 16:36:00 thunna-development-2 sensord:  RuntimeError: websocket connection is closing

That's the strangest part. ws.send_str correctly detects that the connection is closed but the async for msg in ws loop never returns.

@tahajahangir
Copy link
Contributor

tahajahangir commented Sep 28, 2016

@frederikaalund

That's the strangest part. ws.send_str correctly detects that the connection is closed but the async for msg in ws loop never returns.

That's how TCP works. If you send something, TCP sends the data and expects ack, and retries send if ack not received. And after TCP timeout (usually 2 minutes), the connection will be closed.

But if you are reading from a socket and there is no incoming data, there is nothing wrong, you should wait forever, or have a timeout at a higher level (like keep-alive timeout).

@frederikaalund
Copy link
Contributor Author

That's how TCP works. If you send something, TCP sends the data and expects ack, and retries send if ack not received. And after TCP timeout (usually 2 minutes), the connection will be closed.

That may explain the error.

or have a timeout at a higher level (like keep-alive timeout).

That's what I think aiohttp should have built-in.

@asvetlov
Copy link
Member

That's what I think aiohttp should have built-in.
Agreed.
Moreover it should be coupled with sending websocket pings.

@fafhrd91
Copy link
Member

fafhrd91 commented Feb 1, 2017

All send methods return drain corouting + heartbeat system, should be sufficient

@fafhrd91 fafhrd91 closed this as completed Feb 1, 2017
@frederikaalund
Copy link
Contributor Author

@fafhrd91 So has a hearbeat been built into AioHTTP? Excuse me for being a bit out of the loop.

@fafhrd91
Copy link
Member

fafhrd91 commented Feb 1, 2017

@frederikaalund yes, it is parameter for WebSocketResponse or wsconnect()
ping/pong is being used as heartbeat

@lock
Copy link

lock bot commented Oct 29, 2019

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel like there's important points made in this discussion,
please include those exceprts into that new issue.

@lock lock bot added the outdated label Oct 29, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 29, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

6 participants