Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

web_ws - Can't close tcp socket when receiving a close message from client. #8184

Closed
1 task done
spikefishjohn opened this issue Feb 26, 2024 · 18 comments · Fixed by #8200
Closed
1 task done

web_ws - Can't close tcp socket when receiving a close message from client. #8184

spikefishjohn opened this issue Feb 26, 2024 · 18 comments · Fixed by #8200
Labels

Comments

@spikefishjohn
Copy link

spikefishjohn commented Feb 26, 2024

Describe the bug

I'm working on a GNS3/gns3-server#2320 for GNS3 that I think PR7978 lines up with. The GNS3 issue is that when a web socket client sends a close message the server never closes the tcp socket. I tried the patch above but it doesn't address the issue.

High level GNS3 is calling ws.receive() at which point aiohttp receive() the close message.

I believe the problem is because ws.receive is setting self._closing = True when it reiceves a close message. This then causes self.close() to return here which prevents if msg.type == WSMsgType.CLOSE: from being reached.

I'm almost thinking this should be removed but i'm not sure what the intent of that is so i'm unsure if that is the proper fix.

To Reproduce

I don't have a great way to reproduce this. I'm currently just using a fully installed GNS3 instance. I can work on making a reproduction now that I understand the issue.

Expected behavior

aiohttp.web_ws should close the socket when it receives a close message.

Logs/tracebacks

I'll add these later in the week. I wasn't prepared for this.

Python Version

$ python --version
Python 3.10.12

aiohttp Version

$ python -m pip show aiohttp
(gns3-server) root@ubuntu:~# source /usr/share/gns3/gns3-server/bin/activate
(gns3-server) root@ubuntu:~# python3 -m pip show aiohttp
Name: aiohttp
Version: 3.9.1
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author:
Author-email:
License: Apache 2
Location: /usr/share/gns3/gns3-server/lib/python3.10/site-packages
Requires: aiosignal, async-timeout, attrs, frozenlist, multidict, yarl
Required-by: aiohttp-cors, gns3-server
(gns3-server) root@ubuntu:~#

multidict Version

$ python -m pip show multidict
(gns3-server) root@ubuntu:~# python3 -m pip show multidict
Name: multidict
Version: 6.0.5
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /usr/share/gns3/gns3-server/lib/python3.10/site-packages
Requires:
Required-by: aiohttp, yarl
(gns3-server) root@ubuntu:~#

yarl Version

$ python -m pip show yarl
(gns3-server) root@ubuntu:~# python3 -m pip show yarl
Name: yarl
Version: 1.9.4
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache-2.0
Location: /usr/share/gns3/gns3-server/lib/python3.10/site-packages
Requires: idna, multidict
Required-by: aiohttp
(gns3-server) root@ubuntu:~#
### OS

(gns3-server) root@ubuntu:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy
(gns3-server) root@ubuntu:~#

Related component

Server

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct

EDIT: Update PR request in description.

@spikefishjohn
Copy link
Author

Not sure what I did to strike through the OS but that is all correct.

@Dreamsorcerer
Copy link
Member

It does look a little odd, and seems a little obvious which makes me wonder why it hasn't been noticed before.

Do you think you could write a test in a PR to reproduce it? Then we could also try your suggestion of removing that check and seeing if any other tests break, which would be a good indication if it's the wrong change to make.

@bdraco
Copy link
Member

bdraco commented Feb 26, 2024

I think the code is expecting that the client will close the connection, but we can't always rely on that even if they say they're going to close it, they may not.

@spikefishjohn
Copy link
Author

spikefishjohn commented Feb 26, 2024

yeah, I forked aiohttp last night. I'll work on a small replication. I did try removing the call. This results in the next read request in close() failing with timeout error and then the socket does close but with a .. uh.. 1006 message I think.

I also started looking out how to pass message to close() so I could add a condition for the read statement. Didn't get very far but it was kind of late.

@Dreamsorcerer
Copy link
Member

Dreamsorcerer commented Feb 26, 2024

https://datatracker.ietf.org/doc/html/rfc6455#section-7

Once an endpoint has both sent and received a Close control frame, that endpoint SHOULD Close the WebSocket Connection

So, this is what it is trying to achieve when close is initiated by us.

As such, when a server is instructed to Close the WebSocket Connection it SHOULD initiate a TCP Close immediately, and when a client is instructed to do the same, it SHOULD wait for a TCP Close from the server.

Except as indicated above or as specified by the application layer (e.g., a script using the WebSocket API), clients SHOULD NOT close the connection.

I think we just need to tweak that code so that the server does not wait to receive a close code (as it has already received one), but it should continue to close the transport. i.e. Rather than removing that check we want to add self._set_code_close_transport(...) in there (but, we might need to avoid calling it twice?).

@spikefishjohn
Copy link
Author

so something like this instead of deleting? I looked through the code and it seems like _close_code is always set but i'm not sure if its safe to assume it is? Default is None.

        if self._closing:
            self._set_code_close_transport(self._close_code)
            return True

@Dreamsorcerer
Copy link
Member

but i'm not sure if its safe to assume it is?

Not sure, I'd have to look more closely. If we get the test up first, then we can see how it works.

@spikefishjohn
Copy link
Author

Boiling this down has been harder then I expected. So far I'm not recreating the issue. More research and foul language will be required.

@bdraco
Copy link
Member

bdraco commented Mar 1, 2024

I'm pretty convinced there is a bug here where we don't close the transport if the client holds it open forever

@bdraco
Copy link
Member

bdraco commented Mar 1, 2024

@spikefishjohn Can you give #8200 a shot? I'm running it on my production HA systems without any unexpected side effects

I'll come up with a test for it if it fixes your issue

bdraco added a commit to bdraco/aiohttp that referenced this issue Mar 1, 2024
@spikefishjohn
Copy link
Author

@bdraco yeah i'll give it a test. I've been trying to reproduce the issue with a smaller code base but haven't been able to so far which has been pretty frustrating.

I'll add this in today and see if it fixes the issue. If not i'll add a pdb trace of the accept to show what is happening if that helps.

@spikefishjohn
Copy link
Author

spikefishjohn commented Mar 2, 2024

Ok that seems to fix the issue. Here is the client I'm using to talk to gns3.

import asyncio
import base64
import json
import websockets
import logging

logger = logging.getLogger('websockets')
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
log = logging.getLogger(__name__)


# GNS3 WebSocket server violates RFC 6455 so we have to be active closer
# Lets give Websockets a chance to get data.
WS_CLOSE_TIMEOUT = 10

RECONNECT_TIMEOUT = 1.618

CONTROLLER_WS_API = '/v2/notifications/ws'
COMPUTE_WS        = '/v2/notifications/ws'
SERVER            = '127.0.0.1:3080'
USER              = 'XXX'
PASS              = 'XXX'
CREDS             = f'{USER}:{PASS}'
ENCODED_CREDS     = base64.b64encode(CREDS.encode()).decode()
CONTROLLER_URI    = f'ws://{SERVER}{CONTROLLER_WS_API}'
COMPUTE_URI       = f'ws://{SERVER}{COMPUTE_WS}'


async def main() -> None:
    async with asyncio.TaskGroup() as tasks:
        tasks.create_task(websocket_logger(CONTROLLER_URI))

async def websocket_logger(endpoint: str) -> None:
    headers = {
        'Authorization': f'Basic {ENCODED_CREDS}'
    }
    try:
        async with websockets.connect(endpoint, close_timeout=WS_CLOSE_TIMEOUT, extra_headers=headers) as websocket:
            print("Call close")
            await websocket.close()
            print("close complete")
    except ConnectionRefusedError:
        log.info(f'Connection to {endpoint!r} refused.')
        await asyncio.sleep(RECONNECT_TIMEOUT)

if __name__ == '__main__':
    asyncio.run(main())

This is what the client now reports.

john@compute01:~$ python3.11 ws-client.py
2024-03-02 11:40:39,775 - DEBUG - = connection is CONNECTING
2024-03-02 11:40:39,775 - DEBUG - > GET /v2/notifications/ws HTTP/1.1
2024-03-02 11:40:39,775 - DEBUG - > Host: 127.0.0.1:3080
2024-03-02 11:40:39,775 - DEBUG - > Upgrade: websocket
2024-03-02 11:40:39,775 - DEBUG - > Connection: Upgrade
2024-03-02 11:40:39,775 - DEBUG - > Sec-WebSocket-Key: XXX
2024-03-02 11:40:39,775 - DEBUG - > Sec-WebSocket-Version: 13
2024-03-02 11:40:39,775 - DEBUG - > Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
2024-03-02 11:40:39,776 - DEBUG - > Authorization: Basic XXX
2024-03-02 11:40:39,776 - DEBUG - > User-Agent: Python/3.11 websockets/12.0
2024-03-02 11:40:39,779 - DEBUG - < HTTP/1.1 101 Switching Protocols
2024-03-02 11:40:39,779 - DEBUG - < Upgrade: websocket
2024-03-02 11:40:39,779 - DEBUG - < Connection: upgrade
2024-03-02 11:40:39,779 - DEBUG - < Sec-WebSocket-Accept: XXX
2024-03-02 11:40:39,779 - DEBUG - < Sec-WebSocket-Extensions: permessage-deflate
2024-03-02 11:40:39,779 - DEBUG - < Date: Sat, 02 Mar 2024 16:40:39 GMT
2024-03-02 11:40:39,779 - DEBUG - < Server: Python/3.10 aiohttp/3.9.3
2024-03-02 11:40:39,779 - DEBUG - = connection is OPEN
Call close
2024-03-02 11:40:39,779 - DEBUG - = connection is CLOSING
2024-03-02 11:40:39,779 - DEBUG - > CLOSE 1000 (OK) [2 bytes]
2024-03-02 11:40:39,780 - DEBUG - < TEXT '{"action": "ping", "event": {"cpu_usage_percent...y_usage_percent": 3.4}}' [84 bytes]
2024-03-02 11:40:39,781 - DEBUG - < CLOSE 1000 (OK) [2 bytes]
2024-03-02 11:40:39,781 - DEBUG - = connection is CLOSED
close complete
john@compute01:~$

I'll pass this on the original poster of the bug and have them test as well.

@bdraco
Copy link
Member

bdraco commented Mar 2, 2024

I'll pass this on the original poster of the bug and have them test as well.

Thanks. Please keep us updated.

@spikefishjohn
Copy link
Author

FYI the bug reporter indicated they won't be able to test for a week or so.

@bdraco
Copy link
Member

bdraco commented Mar 11, 2024

@spikefishjohn Was the reporter able to test the linked PR? Thanks

@spikefishjohn
Copy link
Author

@bdraco I'm asking for an update.

@bdraco
Copy link
Member

bdraco commented Mar 14, 2024

Thanks!

@spikefishjohn
Copy link
Author

@bdraco The original poster of the GNS3 issues has indicated they will not be able to test this and asked that the original GNS3 bug be closed. Your patch fixed the client I made above for GNS3. I think that is as much as a reply as this will get.

That being said if you think there is something else I can do to help by all means let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants