Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP not closing properly after POST #3208

Closed
Swamp-Ig opened this issue Aug 21, 2018 · 20 comments · Fixed by home-assistant/core#16198
Closed

TCP not closing properly after POST #3208

Swamp-Ig opened this issue Aug 21, 2018 · 20 comments · Fixed by home-assistant/core#16198

Comments

@Swamp-Ig
Copy link

Long story short

I'm connecting to an IOT device (airconditioner) with a REST interface.

Posting commands using >=3.0 doesn't work. The post goes through and the status OK is returned, but there's some issue with the TCP connection.

Steps to reproduce

Within the wireshark packet capture there's two different TCP transactions with exactly the same aiohttp code running (also within the zip file). The short version of the code without imports etc looks like this:

device_ip = '10.0.0.107'

async def send_command_async(command: str, data: Any):
    body = {command : data}
    url = f"http://{device_ip}/{command}"
    async with ClientSession() as session:
        async with session.post(url, json=body) as response:
            response.raise_for_status()

loop: AbstractEventLoop = asyncio.get_event_loop()
loop.run_until_complete(loop.create_task(send_command_async('SystemMODE', 'heat')))

If you get aiohttp<3.0 you get the first packet capture, up until packet 10. Everything is happy.

With the latest version and any version above 3.0, all goes well until the ACK gets sent for the OK packet, then a whole spume of out of order and dup ack packets goes back and forth.

I can't actually see any difference in what gets sent or anything. It's possible it's the device that's the issue, I've emailed the guys that produce the device so they are on this issue too, but the low level TCP stuff is a bit beyond me.

The odd thing is that http GET works just fine.

Environment

  • Python 3.6
  • Windows 10
  • Latest aiohttp (but did a regression test back until <3.0 version)
  • The device is an iZone air conditioner
@webknjaz
Copy link
Member

I assume that the issue might be within HTTP parser in C from nodejs that we use.
You may try downgrading git submodule in vendor/http-parser folder to some version in past. Let's say some commit a year or two ago. This should make it clear whether my guess is related.

@Swamp-Ig
Copy link
Author

Swamp-Ig commented Aug 21, 2018

I just tried python 3.7... didn't help. Will do as you suggest

Edit:
Err.... I would do as you suggest, but I'll have to figure out how to do it. I did get as far as checking out aiohttp I've never really used python until a couple of weeks ago.

@webknjaz
Copy link
Member

Oh, just make sure that you do git clone with --recurse-submodules, then go to the folder I mentioned and git checkout to some old commit.
After that you're fine to run pip install -e . from within aiohttp repo folder under virtualenv (recommened).

Also, you could try running your aiohttp app with AIOHTTP_NO_EXTENSIONS=1 env var, this should trigger falling back to the pure-python http parser implementation.
Smth like AIOHTTP_NO_EXTENSIONS=1 python your_app.py

@Swamp-Ig
Copy link
Author

Tried AIOHTTP_NO_EXTENSIONS=1 , that had the same issue. Is that enough to prove it?
Doing the sub-module thing now...

@webknjaz
Copy link
Member

Yeah, it should be enough. You could share asciinema recording of how you run it if you're unsure that you do it right :)

@webknjaz
Copy link
Member

Hm.. I think you might need to pip install with this env var, it affects install time, not runtime I guess.

This'll do:

AIOHTTP_NO_EXTENSIONS=1 pip install --no-binary :all: aiohttp

(it affects only source dists as wheels come with extensions pre-compiled)

@Swamp-Ig
Copy link
Author

Swamp-Ig commented Aug 21, 2018

Ahh... yes that worked, the bug doesn't show up

so.... now what?

Also I did try compiling the package with pip install -e . but it wouldn't compile. I think I'd need to set up some environment variables or something so the compiler will work. Getting quite outside the scope of what I can comfortably contribute sadly.

Is there some way to force aiohttp to not use the extensions from my package? Otherwise I'll have to just use normal requests and probably do it in a thread so it doesn't block waiting on IO, which is disappointing.

@webknjaz
Copy link
Member

Well, using that pip command I've shown before is an official way to do so. What's the problem with it?

Also, after re-reading your issue description I realized that the difference you're seeing might be HTTP pipelining. Could you confirm whether your tests use HTTP/1.0 or HTTP/1.1 with or without pipelining?
Sorry, I don't have the capacity to analyze your dumps myself right now.

@webknjaz
Copy link
Member

May I also ask you to extract and post plain text HTTP request-responses?

@Swamp-Ig
Copy link
Author

Swamp-Ig commented Aug 21, 2018

The HTTP is exactly the same / normal:

POST /SystemMODE HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Host: 10.0.0.107
User-Agent: Python/3.6 aiohttp/2.3.10
Content-Length: 22
Content-Type: application/json

{"SystemMODE": "heat"}
POST /SystemMODE HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Host: 10.0.0.107
User-Agent: Python/3.6 aiohttp/3.3.2
Content-Length: 22
Content-Type: application/json

{"SystemMODE": "heat"}

It's the TCP layer that doesn't seem to work properly.

Here's a screen cap of the wireshark capture. The first packet exchange is the older version of aiohttp, which works fine (up to the first blue packet, which is an unrelated UDP packet)

The second bit is with the latest aiohttp. It seems not to send the [FIN,ACK] packet, and that's where the trouble happens. I think FIN means it's closing the TCP connection, so maybe it's as simple as just calling close on the connection for a post?

Maybe the sequence number for packet 20 actually is the problem? In the first exchange the ack is for seq 206, while in the second it's for 205. I don't really know not a TCP expert unfortunately!

screen cap

Eventually the TCP packet exchange ends:

packet capture

I can't really do an atypical install of aiohttp since this is a library that is then used to integrate with home assistant, so I can't really commit buggy code out into the wild. Maybe I can do something with TCPConnector or something? I think I'll just not use aiohttp for now.

For debugging purposes, I can port-forward the port through my router if you really need. Maybe try it with a post to something else first though.

@webknjaz
Copy link
Member

Okay, I can see why you think that it's a TCP level problem. It looks like you're right.

Just to record my observations:

  1. for some reason, the lengths of POST request TCP packets on the screenshot are different (259 and 76), while in the HTTP-level snippets you posted they are both 22.
  2. ACK (from client to server) from line 20 has been retransmitted 3 more times (lines 22, 22 and 26) and have been marked as duplicates
  3. in the second case, first server's FIN+ACK got lost on the network and then got no response from the client upon retransmits and have been marked out of order (sequence number 96 < 205)
  4. in the first case, server tried to keep TCP connection open with keep-alive

I think FIN means it's closing the TCP connection

Almost: https://stackoverflow.com/a/13050021/595220. It's I won't send you anything but will consume new data from you. So after FIN socket in a client is closed for writing, but still open for reading.

Maybe the sequence number for packet 20 actually is the problem?

It looks related. In the first case, seq matches the ack from tcp win update, but in the second it doesn't (183 != 205).

For debugging purposes, I can port-forward the port through my router if you really need. Maybe try it with a post to something else first though.

I'd like to invite you to get back to bisecting what went wrong with HTTP-parser.
To compile it, you'll need to have to pip install -r requirements/dev.txt first and have gcc available on your machine.
See http://docs.aiohttp.org/en/stable/contributing.html#preconditions-for-running-aiohttp-test-suite for more info.

Maybe I can do something with TCPConnector or something?

ClientSession already closes it: https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client.py#L828.
So let's get back to tracking down the source of bug.

@Swamp-Ig
Copy link
Author

ClientSession already closes it: https://github.com/aio-libs/aiohttp/blob/master/aiohttp/client.py#L828.
So let's get back to tracking down the source of bug.

It might be closing it, but it's not actually closing the TCP connection. 10.0.0.66 is the local host here, in the first bit before the blue line FIN gets sent, but in the one where there's a problem it doesn't.

I'll see what I can do to further figure it out, but it seems to me like whatever aiohttp is doing isn't working.

@webknjaz
Copy link
Member

It's probably a bug in http-parser, not aiohttp's own code. That's why I want to find out the problem there by bisecting it.

@Swamp-Ig
Copy link
Author

ACK (from client to server) from line 20 has been retransmitted 3 more times (lines 22, 22 and 26) and have been marked as duplicates

Actually this goes on for a heap of packets, not just those three. If you see the second screencap it goes through to packet 4886

in the second case, first server's FIN+ACK got lost on the network and then got no response from the client upon retransmits and have been marked out of order (sequence number 96 < 205)

First server's FIN didn't get lost, it just didn't get sent.

It's probably a bug in http-parser, not aiohttp's own code. That's why I want to find out the problem there by bisecting it.

Yes, well I can agree with that. Is http-parser going to be amenable to being fixed however?

@webknjaz
Copy link
Member

Is http-parser going to be amenable to being fixed however?

It's here: https://github.com/nodejs/http-parser. And there's actually 9 new commits since the last sync, so we'll need to grab updates as well. cc @asvetlov

@Swamp-Ig could you please try upgrading the submodule with http-parser and do the test after rebuilding aiohttp?

@webknjaz
Copy link
Member

I've upgraded http-parser in master (#3209)

@Swamp-Ig
Copy link
Author

OK... interesting.

I compiled and installed under python 3.6.6 in windows subsystem for unix with debian. It did take a bit of messing about because I had to install python 3.6.6. This seemed to fix the problem.

However rolling back to aiohttp v3.3.2 did have the problem. I don't know what's changed, but whatever it is it's fixed it.

So I guess you can mark this as fixed on the next release. Will need to get home assistant to update to the new version once it's out of alpha.

@asvetlov
Copy link
Member

Well. Let's close it

@webknjaz
Copy link
Member

@asvetlov This all lead me to one idea 💡.
I think we need to ship some way for dumping diagnostic/debug data. Like current app config details and whether it uses extensions or any other speed up.

@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants