
Caddy prematurely closes connections to reverse proxies when using HTTP/1 #5431

Closed
supriyo-biswas opened this issue Mar 11, 2023 · 19 comments · Fixed by #5654
Labels: needs info 📭 (Requires more information), upstream ⬆️ (Relates to some dependency of this project)
Comments

@supriyo-biswas
supriyo-biswas commented Mar 11, 2023

Details

  • Caddy version: 2.6.4 (0db29e2 (14 Feb 23 19:08 UTC))
  • Operating system: Ubuntu 20.04, amd64

Description

I had initially reported this issue against v2.6.2 on the forum (https://caddy.community/t/caddy-prematurely-closes-connections-to-reverse-proxies-when-using-http-1/17974). It still persists despite the bugfixes to reverse_proxy in 2.6.4, so I'm opening this GitHub issue.

I have the following Caddyfile that reverse proxies connections to a locally running HTTP/1 API server on port 2000. Whenever the server receives a POST request, it sends a 200 OK with Transfer-Encoding: chunked and streams its response bit by bit, with 2-3 second delays between chunks.

my.web.site {
    handle_path /api/* {
        reverse_proxy http://127.0.0.1:2000
    }
}

When I make requests over HTTP/1, I'll sometimes get a truncated response that contains only the first chunk produced by the API server, but not the rest. This issue does not occur over HTTP/2.

A script like the one below can be used to reproduce the issue:

expected_output = '...'
for i in {1..200}; do
    curl --http1.1 -sSi https://my.web.site/api/some/path -X POST --data '...' > output.txt
    if ! grep -Fq "$expected_output" output.txt; then
       echo "Received truncated output"
       break
    fi
done

Looking at a packet capture taken on lo for TCP port 2000 (sudo tcpdump -i lo -w my.pcap -s 0 'tcp and src or dst port 2000'), I see that the API server had just sent the first chunk of data. Caddy acknowledges the received data, which is fine, but then it also requests closure of the TCP connection. The FIN, ACK is reciprocated by the app server, and the connection is closed.

[Screenshot 2023-03-12 at 12:44:54 AM: packet capture showing the FIN, ACK exchange after the first chunk]

@mholt
Member

mholt commented Mar 11, 2023

Hi, thanks for the issue. In order to reproduce this, can you please provide the backend code that produces the response that Caddy has trouble proxying?

@mholt mholt added the needs info 📭 Requires more information label Mar 11, 2023
@francislavoie
Member

I think golang/go@457fd1d might eventually fix that when we have access to it. I don't have time to test it though.

@supriyo-biswas
Author

You can find a minimal example here: https://github.com/supriyo-biswas/caddy-http1-proxy-issue

I run the app server on Ubuntu 20.04 as well, though I believe the issue should also be reproducible on other distributions. Please let me know if you have trouble getting it up and running.

@AnharMiah

AnharMiah commented Mar 15, 2023

I'm seeing a very similar behaviour. It was very confusing because the reported content size does NOT match the transferred size. However, when I curl the actual endpoint behind the proxy, the correct file content is returned. This pointed towards Caddy as the source of the issue.

Edit: I'm seeing this on Caddy v2.3.0-alpine (Docker image).

@francislavoie
Member

francislavoie commented Mar 15, 2023

If someone wants to push this along, you could try building Caddy with Go's master branch (see https://go.dev/doc/install/source) and changing Caddy to use ResponseController and EnableFullDuplex(). I don't have time to play around with that right now, but that's my hunch.

The problem is that this feature will only land in Go 1.21, and we need to wait for Caddy's minimum version to be 1.21 to safely use this new functionality. That means at the earliest, it'll still be another year until we can do so (due to Go's releases being every 6 months).

@WeidiDeng
Member

@AnharMiah Does your Python code require a domain name? When I tried it, it always returns 404 when I curl 127.0.0.1:2000 directly.

@supriyo-biswas
Author

@WeidiDeng I think you intended to mention me instead.

If so, please look at the curl command in script.sh within the repo to see how to make an example request. If you're running the server directly (as opposed to proxying through Caddy), you'll need to use /run/xyz instead of /api/run/xyz.

@WeidiDeng
Member

How often does this bug appear? On my MBP 2022, I changed the iteration count to 3k and it is still going without problems (with Caddy as reverse proxy, no SSL).

[Screenshot: terminal output of the repro loop]

@supriyo-biswas
Author

Roughly one in 50 requests, though it could be that this occurs only on Linux; I use Linux for both development and production.

Next steps for me:

  • Determine whether the issue happens over HTTPS
  • Determine whether the issue happens on distributions other than Ubuntu 20.04

@supriyo-biswas
Author

supriyo-biswas commented Mar 19, 2023

I updated the repository with instructions so that the repro can be run on any Linux distribution with Docker and git installed, without having to deal with the mess that is Python packaging.

I was able to reproduce the issue on both HTTP and HTTPS with Debian 10 (5.10.0-21, x86_64) and CentOS 9 Stream (5.14.0-285, x86_64), so it seems to affect all Linux distributions.

How often it occurs seems to depend on network latency: when testing over the loopback (localhost) interface I needed ~400-500 requests to see the disconnect, whereas when testing the server over the internet it could be reproduced in under 50 requests, and sometimes even within 2 requests!

When the connection is closed by Caddy, an error about reading from a closed network connection shows up in the logs:

2023/03/19 06:44:41.645 ERROR   http.handlers.reverse_proxy     reading from backend    {"error": "read tcp 127.0.0.1:49286->127.0.0.1:2000: use of closed network connection"}
2023/03/19 06:44:41.646 ERROR   http.handlers.reverse_proxy     aborting with incomplete response       {"error": "read tcp 127.0.0.1:49286->127.0.0.1:2000: use of closed network connection"}

@WeidiDeng
Member

I'm guessing this is an issue with the stdlib, but I'll do some digging to determine which part is wrong.

@supriyo-biswas
Author

In a rather confusing twist, the issue can't be reproduced on CentOS 8 Stream (4.18.0-448, x86_64) for up to ~3k requests, so I wonder if there's a kernel bug that is surfaced only by how Go's stdlib (or perhaps Caddy) makes network syscalls.

In any case, this is a very strange issue and I'm out of ideas about how to dig deep into it.

@AnharMiah

AnharMiah commented Mar 20, 2023

Update:

Found a terrible workaround 🤣:

I used nginx as a sidecar: nginx proxies the backend, and Caddy sits in front of nginx. It's terrible, but it works. This means nginx is more forgiving/graceful with this connection (I also tried Apache and can confirm it can handle this type of connection too).

Also, if it helps: it only seems to happen with one very specifically sized file of about 140 KB; smaller files transfer fine.

At 140 KB, the file gets cut to about 109 KB.

The back end is Rust Canteen.

This leads me to conclude that both nginx and Apache handle this more gracefully than Caddy, so I would say Caddy is, in my view, the issue here.

Edit: I'm not entirely sure my issue is related to this one, since in my case it happens consistently every single time: that single 140 KB JavaScript file is truncated without fail (but under nginx and Apache it is not).

@francislavoie
Member

@AnharMiah it would be helpful if you could try to assemble a minimal reproducible example using that stack. Ideally in Docker if you can, so we don't need to install the toolchains on our machines.

@AnharMiah

@francislavoie I'll see what I can do when I get some time this weekend, and yes I'll containerise the entire thing. The environment is Linux, but I'll include all the relevant version numbers 👍

@supriyo-biswas
Author

Is there any way I can help move this issue forward? I'm more than happy to help; it's just that I'm out of ideas as to how to proceed, especially given my observation about kernel versions being a factor.

@francislavoie
Member

@supriyo-biswas See my comment above: #5431 (comment)

@francislavoie
Member

francislavoie commented Apr 4, 2023

Last night, I decided to see how far I could get playing with http.ResponseController, a new stdlib API with some nice functionality benefits; it's being used to add new per-request options to HTTP middleware. Practically, it means we'll (probably) be able to add a bunch of new options to https://caddyserver.com/docs/caddyfile/directives/request_body, such as per-request read/write deadlines (timeouts), and to call EnableFullDuplex() as per golang/go@457fd1d, which I think is meant to solve this issue.

My progress on that can be found in the https://github.com/caddyserver/caddy/compare/response-controller branch (the first commit could land when we bump Caddy's minimum to Go 1.20, but the EnableFullDuplex() part will have to wait until Go 1.21).

Problem is, it's currently impossible to build this branch on the latest unreleased Go version (i.e. gotip, see https://pkg.go.dev/golang.org/dl/gotip) because of our dependency on quic-go (which provides HTTP/3 support). The quic-go lib has to do some funky stuff with the crypto/tls APIs (i.e. making a full copy of the stdlib code and modifying it to be QUIC compatible), which means there need to be per-Go-version copies of qtls (that modified copy of stdlib crypto/tls). One doesn't exist yet for Go 1.21 (the next Go version), so building Caddy on that version fails.

See quic-go/quic-go#3757, where I asked for guidance about building against gotip. There's no solution in the short term, but what I've found out is that Go is finally adding QUIC APIs to crypto/tls, hopefully landing before Go 1.21 is released (maybe in the next month or two?), so that's exciting 🎉

But that does mean there's no way to push this issue forward at all right now; we're hard-blocked waiting for those things to be resolved upstream before we can test EnableFullDuplex(). Also, a reminder as I said earlier: since these are new APIs, we'd have to wait to merge any changes using them until Caddy's minimum Go version allows it (or we could add a Go version condition around the relevant code, TBD). See #5431 (comment).

@francislavoie
Member

francislavoie commented Jul 26, 2023

If anyone here would like to try out #5654, it should have the fix.

You'll need to build with Go 1.21. You can install the latest commit of Go with https://pkg.go.dev/golang.org/dl/gotip, or use a Go 1.21 release candidate build from https://go.dev/dl/#go1.21rc3.

Check out the Caddy repo on that branch and run cd cmd/caddy && go build. A caddy binary will be produced in the current directory. Then add this to your Caddyfile global options:

{
	servers {
		enable_full_duplex
	}
}

@mholt mholt added this to the v2.7.1 milestone Jul 26, 2023
@mholt mholt modified the milestones: v2.7.1, v2.7.0 Aug 2, 2023