
Error: IncompleteMessage: connection closed before message completed #2136

Closed

fhsgoncalves opened this issue Feb 20, 2020 · 10 comments
@fhsgoncalves commented Feb 20, 2020

Hey, I'm experiencing weird behavior with the hyper client when using https.
Sometimes my app in production fails to perform a request, though the same request works most of the time. I ran a load test locally to try to reproduce the problem, and I could: the error occurs on ~0.02% of requests.

I guessed it could be something related to hyper-tls, so I switched to hyper-rustls, but the same problem continued to occur.
Then I tried hitting the url over http instead of https, and the error went away!

The error I receive from hyper::Client::get is: hyper::Error(IncompleteMessage): connection closed before message completed.

Here is a minimal working example to reproduce the error:

Cargo.toml:

```toml
[dependencies]
hyper = "0.13"
tokio = { version = "0.2", features = ["full"] }
hyper-tls = "0.4.1"
```

src/main.rs:

```rust
use std::convert::Infallible;
use std::net::SocketAddr;

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Client, Response, Server, Uri};
use hyper_tls::HttpsConnector;

pub type HttpClient = Client<HttpsConnector<hyper::client::connect::HttpConnector>>;

#[tokio::main]
async fn main() {
    let addr = SocketAddr::from(([0, 0, 0, 0], 8100));
    // One shared client: hyper pools idle connections and reuses them.
    let client = Client::builder().build::<_, hyper::Body>(HttpsConnector::new());

    let make_service = make_service_fn(move |_| {
        let client = client.clone();
        async move { Ok::<_, Infallible>(service_fn(move |_req| handle(client.clone()))) }
    });

    let server = Server::bind(&addr).serve(make_service);

    println!("Listening on http://{}", addr);

    if let Err(e) = server.await {
        eprintln!("server error: {}", e);
    }
}

// Forward every incoming request to the target url and relay the response.
async fn handle(client: HttpClient) -> Result<Response<Body>, hyper::Error> {
    let url = "https://url-here"; // CHANGE THE URL HERE!

    match client.get(url.parse::<Uri>().unwrap()).await {
        Ok(resp) => Ok(resp),
        Err(err) => { eprintln!("{:?} {}", err, err); Err(err) }
    }
}
```

PS: replace the url value with a valid https url. In my tests I used a small file on AWS S3.

I performed a local load test using hey:

$ hey -z 120s -c 150 http://localhost:8100 

Running the test for 2 minutes (-z 120s) was enough to see some errors appear.

Could anyone help me out? If I need to provide more information, or anything, just let me know.
Thank you!

@seanmonstar (Member)

This is just due to the racy nature of networking.

hyper has a connection pool of idle connections, and it selected one to send your request. Most of the time, hyper will receive the server's FIN and drop the dead connection from its pool. But occasionally, a connection will be selected from the pool and written to at the same time the server is deciding to close the connection. Since hyper already wrote some of the request, it can't really retry it automatically on a new connection, since the server may have acted already.
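For requests that are safe to replay (like the plain GET in the example above), one app-side mitigation is to retry when hyper reports exactly this race. A minimal sketch, assuming a hyper version that exposes `Error::is_incomplete_message`; the helper name `get_with_retry` is made up for illustration:

```rust
use hyper::client::connect::HttpConnector;
use hyper::{Body, Client, Response, Uri};
use hyper_tls::HttpsConnector;

type HttpClient = Client<HttpsConnector<HttpConnector>>;

/// Retry idempotent GETs when the pooled connection was closed mid-request
/// (`attempts` must be at least 1).
async fn get_with_retry(
    client: &HttpClient,
    url: Uri,
    attempts: usize,
) -> Result<Response<Body>, hyper::Error> {
    let mut last_err = None;
    for _ in 0..attempts {
        match client.get(url.clone()).await {
            Ok(resp) => return Ok(resp),
            // `is_incomplete_message` flags exactly the error in this issue.
            Err(e) if e.is_incomplete_message() => last_err = Some(e),
            // Any other failure is not this race; surface it immediately.
            Err(e) => return Err(e),
        }
    }
    Err(last_err.unwrap())
}
```

As explained above, this is only safe for idempotent requests; for anything else the server may already have acted on the partially written request.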

@fhsgoncalves (Author)

Hey, thank you for the swift response!

I got it! So the connection is being reused, right? Is that due to the keep-alive option?
If so, would disabling that flag, or retrying on the app side, solve the issue?

Also, I could not reproduce the error when requesting a url over http. I tried many times without success; I could only reproduce the issue when requesting a url over https.

If pooling is the reason, I should have experienced the issue with http too, right?

@fhsgoncalves (Author)

I just found that AWS S3 has a default max idle timeout of 20s, while hyper's default keep_alive_timeout is 90s.

Setting the keep_alive_timeout to less than 20s on the hyper client seems to have solved the problem!

Thank you, your explanation really helped me understand why this was happening!
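A minimal sketch of that fix against hyper 0.13's client builder (the 15s value is illustrative; anything comfortably below the server's 20s idle limit should do, and in hyper 0.14+ the equivalent setting is `pool_idle_timeout`):

```rust
use std::time::Duration;

use hyper::client::connect::HttpConnector;
use hyper::{Body, Client};
use hyper_tls::HttpsConnector;

fn build_client() -> Client<HttpsConnector<HttpConnector>> {
    // Drop pooled connections before S3's ~20s idle limit so hyper never
    // reuses a connection the server has already decided to close.
    Client::builder()
        .keep_alive_timeout(Duration::from_secs(15))
        .build::<_, Body>(HttpsConnector::new())
}
```

Note that this narrows the race window rather than eliminating it entirely.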

@fhsgoncalves (Author) commented Feb 21, 2020

I was looking at the Java AWS client, and I saw that it uses a max-idle-timeout of 60s, but there is a second property called validate-after-inactivity (5s by default) that allows the idle timeout to be that high.
Looking at the code, I saw that the http client it uses supports this behavior.

Would it be possible to implement the same behavior in hyper? Does it make sense? 😄

@seanmonstar (Member)

I believe the "revalidation" it does is to poll that the connection is readable. In hyper, we already register for when the OS discovers the connection has hung up. The race would still exist if the "revalidation" happened at the same time the server was closing.
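For illustration, here is a rough sketch of what such a revalidation check could look like on a raw std::net::TcpStream. This is not a hyper API; the names (`is_reusable`, `validate_after`) are made up, and it only shows the idea behind validate-after-inactivity:

```rust
use std::io::ErrorKind;
use std::net::TcpStream;
use std::time::{Duration, Instant};

// If the connection has been idle longer than `validate_after`, probe it
// with a non-blocking peek before reuse. A peer that already sent FIN
// shows up as Ok(0) (EOF); a healthy idle connection as WouldBlock.
fn is_reusable(
    stream: &TcpStream,
    last_used: Instant,
    validate_after: Duration,
) -> std::io::Result<bool> {
    if last_used.elapsed() <= validate_after {
        return Ok(true); // used recently enough to trust without probing
    }
    stream.set_nonblocking(true)?;
    let mut buf = [0u8; 1];
    let alive = match stream.peek(&mut buf) {
        Ok(0) => false,                                      // peer closed
        Ok(_) => false,          // unexpected data on an idle connection
        Err(e) if e.kind() == ErrorKind::WouldBlock => true, // alive, idle
        Err(_) => false,
    };
    stream.set_nonblocking(false)?;
    Ok(alive)
}
```

As noted above, even this check races: the server can still close the connection between the probe and the write.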

Rudo2204 added a commit to Rudo2204/rpl that referenced this issue Jun 9, 2021
```
Error: Request Error when talking to qbittorrent: error sending request for url (http://localhost:6006/api/v2/torrents/delete): connection closed before message completed

Caused by:
    0: error sending request for url (http://localhost:6006/api/v2/torrents/delete): connection closed before message completed
    1: connection closed before message completed
```
Issue: hyperium/hyper#2136
@ronanyeah

Anyone getting this with reqwest, try this:

```rust
// Zero idle connections per host disables pooling entirely.
let client = reqwest::Client::builder()
    .pool_max_idle_per_host(0)
    .build()?;
```

wyyerd/stripe-rs#172

@Rudo2204

Well, I tried to use `.pool_max_idle_per_host(0)` and I still got this error today.

@loyd commented Oct 28, 2021

Doesn't hyper take the Keep-Alive header into account?

I've run into this problem with the ClickHouse HTTP crate; ClickHouse sends `Keep-Alive: timeout=3`, so I don't understand why hyper doesn't handle it.

@seanmonstar, any ideas?

im-0 added a commit to im-0/solana that referenced this issue Aug 26, 2022
Setting pool idle timeout to a value smaller than watchtower's poll
interval can fix the following error:

	[2022-08-25T04:03:22.811160892Z INFO  solana_watchtower] Failure 1 of 3: solana-watchtower testnet: Error: rpc-error: error sending request for url (https://api.testnet.solana.com/): connection closed before message completed

It looks like this happens because either RPC servers or ISPs drop HTTP
connections without properly notifying the client in some cases.

Similar issue: hyperium/hyper#2136.
im-0 added a commit to im-0/solana that referenced this issue Sep 16, 2022
mvines pushed a commit to solana-labs/solana that referenced this issue Sep 16, 2022
mergify bot pushed a commit to solana-labs/solana that referenced this issue Sep 16, 2022 (cherry picked from commit 798975f)
mvines pushed a commit to solana-labs/solana that referenced this issue Sep 17, 2022 (cherry picked from commit 798975f)
daniel-savu added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue Feb 19, 2024
### Description

Applies the fix in #2384 everywhere an `HttpClient` is constructed via rusoto.

It lowers the S3 timeout to 15s based on tips in [this thread](hyperium/hyper#2136 (comment)), to avoid `Error during dispatch: connection closed before message completed` errors. Note that we'll probably still run into these issues, but less frequently ([source](rusoto/rusoto#1766 (comment))).


daniel-savu added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue May 30, 2024
daniel-savu added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue Jun 4, 2024 (backport of #3283)
JoshFerge added a commit to getsentry/uptime-checker that referenced this issue Jan 22, 2025
We intermittently receive reports that we generate a downtime issue with the description `IncompleteMessageError`, and our customers say that their website was in fact up, not down. According to hyperium/hyper#2136, this could be due to connection pooling: a race condition between the client and the server. One of the comments suggests setting the connection pool timeout to `0` (hyperium/hyper#2136 (comment)).

We do not need connection pooling since we aren't making that many requests to any one host, so there should be no performance effect, and we can monitor whether this fixes the error.

fmorency pushed a commit to fmorency/hyperlane-agents that referenced this issue Feb 20, 2025
matthew-hagemann added a commit to ubuntu/app-center-ratings that referenced this issue Feb 26, 2025
…ft.io (#164)

We're seeing frequent errors from the deployed service when it is making
requests to snapcraft.io that look like the following:
```
error sending request for url (...)
Caused by: client error (SendRequest)
Caused by: connection closed before message completed
```

hyperium/hyper#2136 has details on the underlying source of the error, and it looks like we need to reduce the default `pool_idle_timeout` from 90s to something much shorter. Initial local testing suggests dropping it all the way down to 5s, which is what this PR does, but we will probably want to tune this as we determine the performance implications.
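A minimal sketch of that change with reqwest's builder (the 5s value follows the local testing described above; tune it for your own workload):

```rust
use std::time::Duration;

fn build_client() -> reqwest::Result<reqwest::Client> {
    // Drop pooled connections well before the server's idle limit.
    // reqwest's default pool idle timeout is 90s.
    reqwest::Client::builder()
        .pool_idle_timeout(Duration::from_secs(5))
        .build()
}
```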