
Error: IncompleteMessage: connection closed before message completed #2136

Closed

fhsgoncalves opened this issue Feb 20, 2020 · 10 comments
@fhsgoncalves commented Feb 20, 2020

Hey, I'm experiencing weird behavior with the hyper client when using https.
Sometimes my app in production fails to perform a request, though the same request works most of the time. I ran a load test locally to try to reproduce the problem, and I could: the error occurs on ~0.02% of requests.

I guessed it could be something related to hyper-tls, so I switched to hyper-rustls, but the same problem continued to occur.
Then I tried hitting the url over http instead of https, and the error went away!

The error I receive from hyper::Client::get is: hyper::Error(IncompleteMessage): connection closed before message completed.

Here is a minimal working example to reproduce the error:

Cargo.toml:

```toml
[dependencies]
hyper = "0.13"
tokio = { version = "0.2", features = ["full"] }
hyper-tls = "0.4.1"
```

src/main.rs:

```rust
use std::convert::Infallible;
use std::net::SocketAddr;

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Client, Response, Server, Uri};
use hyper_tls::HttpsConnector;

pub type HttpClient = Client<HttpsConnector<hyper::client::connect::HttpConnector>>;

#[tokio::main]
async fn main() {
    let addr = SocketAddr::from(([0, 0, 0, 0], 8100));
    // One shared client: hyper pools idle connections and reuses them.
    let client = Client::builder().build::<_, hyper::Body>(HttpsConnector::new());

    let make_service = make_service_fn(move |_| {
        let client = client.clone();
        async move { Ok::<_, Infallible>(service_fn(move |_req| handle(client.clone()))) }
    });

    let server = Server::bind(&addr).serve(make_service);

    println!("Listening on http://{}", addr);

    if let Err(e) = server.await {
        eprintln!("server error: {}", e);
    }
}

// Forward every incoming request to the target url and relay the response.
async fn handle(client: HttpClient) -> Result<Response<Body>, hyper::Error> {
    let url = "https://url-here"; // CHANGE THE URL HERE!

    match client.get(url.parse::<Uri>().unwrap()).await {
        Ok(resp) => Ok(resp),
        Err(err) => { eprintln!("{:?} {}", err, err); Err(err) }
    }
}
```

PS: replace the url value with a valid https url. In my tests I used a small file on AWS S3.

I performed a local load test using hey:

$ hey -z 120s -c 150 http://localhost:8100 

Running the test for 2 minutes (-z 120s) was enough to see some errors appear.

Could anyone help me out? If I need to provide more information, or anything, just let me know.
Thank you!

@seanmonstar (Member)

This is just due to the racy nature of networking.

hyper has a connection pool of idle connections, and it selected one to send your request. Most of the time, hyper will receive the server's FIN and drop the dead connection from its pool. But occasionally, a connection will be selected from the pool and written to at the same time the server is deciding to close the connection. Since hyper already wrote some of the request, it can't really retry it automatically on a new connection, since the server may have acted already.
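For requests that are safe to replay (like the plain GET in the example above), one app-side mitigation is to retry when hyper reports exactly this race. A minimal sketch, assuming a hyper version that exposes `Error::is_incomplete_message`; the helper name `get_with_retry` is made up for illustration:

```rust
use hyper::client::connect::HttpConnector;
use hyper::{Body, Client, Response, Uri};
use hyper_tls::HttpsConnector;

type HttpClient = Client<HttpsConnector<HttpConnector>>;

/// Retry idempotent GETs when the pooled connection was closed mid-request
/// (`attempts` must be at least 1).
async fn get_with_retry(
    client: &HttpClient,
    url: Uri,
    attempts: usize,
) -> Result<Response<Body>, hyper::Error> {
    let mut last_err = None;
    for _ in 0..attempts {
        match client.get(url.clone()).await {
            Ok(resp) => return Ok(resp),
            // `is_incomplete_message` flags exactly the error in this issue.
            Err(e) if e.is_incomplete_message() => last_err = Some(e),
            // Any other failure is not this race; surface it immediately.
            Err(e) => return Err(e),
        }
    }
    Err(last_err.unwrap())
}
```

As explained above, this is only safe for idempotent requests; for anything else the server may already have acted on the partially written request.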

@fhsgoncalves (Author)

Hey, thank you for the swift response!

I got it! So the connection is being reused, right? Is that due to the keep-alive option?
If so, would disabling that flag, or retrying on the app side, solve the issue?

Also, I could not reproduce the error when requesting a url over http. I tried many times without success; I could only reproduce the issue when requesting a url over https.

If pooling is the reason, I should have experienced the issue with http too, right?

@fhsgoncalves (Author)

I just found that AWS S3 has a default max idle timeout of 20s, while hyper's default keep_alive_timeout is 90s.

Setting the keep_alive_timeout to less than 20s on the hyper client seems to have solved the problem!

Thank you, your explanation really helped me understand why this was happening!
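A minimal sketch of that fix against hyper 0.13's client builder (the 15s value is illustrative; anything comfortably below the server's 20s idle limit should do, and in hyper 0.14+ the equivalent setting is `pool_idle_timeout`):

```rust
use std::time::Duration;

use hyper::client::connect::HttpConnector;
use hyper::{Body, Client};
use hyper_tls::HttpsConnector;

fn build_client() -> Client<HttpsConnector<HttpConnector>> {
    // Drop pooled connections before S3's ~20s idle limit so hyper never
    // reuses a connection the server has already decided to close.
    Client::builder()
        .keep_alive_timeout(Duration::from_secs(15))
        .build::<_, Body>(HttpsConnector::new())
}
```

Note that this narrows the race window rather than eliminating it entirely.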

@fhsgoncalves (Author) commented Feb 21, 2020

I was looking at the Java AWS client, and I saw that it uses a max-idle-timeout of 60s, but there is a second property called validate-after-inactivity (5s by default) that allows the idle timeout to be that high.
Looking at the code, I saw that the http client it uses supports this behavior.

Would it be possible to implement the same behavior in hyper? Does it make sense? 😄

@seanmonstar (Member)

I believe the "revalidation" it does is to poll that the connection is readable. In hyper, we already register for when the OS discovers the connection has hung up. The race would still exist if the "revalidation" happened at the same time the server was closing.
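For illustration, here is a rough sketch of what such a revalidation check could look like on a raw std::net::TcpStream. This is not a hyper API; the names (`is_reusable`, `validate_after`) are made up, and it only shows the idea behind validate-after-inactivity:

```rust
use std::io::ErrorKind;
use std::net::TcpStream;
use std::time::{Duration, Instant};

// If the connection has been idle longer than `validate_after`, probe it
// with a non-blocking peek before reuse. A peer that already sent FIN
// shows up as Ok(0) (EOF); a healthy idle connection as WouldBlock.
fn is_reusable(
    stream: &TcpStream,
    last_used: Instant,
    validate_after: Duration,
) -> std::io::Result<bool> {
    if last_used.elapsed() <= validate_after {
        return Ok(true); // used recently enough to trust without probing
    }
    stream.set_nonblocking(true)?;
    let mut buf = [0u8; 1];
    let alive = match stream.peek(&mut buf) {
        Ok(0) => false,                                      // peer closed
        Ok(_) => false,          // unexpected data on an idle connection
        Err(e) if e.kind() == ErrorKind::WouldBlock => true, // alive, idle
        Err(_) => false,
    };
    stream.set_nonblocking(false)?;
    Ok(alive)
}
```

As noted above, even this check races: the server can still close the connection between the probe and the write.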

Rudo2204 added a commit to Rudo2204/rpl that referenced this issue Jun 9, 2021
```
Error: Request Error when talking to qbittorrent: error sending request for url (http://localhost:6006/api/v2/torrents/delete): connection closed before message completed

Caused by:
    0: error sending request for url (http://localhost:6006/api/v2/torrents/delete): connection closed before message completed
    1: connection closed before message completed
```
Issue: hyperium/hyper#2136
@ronanyeah

Anyone getting this with reqwest, try this:

```rust
// Zero idle connections per host disables pooling entirely.
let client = reqwest::Client::builder()
    .pool_max_idle_per_host(0)
    .build()?;
```

wyyerd/stripe-rs#172

@Rudo2204

Well, I tried to use `.pool_max_idle_per_host(0)` and I still got this error today.

@loyd commented Oct 28, 2021

Doesn't hyper take the Keep-Alive header into account?

I've run into this problem with the ClickHouse HTTP crate; ClickHouse sends `Keep-Alive: timeout=3`, so I don't understand why hyper doesn't handle it.

@seanmonstar, any ideas?

im-0 added a commit to im-0/solana that referenced this issue Aug 26, 2022
Setting pool idle timeout to a value smaller than watchtower's poll
interval can fix the following error:

	[2022-08-25T04:03:22.811160892Z INFO  solana_watchtower] Failure 1 of 3: solana-watchtower testnet: Error: rpc-error: error sending request for url (https://api.testnet.solana.com/): connection closed before message completed

It looks like this happens because either RPC servers or ISPs drop HTTP
connections without properly notifying the client in some cases.

Similar issue: hyperium/hyper#2136.
im-0 added a commit to im-0/solana that referenced this issue Sep 16, 2022
mvines pushed a commit to solana-labs/solana that referenced this issue Sep 16, 2022
mergify bot pushed a commit to solana-labs/solana that referenced this issue Sep 16, 2022 (cherry picked from commit 798975f)
mvines pushed a commit to solana-labs/solana that referenced this issue Sep 17, 2022 (cherry picked from commit 798975f)
daniel-savu added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue Feb 19, 2024
### Description

Applies the fix in #2384 everywhere an `HttpClient` is constructed via rusoto.

It lowers the S3 timeout to 15s based on tips in [this thread](hyperium/hyper#2136 (comment)), to avoid `Error during dispatch: connection closed before message completed` errors. Note that we'll probably still run into these issues, but less frequently ([source](rusoto/rusoto#1766 (comment))).


daniel-savu added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue May 30, 2024
daniel-savu added a commit to hyperlane-xyz/hyperlane-monorepo that referenced this issue Jun 4, 2024 (backport of #3283)
JoshFerge added a commit to getsentry/uptime-checker that referenced this issue Jan 22, 2025
We intermittently receive reports that we generate a downtime issue with the description `IncompleteMessageError`, and our customers say that their website was in fact up, not down. According to hyperium/hyper#2136, this could be due to connection pooling: a race condition between the client and the server. One of the comments suggests setting the connection pool timeout to `0` (hyperium/hyper#2136 (comment)).

We do not need connection pooling since we aren't making that many requests to any one host, so there should be no performance effect, and we can monitor whether this fixes the error.

fmorency pushed a commit to fmorency/hyperlane-agents that referenced this issue Feb 20, 2025
matthew-hagemann added a commit to ubuntu/app-center-ratings that referenced this issue Feb 26, 2025
…ft.io (#164)

We're seeing frequent errors from the deployed service when it is making
requests to snapcraft.io that look like the following:
```
error sending request for url (...)
Caused by: client error (SendRequest)
Caused by: connection closed before message completed
```

hyperium/hyper#2136 has details on the underlying source of the error, and it looks like we need to reduce the default `pool_idle_timeout` from 90s to something much shorter. Initial local testing suggests dropping it all the way down to 5s, which is what this PR does, but we will probably want to tune this as we determine the performance implications.
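A minimal sketch of that change with reqwest's builder (the 5s value follows the local testing described above; tune it for your own workload):

```rust
use std::time::Duration;

fn build_client() -> reqwest::Result<reqwest::Client> {
    // Drop pooled connections well before the server's idle limit.
    // reqwest's default pool idle timeout is 90s.
    reqwest::Client::builder()
        .pool_idle_timeout(Duration::from_secs(5))
        .build()
}
```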