
HTTP2 multiplexed requests get stuck in presence of "slow readers", Stream/Connection window is not being updated #2899

Open
dmitryvk opened this issue Jun 20, 2022 · 7 comments
Labels
C-bug Category: bug. Something is wrong. This is bad!

Comments

@dmitryvk

Version
hyper 0.14.19
h2 0.3.13

Platform
Fedora 36

Linux fedora-wb-dev 5.17.11-300.fc36.x86_64 #1 SMP PREEMPT Wed May 25 15:04:05 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Description
We use hyper as both a server and a client in a distributed application. The hyper client is configured with .http2_only(true).
Some GET requests get stuck. These requests stream large response bodies (tens of megabytes). Part of the response body is transferred (several megabytes), and then both server and client stall with zero bytes being transferred. This only happens when HTTP/2 is in use.

The following code always reproduces this on my machine:

# Cargo.toml
[package]
name = "hyperreader"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
hyper = { version = "0.14.13", features = ["full"] }
tokio = { version = "1.12.0", features = ["full"] }
tokio-stream = "*"
// main.rs
use std::convert::Infallible;
use std::error::Error;
use std::mem::forget;
use std::net::SocketAddr;
use std::sync::Arc;
use std::time::Duration;
use hyper::{Body, Request, Response};
use hyper::client::HttpConnector;
use hyper::service::{make_service_fn, service_fn};
use tokio::spawn;
use tokio::time::sleep;
use tokio_stream::StreamExt;

const URL: &'static str = "http://127.0.0.1:1234";
const PORT: u16 = 1234;
const BODY_SIZE: usize = 10 * 1024 * 1024;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    spawn(
        hyper::server::Server::bind(&SocketAddr::from(([127, 0, 0, 1], PORT)))
        .serve(make_service_fn(|_| async { Ok::<_, Infallible>(service_fn(serve_file)) }))
    );

    let client = hyper::client::Builder::default()
        .http2_only(true)
        .build::<_, Body>(HttpConnector::new());
    let client = Arc::new(client);

    // Leak three responses: their bodies are never polled, so these
    // "slow readers" never release their flow-control capacity.
    for i in 0..3 {
        eprintln!("Sending {}", i);
        let uri = URL.parse().unwrap();
        let response = client.get(uri).await.unwrap();
        eprintln!("Received response {}", i);
        forget(response);
    }

    eprintln!("Sleeping 1 sec...");
    sleep(Duration::from_secs(1)).await;

    eprintln!("Sending request...");
    let uri = URL.parse().unwrap();
    let mut response = client.get(uri).await.unwrap();
    eprintln!("Received response...");
    let mut count = 0;
    while let Some(chunk) = response.body_mut().next().await {
        count += chunk?.len();
        println!("{}/{} bytes", count, BODY_SIZE);
    }

    Ok(())
}

async fn serve_file(_req: Request<Body>) -> Result<Response<Body>, Infallible> {
    let data = vec![0; BODY_SIZE];
    Ok(Response::new(Body::from(data)))
}

(note that the constants may be specific to a particular machine and may need tweaking for the issue to reproduce)

I expected to see this happen:
The program would run to completion and would print:

Sending 0
Received response 0
Sending 1
Received response 1
Sending 2
Received response 2
Sleeping 1 sec...
Sending request...
Received response...
// many lines skipped
10485760/10485760 bytes

Instead, this happened:

The program prints the following lines and then does nothing:

Sending 0
Received response 0
Sending 1
Received response 1
Sending 2
Received response 2
Sleeping 1 sec...
Sending request...
Received response...

In Wireshark I see that neither the client nor the server sends WINDOW_UPDATE frames during data transfer (apart from those at connection establishment). I suspect this is the cause of the behavior.

@dmitryvk dmitryvk added the C-bug Category: bug. Something is wrong. This is bad! label Jun 20, 2022
@olix0r
Contributor

olix0r commented Jun 20, 2022

I suggest using tracing to view the logs from the h2 crate -- it should log as capacity is released, which should be done here:

let _ = h2.flow_control().release_capacity(bytes.len());
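For anyone else following this debugging path, a minimal way to surface those h2 trace events (my sketch, assuming the `tracing` and `tracing-subscriber` crates are added as dependencies; this setup is not part of the original report) is to install a TRACE-level subscriber before building the client:

```rust
// Hypothetical logging setup, assuming tracing-subscriber 0.3
// with default features (not from the original report).
fn init_trace_logging() {
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::TRACE) // include h2's TRACE-level events
        .with_writer(std::io::stderr)          // keep logs off stdout
        .init();
}
```

Running the repro with this in place should show whether the release_capacity call above is ever reached.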

@dmitryvk
Author

Thanks for the suggestion.
I've added tracing-subscriber (at TRACE level) and got the following log output. It's intermixed with stdout, so messages from the println!s are also there.
hyperreader.log

@dmitryvk
Author

In the log I don't see any release_connection_capacity or release_capacity.

From the code, it seems that the connection receive window is only increased when some stream has received data and is then polled, which I believe is not right, since some streams may be polled quite late or not at all.
In my example, several streams (I assume) hog the entire connection capacity (on the client-receive/server-send side), which prevents the server from sending DATA frames for new streams.
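To make the hogging concrete, here is a minimal, self-contained sketch of connection-level window accounting (a simplified model written for illustration; this is not h2's actual implementation): bytes received on any stream are debited from one shared connection window, and only an explicit release credits them back.

```rust
// Simplified model of the HTTP/2 connection-level receive window
// (illustration only, not h2's code).
struct ConnWindow {
    capacity: i64,
}

impl ConnWindow {
    /// A DATA frame of `n` bytes arrives on *any* stream.
    fn recv_data(&mut self, n: i64) -> bool {
        if n > self.capacity {
            return false; // sender must stall: no connection window left
        }
        self.capacity -= n;
        true
    }

    /// The application consumed `n` bytes; a WINDOW_UPDATE would be sent.
    #[allow(dead_code)]
    fn release(&mut self, n: i64) {
        self.capacity += n;
    }
}

fn main() {
    // RFC 7540 default initial connection window: 65,535 bytes.
    let mut conn = ConnWindow { capacity: 65_535 };

    // Three leaked ("forgotten") streams each receive ~21 KiB.
    // They are never polled, so `release` is never called for them.
    for _ in 0..3 {
        assert!(conn.recv_data(21_845));
    }

    // A fourth, actively read stream now gets nothing: the shared
    // window is exhausted and the whole connection stalls.
    assert!(!conn.recv_data(16_384));
    println!("remaining connection window: {}", conn.capacity);
}
```

In this model, only calling `release` (i.e. actually polling the leaked bodies) would let the fourth stream make progress, which matches the stall observed above.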

@leoyvens
Contributor

leoyvens commented Aug 8, 2022

We hit this issue when multiplexing many gRPC streams on the same channel with tonic. The fix for us was to set a very high connection window size, effectively disabling HTTP/2 connection-level flow control.

@dmitryvk I believe you are correct that capacity is only released when the stream is polled. According to this post, multiple gRPC implementations hit this issue and decided to release capacity more eagerly. However, my reading of the HTTP/2 RFC is that h2 and hyper implement flow control correctly. This is quite a footgun, but it seems to be by design in HTTP/2.

If there is anything hyper can improve in its HTTP/2 client defaults to fix this, I believe it would be important to do so, as this bug was both severe and difficult to debug.

@seanmonstar Sorry for the ping, but any insight here would be appreciated.

@seanmonstar
Member

I remember reading that blog post, and feeling the same as you described: it sounds like they're effectively disabling connection-level flow control. That has other problems, as the post itself describes. I don't feel the post sufficiently explains how best to offset them, so I haven't wanted to make this the default in hyper. But a user can effectively do so by setting http2_initial_connection_window_size() to the max, if they want to opt in to that and deal with the dangers themselves.
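As a concrete illustration of that opt-in (my sketch against the hyper 0.14 client builder, not code from this thread), the workaround looks roughly like this; 2^31 - 1 is the largest window RFC 7540 permits:

```rust
// Sketch of the opt-in workaround described above (hyper 0.14 API).
// Caution: this effectively disables connection-level flow control,
// with the memory-usage risks discussed in this thread.
use hyper::{client::HttpConnector, Body};

fn build_client() -> hyper::Client<HttpConnector, Body> {
    hyper::client::Builder::default()
        .http2_only(true)
        // 2^31 - 1 is the maximum window size RFC 7540 allows.
        .http2_initial_connection_window_size(Some((1u32 << 31) - 1))
        .build(HttpConnector::new())
}
```

With this configuration, the connection window can no longer be exhausted by a handful of unpolled streams, at the cost of unbounded per-connection buffering if readers stay slow.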

@dmitryvk
Author

dmitryvk commented Aug 8, 2022

I had an idea (which I have not yet tried to implement): release connection capacity when a stream is accepted or polled for the first time. This still provides some sort of backpressure and seems to avoid the issue of slow readers hogging the connection window.

@seanmonstar
Member

release connection capacity when stream is accepted or polled for the first time.

I'm curious to hear more. Are you saying that when the user is first given the http::Request, we should automatically "release" any connection window that had been received so far on that stream? What consequences would that have? Is there danger? An improvement?


4 participants