fix(firehose): Set connection window size to the maximum #3818
d77f190 to 381781c
graph/src/firehose/endpoints.rs
Outdated
- .connect_timeout(Duration::from_secs(10));
+ .connect_timeout(Duration::from_secs(10))
+ .http2_keep_alive_interval(Duration::from_secs(30))
+ .http2_adaptive_window(true);
This causes disconnections when connecting to firehose through a GCP Load Balancer (`error reading a body from connection: unexpected end of file`).
Thanks for testing that! I've removed this and taken a different approach: setting a very large window size.
graph/src/firehose/endpoints.rs
Outdated
@@ -56,7 +56,9 @@ impl FirehoseEndpoint {
 .expect("TLS config on this host is invalid"),
 _ => panic!("invalid uri scheme for firehose endpoint"),
 }
- .connect_timeout(Duration::from_secs(10));
+ .connect_timeout(Duration::from_secs(10))
+ .http2_keep_alive_interval(Duration::from_secs(30))
This is too low IMHO. While I didn't experience disconnections with this setting, some backends are very restrictive about keep-alive pings (at 10 seconds, it DOES generate disconnections similar to those seen with http2_adaptive_window).
I've removed the keep alive for now, to focus this PR on the stalling bug.
This now sets the connection window to the maximum value, effectively disabling connection-level flow control. I was able to reproduce locally that this fixes the block stream stalling issue. I'm also convinced this is not a hyper bug but http2 flow control working as designed; for our use case we should opt out of connection-level flow control.
We currently multiplex all firehose connections on a single http2 connection, and under load we're seeing possible flow control issues, possibly a hyper bug. The default per-connection http2 window is small and equal to the per-stream window.
Enabling adaptive windows was expected to give us a much bigger connection window and possibly solve the issue at the current loads. Instead, this PR now sets the connection window to the maximum value, effectively disabling connection-level flow control.
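
To illustrate why a connection window equal to the per-stream window can stall multiplexed streams, here is a minimal toy model of sender-side HTTP/2 flow control. It is a sketch only, not hyper's or tonic's implementation: the `Connection` type and the `second_stream_allowance` helper are hypothetical names made up for this example. The only facts it relies on are the HTTP/2 default initial window of 65,535 bytes and the maximum window of 2^31 - 1. In tonic, the corresponding knob is presumably `Endpoint::initial_connection_window_size`, but the exact call used in this PR is not shown in the conversation above.

```rust
/// HTTP/2 default initial flow-control window, in bytes.
const DEFAULT_WINDOW: u32 = 65_535;
/// Largest flow-control window HTTP/2 allows: 2^31 - 1.
const MAX_WINDOW: u32 = (1 << 31) - 1;

/// Toy sender-side view of flow control (hypothetical, for illustration only).
struct Connection {
    conn_window: u32,
    stream_windows: Vec<u32>,
}

impl Connection {
    fn new(conn_window: u32, streams: usize) -> Self {
        Connection {
            conn_window,
            stream_windows: vec![DEFAULT_WINDOW; streams],
        }
    }

    /// Try to send `len` bytes on `stream`. Returns how many bytes the
    /// stream window and the connection window together permit.
    fn send(&mut self, stream: usize, len: u32) -> u32 {
        let allowed = len
            .min(self.stream_windows[stream])
            .min(self.conn_window);
        self.stream_windows[stream] -= allowed;
        self.conn_window -= allowed;
        allowed
    }
}

/// Stream 0 has a slow consumer: its data is sent but never read, so the
/// receiver never returns capacity. How much can stream 1 then send?
fn second_stream_allowance(conn_window: u32) -> u32 {
    let mut conn = Connection::new(conn_window, 2);
    conn.send(0, DEFAULT_WINDOW); // fills stream 0's window completely
    conn.send(1, 1024)
}

fn main() {
    println!(
        "default connection window: stream 1 may send {} bytes",
        second_stream_allowance(DEFAULT_WINDOW)
    );
    println!(
        "maximum connection window: stream 1 may send {} bytes",
        second_stream_allowance(MAX_WINDOW)
    );
}
```

With the default connection window, one unread stream exhausts the shared budget and the second stream cannot send at all. With the maximum window, only the per-stream windows apply in practice, so a slow stream no longer blocks the others.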