Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phil/http client errors #369

Merged
merged 2 commits into from
Mar 15, 2024
Merged

Phil/http client errors #369

merged 2 commits into from
Mar 15, 2024

Conversation

psFried
Copy link
Contributor

@psFried psFried commented Mar 15, 2024

This addresses an issue we've encountered in production that results in a large
number of request failures. This happens when one of the TCP connections is to
a server that starts sending RSTStream frames with an INTERNAL_ERROR status
(http2 stream status, not http request status). In this case, the Go http
client will continue to re-use the connection for other requests, which also
fail in the same way. What we'd really like is to have some sort of logic that
says, "if connection x has returned such an internal error, then stop using
that connection for future requests and shut it down". Unfortunately, such
logic is impossible to express with the current Transport and
ClientConnPool APIs. So the next best thing is a workaround where we
re-create the entire http client, to force it to re-establish all connections.
That's what this does. Plus a little extra for dealing with the dynamic
enablement/disablement of the file protocol transport.

The previous attempt to work around this issue was to force the client to use http1.1 only. It's unclear whether that would have worked, or if it would have experienced the same issue just with http 500 responses. In any case, this reverts that commit in favor of re-creating the client only when failures are detected.

I tested this by running gazctl journals read and verifying via the debug logs that fragments were being fetched directly.

This change is Reviewable

…rom cloud storage"

This reverts commit 81d0aa5.
The intent of that commit was to work around an issue where a bad connection
could be reused for subsequent requests. But I now think that this brings too
great a risk for performance degradation due to the much lower effective
connection limits. And I also suspect it may not even solve the problem, since
http1.1 connections are still re-used in many cases. So the plan is to try
something different in the next commit.
@psFried psFried force-pushed the phil/http-client-errors branch 2 times, most recently from b967eb0 to 9fc7b41 Compare March 15, 2024 17:54
This addresses an issue we've encountered in production that results in a large
number of request failures.  This happens when one of the TCP connections is to
a server that starts sending RSTStream frames with an `INTERNAL_ERROR` status
(http2 stream status, not http request status). In this case, the Go http
client will continue to re-use the connection for other requests, which also
fail in the same way. What we'd really like is to have some sort of logic that
says, "if connection x has returned such an internal error, then stop using
that connection for future requests and shut it down". Unfortunately, such
logic is impossible to express with the current `Transport` and
`ClientConnPool` APIs. So the next best thing is a workaround where we
re-create the entire http client, to force it to re-establish all connections.
That's what this does. Plus a little extra for dealing with the dynamic
enablement/disablement of the file protocol transport.
@psFried psFried force-pushed the phil/http-client-errors branch from 9fc7b41 to e9352e8 Compare March 15, 2024 18:31
@psFried psFried marked this pull request as ready for review March 15, 2024 18:39
@psFried psFried requested a review from jgraettinger March 15, 2024 18:39
Copy link
Contributor

@jgraettinger jgraettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@psFried psFried merged commit ab11240 into master Mar 15, 2024
1 check passed
@psFried psFried deleted the phil/http-client-errors branch March 15, 2024 18:54
psFried added a commit to estuary/flow that referenced this pull request Mar 15, 2024
psFried added a commit to estuary/flow that referenced this pull request Mar 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants