-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiplexed upstream servers to half close the stream before the downstream #34461
Allow multiplexed upstream servers to half close the stream before the downstream #34461
Conversation
…e downstream. Signed-off-by: Yan Avlasov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great work! Complicated PR so a bunch of comments follow
/wait
// being interpreted as a failure, when in fact no error has necessarily occurred. Gracefully | ||
// closing seems most in-line with behavior elsewhere in Envoy for now. | ||
// At present, the AsyncStream is always fully closed when the server half closes the stream. | ||
// This is the case even when allow_multiplexed_upstream_half_close runtime flag is set, as there |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: remove flag name so we don't have to remember to update the comment when we remove the flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -283,7 +283,9 @@ class ConnectionManagerImpl : Logger::Loggable<Logger::Id::http>, | |||
OptRef<const Tracing::Config> tracingConfig() const override; | |||
const ScopeTrackedObject& scope() override; | |||
OptRef<DownstreamStreamFilterCallbacks> downstreamCallbacks() override { return *this; } | |||
bool isHalfCloseEnabled() override { return false; } | |||
bool isHalfCloseEnabled() override { | |||
return filter_manager_.allowUpstreamHalfClose() ? true : false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return filter_manager_.allowUpstreamHalfClose()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
source/common/http/filter_manager.h
Outdated
bool stream_closed_ : 1; // Set when both remote_decode_complete_ and remote_encode_complete_ is | ||
// true observed for the first time and prevents ending the stream | ||
// multiple times. Only set when allow_upstream_half_close is enabled. | ||
bool force_close_stream_ : 1; // Set to indicate that stream should be closed due to either |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we stick with half close naming for consistency? support_half_close?
if not consider renaming to something like should_force_close
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to should_force_close_stream_
source/common/http/filter_manager.cc
Outdated
ENVOY_STREAM_LOG(trace, | ||
"commonEncodePrefix end_stream: {}, isHalfCloseEnabled: {}, force_close: {}", | ||
*this, end_stream, filter_manager_callbacks_.isHalfCloseEnabled(), | ||
static_cast<bool>(state_.force_close_stream_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add to dumpstate instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a good understanding of dumpstate but just quickly read #7300. if the dumpstate need to be invoked by human or at crash automatically, I hope we keep this log for stream level debugging (while agreed that this information should be added on the other place too, if needed). It will be super helpful to users who debug streams with trace log, why their stream got reset or why not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not both? Added to both places.
source/common/http/filter_manager.cc
Outdated
state_.local_complete_ = true; | ||
if (allow_upstream_half_close_) { | ||
if (end_stream) { | ||
state_.encoder_end_stream_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this different from remote_decode_complete? it's set when we see end stream. I think it's just badly named.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we explicitly comment how this differs from local_complete as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit messy. I have added more comments to local_complete
and encoder_end_stream
and left a TODO to always use encoder_end_stream
for tracking end_stream state in encoder filer iteration.
I will do this refactor when the runtime flag is removed.
source/common/http/filter_manager.cc
Outdated
@@ -1489,6 +1511,27 @@ void FilterManager::maybeEndEncode(bool end_stream) { | |||
if (end_stream) { | |||
ASSERT(!state_.remote_encode_complete_); | |||
state_.remote_encode_complete_ = true; | |||
if (allow_upstream_half_close_) { | |||
maybeCloseStream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be half close? closeUpstream?
should this logic all be in maybeClose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It closes downstream and upstream streams. I have renamed it checkAndCloseStreamIfFullyClosed
to hopefully make it clearer.
source/common/http/filter_manager.cc
Outdated
// end stream is observed on the encode path in case upstream half closes before the | ||
// downstream. | ||
return parent_.allow_upstream_half_close_ ? parent_.state_.encoder_end_stream_ | ||
: parent_.state_.local_complete_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may need a rename / comment PR before we land this because I think this would be more clear if local_complete were more clear in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or after? (when we consider the urgency of this issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated comments for local_complete_ and encoder_end_stream_ flags
source/common/http/filter_manager.h
Outdated
@@ -748,6 +750,9 @@ class FilterManager : public ScopeTrackedObject, | |||
void decodeHeaders(RequestHeaderMap& headers, bool end_stream) { | |||
state_.remote_decode_complete_ = end_stream; | |||
decodeHeaders(nullptr, headers, end_stream); | |||
if (allow_upstream_half_close_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be cone in maybeendDecode so we don't have to do it in all headers/body/data cases? if maybeEndStream did checks on allow_upstream_half_close_ I think it'd simplify things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the flag check to checkAndCloseStreamIfFullyClosed
source/common/http/filter_manager.cc
Outdated
@@ -960,6 +972,8 @@ void DownstreamFilterManager::sendLocalReply( | |||
ASSERT(!state_.under_on_local_reply_); | |||
const bool is_head_request = state_.is_head_request_; | |||
const bool is_grpc_request = state_.is_grpc_request_; | |||
// Local reply closes the stream even if downstream is not half closed. | |||
state_.force_close_stream_ = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of setting a boolean here (which adds to the booleans we need to keep track of in all places, WDYT of adding a stopSending function which we call both on local reply and on receipt of error response, which stops downstream processing?
I think that'd then be reuseable for the "stream reset preceding data" bug as we could convert no_error stream resets into stop sending calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I have reworked this a bit and added the stopDecoding method.
@@ -520,7 +571,7 @@ TEST_F(HttpConnectionManagerImplTest, ConnectionDuration) { | |||
EXPECT_EQ(1U, stats_.named_.downstream_cx_max_duration_reached_.value()); | |||
} | |||
|
|||
TEST_F(HttpConnectionManagerImplTest, IntermediateBufferingEarlyResponse) { | |||
TEST_F(HttpConnectionManagerImplTest, DISABLED_IntermediateBufferingEarlyResponse) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
// H/1 upstream always causes the stream to be closed if it responds before downstream. | ||
// This test also causes downstream connection to run out of stream window (in H/2 and H/3 case) | ||
// when processing END_STREAM from the server to make sure the data is not lost in the case. | ||
TEST_P(ProtocolIntegrationTest, ServerHalfCloseBeforeClientWithBufferedResponseData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be useful to dump the upstream timing and/or warn about it in release notes as I think before if we'd get an early response onLastUpstreamTxByte might not be set.
While now it will be and it might be > onLastUpstreamRxByteReceived which might cause some manipulations of timing data that assumed last byte tx to upstream < last upstream rx byte received to become negative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added test for duration values. %COMMON_DURATION%
may now produce negative values in more cases but I'm not too concerned about it, since independent half close support is basically a new feature. It is useful for bidi streaming in gRPC and I think people will make sure to set monitoring correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since independent half close support is basically a new feature. It is useful for bidi streaming in gRPC and I think people will make sure to set monitoring correctly.
It seems like our plan is to eventually enable it by default. In which case warning or noting it in either release notes or the docs for
%COMMON_DURATION%
will help less sophisticated users understand when this can be negative.
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Sanitizer failures look related to #34354 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for fixing this complicated edge case.
// H/1 upstream always causes the stream to be closed if it responds before downstream. | ||
// This test also causes downstream connection to run out of stream window (in H/2 and H/3 case) | ||
// when processing END_STREAM from the server to make sure the data is not lost in the case. | ||
TEST_P(ProtocolIntegrationTest, ServerHalfCloseBeforeClientWithBufferedResponseData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since independent half close support is basically a new feature. It is useful for bidi streaming in gRPC and I think people will make sure to set monitoring correctly.
It seems like our plan is to eventually enable it by default. In which case warning or noting it in either release notes or the docs for
%COMMON_DURATION%
will help less sophisticated users understand when this can be negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry to nitpick here but I'm really concerned about the number of extra bools and consistency problems. comments inline
/wait
source/common/http/filter_manager.h
Outdated
// end_stream flag when resuming encoder filter chain iteration. | ||
// It is only used when the | ||
// `allow_multiplexed_upstream_half_close` runtime flag is true. | ||
bool local_complete_ : 1; // This indicates that request is complete prior to filter processing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to nitpick but is this about request completion?
looks like ignoring half close, local_complete_ is set from setLocalComplete which is set from ConnectionManagerImpl::doEndStream which is called on [stream reset or ] HCM endStream which is called from filter manager maybeEndEncode.
Isn't it really a way to say the encode has started which is used (in the non-half-close case) to neuter the complete-or-not decode side of things?
it feels like prior to any half close logic, writes to this boolean are "have we started encoding" and reads are "do we want to stop decoding" and if we replaced read calls with shouldDecode() {finished_encoding_ && !half_close} we could avoid the extra boolean
thoughts?
source/common/http/filter_manager.h
Outdated
bool remote_encode_complete_ : 1; // Set when encoder filter chain iteration has completed. All | ||
// decoder filters have completed and end_stream was observed. | ||
// TODO(yanavlasov): always use encoder_end_stream_ for tracking end_stream state during encoder | ||
// filter iteration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have concerns about having this as a follow-up. I believe you have good intent but I also have seen the number of times you've been yanked away from projects due to internal reprioritization and this code is complicated enough I'm pretty strongly averse to making it harder to understand and having a TODO to clean up :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly calling this out as a blocking issue.
Given the level of unresolved tech debt* I am not ok with plan "sort it out when we remove the flag"
*https://github.com/envoyproxy/envoy/issues?q=is%3Aopen+label%3A%22tech+debt%22+assignee%3Ayanavlasov+
source/common/http/filter_manager.h
Outdated
bool stream_closed_ : 1; // Set when both remote_decode_complete_ and remote_encode_complete_ is | ||
// true observed for the first time and prevents ending the stream | ||
// multiple times. Only set when allow_upstream_half_close is enabled. | ||
bool should_stop_decoding_ : 1; // Set to indicate that decoding on the stream should be stopped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what local complete was used for previously. And it appears to me in most of the decode calls we abort if local complete and not based on should_stop_decoding which makes me think this is not doing what it says it's doing. I think we're adding complexity and not correctness here,
source/common/http/filter_manager.h
Outdated
@@ -893,6 +916,11 @@ class FilterManager : public ScopeTrackedObject, | |||
bool decoder_filter_chain_aborted_ : 1; | |||
bool encoder_filter_chain_aborted_ : 1; | |||
bool saw_downstream_reset_ : 1; | |||
bool stream_closed_ : 1; // Set when both remote_decode_complete_ and remote_encode_complete_ is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just use a helper function which checks both booleans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/wait since I think that was just a main merge - lmk if incorrect
(also please be aware I'm out for a week starting mid-week next week so there's a limited window for reviews) |
source/common/http/filter_manager.h
Outdated
bool remote_encode_complete_ : 1; // Set when encoder filter chain iteration has completed. All | ||
// decoder filters have completed and end_stream was observed. | ||
// TODO(yanavlasov): always use encoder_end_stream_ for tracking end_stream state during encoder | ||
// filter iteration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly calling this out as a blocking issue.
Given the level of unresolved tech debt* I am not ok with plan "sort it out when we remove the flag"
*https://github.com/envoyproxy/envoy/issues?q=is%3Aopen+label%3A%22tech+debt%22+assignee%3Ayanavlasov+
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
Signed-off-by: Yan Avlasov <[email protected]>
This API was introduced by envoyproxy#32991 and it conflates support for half-close with TCP proxying mode in H/1 codec. This interferes with envoyproxy#34461 which enables support for server half-close in H/2 proxying. This PR refactors API to separate enabling of the half-close behavior from TCP proxy behavior. Prior to envoyproxy#34461 half-close semativs were only enabled for TCP proxy upstreams, but envoyproxy#34461 introduces other cases as well. This necessitates setting half-close and TCP proxy modes separately. TCP proxy mode is now enabled in the class dedicated to the TCP proxy upstreams. --------- Signed-off-by: Yan Avlasov <[email protected]> Signed-off-by: asingh-g <[email protected]>
@KBaichoo @RyanTheOptimist can you PTAL? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again for working on such a complicated bug. Looked at the implementation again, and had a few questions. / comments
Great job naming and refactoring some of the naming / logic here.
docs/root/intro/arch_overview/http/http_connection_management.rst
Outdated
Show resolved
Hide resolved
docs/root/intro/arch_overview/http/http_connection_management.rst
Outdated
Show resolved
Hide resolved
@@ -1754,6 +1755,10 @@ void Filter::onUpstreamMetadata(Http::MetadataMapPtr&& metadata_map) { | |||
|
|||
void Filter::onUpstreamComplete(UpstreamRequest& upstream_request) { | |||
if (!downstream_end_stream_) { | |||
if (allow_multiplexed_upstream_half_close_) { | |||
// Continue request if downstream is not done yet. | |||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While we shouldn't be resetting the upstream, we do also skip over doing these various timing, outlier detection, etc. When will we end up updating them in this case when allow_multiplied_upstream_half_close
is true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never. This code updates the response_time
which is response_complete_time - request_complete_time
. It is negative for the case where response completes before request and budget percentage computations will be off. Histograms record uint64_t so these will end up being some large values.
Since the response_time
becomes nonsensical in this case, I decided to skip recording the stats.
Signed-off-by: Yan Avlasov <[email protected]>
Blocked by the #35815 . Some tests with independent half-close enabled do not pass without it. |
/wait |
Signed-off-by: Yan Avlasov <[email protected]>
Unblocked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT, modulo one whitespace nit.
Awesome comments and docs!
|
||
For HTTP/1 protocol or if independent half-close is disabled the stream is destroyed when the response | ||
is complete and reaches its end-of-stream, i.e. when trailers or the response header/body with | ||
end-stream set are received, even if the request has not yet completed. This is handled in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great docs!
// disabled, i.e. proxying is stopped as soon as encoding has finished. | ||
// TODO(yanavlasov): to support this case the codec needs to notify HCM that it has closed its low | ||
// level stream | ||
// to avoid use-after-free when HCM cleans-up its ActiveStream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this line wrapping got confused somehow
All requested changes were completed.
- Update `ENVOY_COMMIT` to the latest Envoy's commit. - Update affected `ENVOY_LOG` and `ENVOY_LOG_MISC` function calls to accommodate the changes in envoyproxy/envoy#35869. - Add the `Http1PrematureUpstreamHalfClose` case to `StreamDecoder::streamResetReasonToResponseFlag` to accommodate envoyproxy/envoy#34461. Signed-off-by: jiajunye <[email protected]>
Commit Message:
Allow HTTP/2 (and HTTP/3) upstream servers to half close the stream before the downstream. This enables bidirectional gRPC streams where server completes streaming before the client. Behavior of HTTP/1 or TCP proxy upstream servers is unchanged and the stream is reset if the upstream server completes response before the downstream. The stream is also reset if the upstream server responds with an error status before the downstream. This behavior is disabled by default and can be enabled by setting the
envoy.reloadable_features.allow_multiplexed_upstream_half_close
runtime key to true.Change details:
Presently there are two places where the stream was reset when upstream half-closed.
Filter::onUpstreamComplete
method, which covers HTTP upstreams.FilterManager::commonEncodePrefix
which covers local reply's by filters and TCP upstreams.When the
envoy.reloadable_features.allow_multiplexed_upstream_half_close
is enabled the router filter no longer forces reset in theFilter::onUpstreamComplete
and allows fully independent half closes. To preserve existing half close behavior of HTTP/1 protocol the force reset is added in the H/1 codec'sClientConnectionImpl::onMessageCompleteBase()
method.When the
envoy.reloadable_features.allow_multiplexed_upstream_half_close
is enabled the filter manager closes stream after both decoder and encoder filter chain complete, except in two cases:FilterManager::checkAndCloseStreamIfFullyClosed()
method.The
state_.decoder_filter_chain_complete_
is added to track completion of the decoder filter chain.To preserver behavior for TCP upstream the force reset is moved into the
TcpUpstream::onUpstreamData
method.Risk Level: High (flag protected, disabled by default)
Testing: Unit Tests
Docs Changes: No
Release Notes: Yes
Platform Specific Features: N/A
Runtime guard: envoy.reloadable_features.allow_multiplexed_upstream_half_close
Fixes #30149