Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ext_proc filter ASSERT at canIterate() when client not sending trailer #28592

Merged
merged 7 commits into from
Jul 27, 2023

Conversation

yanjunxiang-google
Copy link
Contributor

@yanjunxiang-google yanjunxiang-google commented Jul 25, 2023

ext_proc filter ASSERT at canIterate() when downstream client is not sending trailer at same time the filter is configured to send trailer to the external processing server.

The problem is that In such case, during decodeData(), ext_proc filter creates an empty trailer and adds it to the trailer map:

new_trailers = state.addTrailers();

, then returns stopIterationAndWatermark:
return FilterDataStatus::StopIterationAndWatermark;

The returned stopIteration cause filter_manager break from the loop:

Later on it call decodeTrailer() from next filter, which is router filter:

decodeTrailers(trailers_added_entry->get(), *filter_manager_callbacks_.requestTrailers());

This caused router filter calls decodeTrailer() before call decodeHeaders(). And later on when it call decodeHeaders(), it ASSERT at canIterate().

There is an issue raised for this:
#28620

Commit Message:
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

@yanjunxiang-google yanjunxiang-google marked this pull request as draft July 25, 2023 03:22
@yanjunxiang-google yanjunxiang-google changed the title Support addTrailers() when filter decodeData() at same time return st… ext_proc filter ASSERT at canIterate() when client not sending trailer Jul 25, 2023
Signed-off-by: Yanjun Xiang <[email protected]>
@yanjunxiang-google
Copy link
Contributor Author

Configuration to trigger this:

       http_filters {
          name: "envoy.filters.http.ext_proc"
          typed_config {
            [type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor] {
              grpc_service {
                envoy_grpc {
                  cluster_name: "ext_proc_server_0"
                }
                timeout {
                  seconds: 300
                }
              }
              processing_mode {
                request_header_mode: SEND
                response_header_mode: SEND
                request_body_mode: BUFFERED
                response_body_mode: BUFFERED
                request_trailer_mode: SEND
                response_trailer_mode: SEND
              }
            }
          }
        }
        http_filters {
          name: "envoy.filters.http.router"
          typed_config {
            [type.googleapis.com/envoy.extensions.filters.http.router.v3.Router] {
            }
          }
        }

@yanjunxiang-google
Copy link
Contributor Author

yanjunxiang-google commented Jul 25, 2023

The traceback:

[2023-07-24 16:49:33.519][30][trace][http] [source/common/http/filter_manager.cc:550] [Tags: "ConnectionId":"3","StreamId":"14728416450513760437"] decode headers called: filter=envoy.filters.http.router status=1
yanjun 10 commonHandleAfterHeadersCallback iteration_state_ 1
[2023-07-24 16:49:33.519][30][critical][assert] [source/common/http/filter_manager.cc:140] assert failure: canIterate().
[2023-07-24 16:49:33.519][30][critical][backtrace] [./source/server/backtrace.h:104] Caught Aborted, suspect faulting address 0xfffe00000011
[2023-07-24 16:49:33.519][30][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2023-07-24 16:49:33.519][30][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: 0/1.27.0-dev/test/DEBUG/BoringSSL
[2023-07-24 16:49:33.520][20][trace][connection] [source/common/network/connection_impl.cc:362] [Tags: "ConnectionId":"7"] readDisable: disable=false disable_count=1 state=0 buffer_length=0
[2023-07-24 16:49:33.522][20][debug][http2] [source/common/http/http2/codec_impl.cc:1605] [Tags: "ConnectionId":"7"] updating connection-level initial window size to 268435456
[2023-07-24 16:49:33.522][20][trace][connection] [source/common/network/connection_impl.cc:575] [Tags: "ConnectionId":"7"] socket event: 2
[2023-07-24 16:49:33.522][20][trace][connection] [source/common/network/connection_impl.cc:686] [Tags: "ConnectionId":"7"] write ready
[2023-07-24 16:49:33.552][30][critical][backtrace] [./source/server/backtrace.h:96] #0: Envoy::SignalAction::sigHandler() [0x61412ac]
[2023-07-24 16:49:33.553][30][critical][backtrace] [./source/server/backtrace.h:96] #1: __restore_rt [0x7fcc24023420]
[2023-07-24 16:49:33.585][30][critical][backtrace] [./source/server/backtrace.h:96] #2: Envoy::Http::FilterManager::decodeHeaders() [0x53ae204]
[2023-07-24 16:49:33.616][30][critical][backtrace] [./source/server/backtrace.h:96] #3: Envoy::Http::ActiveStreamDecoderFilter::doHeaders() [0x53ac927]
[2023-07-24 16:49:33.646][30][critical][backtrace] [./source/server/backtrace.h:96] #4: Envoy::Http::ActiveStreamFilterBase::commonContinue() [0x53a87d4]
[2023-07-24 16:49:33.677][30][critical][backtrace] [./source/server/backtrace.h:96] #5: Envoy::Http::ActiveStreamDecoderFilter::continueDecoding() [0x53b50d5]
[2023-07-24 16:49:33.707][30][critical][backtrace] [./source/server/backtrace.h:96] #6: Envoy::Extensions::HttpFilters::ExternalProcessing::DecodingProcessorState::continueProcessing() [0x2d0a7b0]
[2023-07-24 16:49:33.737][30][critical][backtrace] [./source/server/backtrace.h:96] #7: Envoy::Extensions::HttpFilters::ExternalProcessing::ProcessorState::continueIfNecessary() [0x2cfdcf6]
[2023-07-24 16:49:33.767][30][critical][backtrace] [./source/server/backtrace.h:96] #8: Envoy::Extensions::HttpFilters::ExternalProcessing::ProcessorState::handleTrailersResponse() [0x2d011a6]
[2023-07-24 16:49:33.798][30][critical][backtrace] [./source/server/backtrace.h:96] #9: Envoy::Extensions::HttpFilters::ExternalProcessing::Filter::onReceiveMessage() [0x2ce445d]
[2023-07-24 16:49:33.828][30][critical][backtrace] [./source/server/backtrace.h:96] #10: Envoy::Extensions::HttpFilters::ExternalProcessing::ExternalProcessorStreamImpl::onReceiveMessage() [0x2cd5e02]
[2023-07-24 16:49:33.860][30][critical][backtrace] [./source/server/backtrace.h:96] #11: Envoy::Grpc::AsyncStreamCallbacks<>::onReceiveMessageRaw() [0x2cd6baf]
[2023-07-24 16:49:33.890][30][critical][backtrace] [./source/server/backtrace.h:96] #12: Envoy::Grpc::AsyncStreamImpl::onData() [0x46ab8ca]
[2023-07-24 16:49:33.921][30][critical][backtrace] [./source/server/backtrace.h:96] #13: Envoy::Http::AsyncStreamImpl::encodeData() [0x46bb5c4]
[2023-07-24 16:49:33.951][30][critical][backtrace] [./source/server/backtrace.h:96] #14: Envoy::Router::Filter::onUpstreamData() [0x4e67da4]
[2023-07-24 16:49:33.982][30][critical][backtrace] [./source/server/backtrace.h:96] #15: Envoy::Router::UpstreamRequest::decodeData() [0x4e9d9bf]
[2023-07-24 16:49:34.013][30][critical][backtrace] [./source/server/backtrace.h:96] #16: Envoy::Router::UpstreamRequestFilterManagerCallbacks::encodeData() [0x4eaf8b2]
[2023-07-24 16:49:34.045][30][critical][backtrace] [./source/server/backtrace.h:96] #17: Envoy::Http::FilterManager::encodeData() [0x53bb00b]
[2023-07-24 16:49:34.076][30][critical][backtrace] [./source/server/backtrace.h:96] #18: Envoy::Http::ActiveStreamDecoderFilter::encodeData() [0x53b991a]
[2023-07-24 16:49:34.106][30][critical][backtrace] [./source/server/backtrace.h:96] #19: Envoy::Router::UpstreamCodecFilter::CodecBridge::decodeData() [0x4ebbefe]
[2023-07-24 16:49:34.137][30][critical][backtrace] [./source/server/backtrace.h:96] #20: Envoy::Http::ResponseDecoderWrapper::decodeData() [0x4613a3c]
[2023-07-24 16:49:34.168][30][critical][backtrace] [./source/server/backtrace.h:96] #21: Envoy::Http::Http2::ConnectionImpl::StreamImpl::decodeData() [0x55134a8]
[2023-07-24 16:49:34.198][30][critical][backtrace] [./source/server/backtrace.h:96] #22: Envoy::Http::Http2::ConnectionImpl::onFrameReceived() [0x5527679]
[2023-07-24 16:49:34.228][30][critical][backtrace] [./source/server/backtrace.h:96] #23: Envoy::Http::Http2::ConnectionImpl::Http2Callbacks::Http2Callbacks()::$_24::operator()() [0x553735d]
[2023-07-24 16:49:34.259][30][critical][backtrace] [./source/server/backtrace.h:96] #24: Envoy::Http::Http2::ConnectionImpl::Http2Callbacks::Http2Callbacks()::$_24::__invoke() [0x5537325]
[2023-07-24 16:49:34.290][30][critical][backtrace] [./source/server/backtrace.h:96] #25: http2::adapter::CallbackVisitor::OnDataForStream() [0x5f07eec]
[2023-07-24 16:49:34.320][30][critical][backtrace] [./source/server/backtrace.h:96] #26: http2::adapter::callbacks::OnDataChunk() [0x5ed6e00]
[2023-07-24 16:49:34.351][30][critical][backtrace] [./source/server/backtrace.h:96] #27: nghttp2_session_mem_recv [0x5f19a45]
[2023-07-24 16:49:34.382][30][critical][backtrace] [./source/server/backtrace.h:96] #28: http2::adapter::NgHttp2Session::ProcessBytes() [0x5ed42ab]
[2023-07-24 16:49:34.412][30][critical][backtrace] [./source/server/backtrace.h:96] #29: http2::adapter::NgHttp2Adapter::ProcessBytes() [0x5ecd476]
[2023-07-24 16:49:34.443][30][critical][backtrace] [./source/server/backtrace.h:96] #30: Envoy::Http::Http2::ConnectionImpl::dispatch() [0x552198d]

@yanjunxiang-google
Copy link
Contributor Author

The related logs shows the problem:

Tags: "ConnectionId":"3","StreamId":"14728416450513760437"] request end stream
[2023-07-24 16:49:33.490][30][trace][misc] [source/common/event/scaled_range_timer_manager_impl.cc:60] enableTimer called on 0x36dc7f720700 for 300000ms, min is 300000ms
[2023-07-24 16:49:33.490][30][trace][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:467] decodeData(10): end_stream = true
[2023-07-24 16:49:33.490][30][trace][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:273] Creating new, empty trailers
[2023-07-24 16:49:33.490][30][trace][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:279] Header processing still in progress -- holding body data
[2023-07-24 16:49:33.490][30][debug][ext_proc] [source/extensions/filters/http/ext_proc/processor_state.cc:409] Watermark raised on decoding
[2023-07-24 16:49:33.491][30][debug][http] [source/common/http/conn_manager_impl.cc:1778] [Tags: "ConnectionId":"3","StreamId":"14728416450513760437"] Read-disabling downstream stream due to filter callbacks.
[2023-07-24 16:49:33.491][30][debug][http2] [source/common/http/http2/codec_impl.cc:416] [Tags: "ConnectionId":"3"] Stream 1 disabled, unconsumed_bytes 0 read_disable_count 0
[2023-07-24 16:49:33.491][30][trace][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:469] decodeData returning 2
[2023-07-24 16:49:33.491][30][trace][http] [source/common/http/filter_manager.cc:682] [Tags: "ConnectionId":"3","StreamId":"14728416450513760437"] decode data called: filter=envoy.filters.http.ext_proc status=2
[2023-07-24 16:49:33.491][30][debug][router] [source/common/router/router.cc:874] [Tags: "ConnectionId":"3","StreamId":"14728416450513760437"] router decoding trailers:
[2023-07-24 16:49:33.491][30][trace][http] [source/common/http/filter_manager.cc:772] [Tags: "ConnectionId":"3","StreamId":"14728416450513760437"] decode trailers called: filter=envoy.filters.http.router status=1

...

[2023-07-24 16:49:33.519][30][trace][http] [source/common/http/filter_manager.cc:550] [Tags: "ConnectionId":"3","StreamId":"14728416450513760437"] decode headers called: filter=envoy.filters.http.router status=1
[2023-07-24 16:49:33.519][30][critical][assert] [source/common/http/filter_manager.cc:140] assert failure: canIterate().
[2023-07-24 16:49:33.519][30][critical][backtrace] [./source/server/backtrace.h:104] Caught Aborted, suspect faulting address 0xfffe00000011

@yanjunxiang-google
Copy link
Contributor Author

yanjunxiang-google commented Jul 25, 2023

There is an issue raised for this:
#28620

The solution is to follow what's proposed in the issue to just remove the ext_proc filter logic which construct and send empty trailer if client is not sending trailer at same time ext_proc filter is configure to send trailers to the external processing server.

@phlax
Copy link
Member

phlax commented Jul 25, 2023

please hold off pushing CI for the next few hours while we push through release CI

@yanjunxiang-google
Copy link
Contributor Author

yanjunxiang-google commented Jul 25, 2023

The issue can be simply reproduced by the added ext_proc integration test case in this PR, which test this in the request_body path.

It can also be reproduced at the response_body path by below change.

diff --git a/test/extensions/filters/http/ext_proc/ext_proc_integration_test.cc b/test/extensions/filters/http/ext_proc/ext_proc_integration_test.cc
index 762606cca9..e70e6418bd 100644
--- a/test/extensions/filters/http/ext_proc/ext_proc_integration_test.cc
+++ b/test/extensions/filters/http/ext_proc/ext_proc_integration_test.cc
@@ -1065,13 +1065,15 @@ TEST_P(ExtProcIntegrationTest, AddTrailersOnResponse) {
// Test the filter using a configuration that processes response trailers, and process an upstream
// response that has no trailers, but when the time comes, don't actually add anything.
TEST_P(ExtProcIntegrationTest, AddTrailersOnResponseJustKidding) {
++ proto_config_.mutable_processing_mode()->set_response_header_mode(ProcessingMode::SKIP);
++ proto_config_.mutable_processing_mode()->set_response_body_mode(ProcessingMode::BUFFERED);
proto_config_.mutable_processing_mode()->set_response_trailer_mode(ProcessingMode::SEND);
initializeConfig();
HttpIntegrationTest::initialize();
auto response = sendDownstreamRequest(absl::nullopt);
processRequestHeadersMessage(*grpc_upstreams_[0], true, absl::nullopt);
handleUpstreamRequest();
-- processResponseHeadersMessage(*grpc_upstreams_[0], false, absl::nullopt);
++ processResponseBodyMessage(*grpc_upstreams_[0], false, absl::nullopt);
processResponseTrailersMessage(*grpc_upstreams_[0], false, absl::nullopt);

verifyDownstreamResponse(*response, 200);

Below are the logs, which you can see the spurious message received since ext_proc filter is sending trailer to ext_proc server right after it sends the body to the ext_proc server. This causes the ext_proc state machine broken since it is now waiting for trailer response, but ext_proc server may sends back the body response first. ext_proc state machine requires sequentially message sending.

[2023-07-25 18:24:56.541][3802][trace][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:557] encodeData(100): end_stream = true
[2023-07-25 18:24:56.541][3802][trace][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:273] Creating new, empty trailers
[2023-07-25 18:24:56.541][3802][debug][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:322] Sending request body message
[2023-07-25 18:24:56.541][3802][debug][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:572] Sending a body chunk of 100 bytes
[2023-07-25 18:24:56.541][3802][debug][ext_proc] [source/extensions/filters/http/ext_proc/processor_state.cc:30] Traffic direction OUTBOUND: 200 ms timer enabled
[2023-07-25 18:24:56.542][3802][debug][ext_proc] [source/extensions/filters/http/ext_proc/processor_state.cc:30] Traffic direction OUTBOUND: 200 ms timer enabled
[2023-07-25 18:24:56.542][3802][debug][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:601] Sending trailers message
[2023-07-25 18:24:56.542][3802][trace][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:559] encodeData returning 1
[2023-07-25 18:24:56.542][3802][trace][http] [source/common/http/filter_manager.cc:1343] [Tags: "ConnectionId":"1316","StreamId":"4316459117058928206"] encode data called: filter=envoy.filters.http.ext_proc status=1
[2023-07-25 18:24:56.542][16][debug][testing] [./test/integration/fake_upstream.h:181] Received gRPC message: response_body {
body: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
end_of_stream: true
}

[2023-07-25 18:24:56.542][16][debug][testing] [./test/integration/fake_upstream.h:170] Sent gRPC message: response_body {
}

[2023-07-25 18:24:56.542][16][debug][testing] [./test/integration/fake_upstream.h:189] Waiting for gRPC message...
[2023-07-25 18:24:56.542][3802][debug][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:660] Received response body response
[2023-07-25 18:24:56.542][3802][warning][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:709] Spurious response message 4 received on gRPC stream

<<<<<<<<<<<<<< ******* This is wrong. Sending trailers right after sending body in onData() causing ext_prco filter state machine issue: it transitioned into wait for trailer response state when the body response came back.
**************>>>>>>>>>>

[2023-07-25 18:24:56.543][3802][debug][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:188] Calling close on stream
[2023-07-25 18:24:56.543][3802][debug][ext_proc] [source/extensions/filters/http/ext_proc/client_impl.cc:58] Closing gRPC stream
[2023-07-25 18:24:56.543][16][debug][testing] [./test/integration/fake_upstream.h:181] Received gRPC message: response_trailers {
trailers {
}
}

[2023-07-25 18:24:56.543][16][debug][testing] [./test/integration/fake_upstream.h:170] Sent gRPC message: response_trailers {
}

Signed-off-by: Yanjun Xiang <[email protected]>
Signed-off-by: Yanjun Xiang <[email protected]>
@yanjunxiang-google yanjunxiang-google marked this pull request as ready for review July 26, 2023 18:32
@yanjunxiang-google
Copy link
Contributor Author

/assign @tyxia @yanavlasov @stevenzzzz

@yanjunxiang-google
Copy link
Contributor Author

/retest

@yanjunxiang-google
Copy link
Contributor Author

/assign @mattklein123

Copy link
Member

@tyxia tyxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach make sense to me. This edge case (that is not well supported) requires careful evaluation and handling of interaction between two state machine (filter chain and ext_proc).

I am wondering do we need to change ext_proc's doc/ api proto related to trailer mode to mention this behavior change?

@yanavlasov yanavlasov merged commit a03fbb0 into envoyproxy:main Jul 27, 2023
@yanjunxiang-google
Copy link
Contributor Author

yanjunxiang-google commented Jul 27, 2023

Thanks @yanavlasov for approval.

@tyxia That's a good question. I searched ext_proc API proto and all the related docs, one place which mentioned this is below:
https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/http/ext_proc/v3/ext_proc.proto#envoy-v3-api-msg-extensions-filters-http-ext-proc-v3-externalprocessor:~:text=Request%20trailers%3A%20Delivered%20if%20they%20are%20present%20and%20if%20the%20trailer%20mode%20is%20set%20to%20SEND.

"
Request trailers: Delivered if they are present and if the trailer mode is set to SEND.
Response trailers: Delivered according to the processing mode like the request trailers.
"

This doc states we should only deliver if trailers are present and trailer mode is SEND, which matches what's done in this PR. I didn't find any doc or API stating envoy need to send an empty trailer if trailer is not present and ext_proc filter processing mode is configured to SEND trailers.

@tyxia
Copy link
Member

tyxia commented Jul 27, 2023

@yanjunxiang-google No worries, it is fine if this disabled behavior was not documented before.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants