Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP health checker: handle GOAWAY from HTTP2 upstreams #13599

Merged
merged 51 commits into from
Jan 12, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
ce00ea3
HTTP health checker: handle GOAWAY from HTTP2 upstreams
mpuncel Oct 14, 2020
9f83533
fix format of version history to be compliant with check_format.py
mpuncel Oct 15, 2020
736d0ca
fix spelling mistake in comment
mpuncel Oct 15, 2020
f3e9da2
fix incorrect testcomment
mpuncel Oct 16, 2020
b76c46e
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Oct 17, 2020
01fa365
fix alphabetical order in version.rst
mpuncel Oct 19, 2020
cd06c28
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Oct 21, 2020
f899f0a
Add runtime disable option to go back to old GOAWAY behavior
mpuncel Oct 21, 2020
8f21af4
follow runtime naming convention, make new flag default to true, remo…
mpuncel Oct 22, 2020
c23a9a5
document runtime guard in release notes
mpuncel Oct 22, 2020
447d07c
group common mock expectations into helper methods
mpuncel Oct 22, 2020
04458ab
rename received_no_error_goaway_ to reuse_connection_, and consolidat…
mpuncel Oct 22, 2020
0d57547
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Oct 29, 2020
15d4dcd
use runtimeFeatureEnabled() and don't close() twice after reset stream
mpuncel Oct 30, 2020
00503eb
add some integration tests covering GOAWAY
mpuncel Oct 30, 2020
bbd6e08
fix incorrect comments
mpuncel Oct 30, 2020
bb0cc29
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Oct 30, 2020
2b00126
address PR feedback
mpuncel Nov 10, 2020
2352719
attempt, no goaway in fake upstream
mpuncel Nov 11, 2020
ed7a0f1
Merge branch 'master' into HEAD
mpuncel Nov 18, 2020
38cc1bf
various fixes
mpuncel Nov 18, 2020
0bd20e6
make integration test send actual GOAWAY frames
mpuncel Nov 18, 2020
38c4fb8
undo change to workflow
mpuncel Nov 18, 2020
b4496a9
fix github workflow
mpuncel Nov 18, 2020
6dd4400
remove commented line
mpuncel Nov 18, 2020
41a330b
address PR feedback
mpuncel Nov 18, 2020
bbb5b8e
wip amend
mpuncel Dec 1, 2020
8cd1560
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Dec 2, 2020
ec669f2
move protocolErrorForTest to the non-legacy http2 codec
mpuncel Dec 2, 2020
455034a
fix compilation error in health_check_fuzz_test_utils
mpuncel Dec 2, 2020
e75b30a
fix compilation errors in health checker unit test
mpuncel Dec 2, 2020
df8f734
add protocolErrorForTest to the legacy http2 impl
mpuncel Dec 2, 2020
56efbce
undo changes to nghttp2 user callback
mpuncel Dec 8, 2020
2b0d930
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Dec 8, 2020
9e339ee
address PR feedback
mpuncel Dec 9, 2020
0345dba
add unit test for protocolErrorForTest() on the non-legacy codec impl
mpuncel Dec 9, 2020
06652ed
fix protocolErrorForTest() unit test when legacy codec impl is used
mpuncel Dec 10, 2020
322e564
remove some unintended newlines
mpuncel Dec 10, 2020
deb3893
add TODO comment to clean up fake upstream when legacy h2 codec is re…
mpuncel Dec 10, 2020
3ff4d9f
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Dec 10, 2020
bcb7346
add goaway_error stat too health checker, wait on it in integration t…
mpuncel Dec 11, 2020
f74ed68
fix wording of goaway_error doc
mpuncel Dec 11, 2020
1c287c1
commit stat increment line
mpuncel Dec 11, 2020
49e689d
remove unused lambda capture variable
mpuncel Dec 11, 2020
a7cfa59
PR feedback: remove new stat, use real sleep in integration test
mpuncel Dec 16, 2020
d15bbf3
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Dec 16, 2020
31c0641
drop changes accommodating legacy h2 codec, it was removed
mpuncel Dec 16, 2020
5e8b981
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Dec 22, 2020
6a3ad8c
switch to real time
mpuncel Dec 22, 2020
eacd1f4
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Jan 8, 2021
64cc217
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Jan 12, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/root/version_history/current.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@ Bug Fixes
---------
*Changes expected to improve the state of the world and are unlikely to have negative effects*

* active http health checks: properly handles HTTP/2 GOAWAY frames from the upstream. Previously a GOAWAY frame due to a graceful listener drain could cause improper failed health checks due to streams being refused by the upstream on a connection that is going away. To revert to old GOAWAY handling behavior, set the runtime feature `envoy.reloadable_features.health_check.graceful_goaway_handling` to false.

Removed Config or Runtime
-------------------------
*Normally occurs at the end of the* :ref:`deprecation period <deprecated>`
Expand Down
9 changes: 9 additions & 0 deletions source/common/http/http2/codec_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -716,6 +716,15 @@ void ConnectionImpl::shutdownNotice() {
}
}

Status ConnectionImpl::protocolErrorForTest() {
int rc = nghttp2_submit_goaway(session_, NGHTTP2_FLAG_NONE,
nghttp2_session_get_last_proc_stream_id(session_),
NGHTTP2_PROTOCOL_ERROR, nullptr, 0);
ASSERT(rc == 0);

return sendPendingFrames();
}

Status ConnectionImpl::onBeforeFrameReceived(const nghttp2_frame_hd* hd) {
ENVOY_CONN_LOG(trace, "about to recv frame type={}, flags={}", connection_,
static_cast<uint64_t>(hd->type), static_cast<uint64_t>(hd->flags));
Expand Down
1 change: 1 addition & 0 deletions source/common/http/http2/codec_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,7 @@ class ConnectionImpl : public virtual Connection, protected Logger::Loggable<Log
void goAway() override;
Protocol protocol() override { return Protocol::Http2; }
void shutdownNotice() override;
Status protocolErrorForTest(); // Used in tests to simulate errors.
bool wantsToWrite() override { return nghttp2_session_want_write(session_); }
// Propagate network connection watermark events to each stream on the connection.
void onUnderlyingConnectionAboveWriteBufferHighWatermark() override {
Expand Down
1 change: 1 addition & 0 deletions source/common/runtime/runtime_features.cc
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,7 @@ constexpr const char* runtime_features[] = {
"envoy.reloadable_features.fix_wildcard_matching",
"envoy.reloadable_features.fixed_connection_close",
"envoy.reloadable_features.hcm_stream_error_on_invalid_message",
"envoy.reloadable_features.health_check.graceful_goaway_handling",
"envoy.reloadable_features.http_default_alpn",
"envoy.reloadable_features.http_match_on_all_headers",
"envoy.reloadable_features.http_set_copy_replace_all_headers",
Expand Down
48 changes: 45 additions & 3 deletions source/common/upstream/health_checker_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -260,11 +260,14 @@ void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onInterval() {
parent_.transportSocketMatchMetadata().get());
client_.reset(parent_.createCodecClient(conn));
client_->addConnectionCallbacks(connection_callback_impl_);
client_->setCodecConnectionCallbacks(http_connection_callback_impl_);
expect_reset_ = false;
reuse_connection_ = parent_.reuse_connection_;
}

Http::RequestEncoder* request_encoder = &client_->newStream(*this);
request_encoder->getStream().addCallbacks(*this);
request_in_flight_ = true;

const auto request_headers = Http::createHeaderMap<Http::RequestHeaderMapImpl>(
{{Http::Headers::get().Method, "GET"},
Expand All @@ -284,15 +287,51 @@ void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onInterval() {

void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onResetStream(Http::StreamResetReason,
absl::string_view) {
request_in_flight_ = false;
ENVOY_CONN_LOG(debug, "connection/stream error health_flags={}", *client_,
HostUtility::healthFlagsToString(*host_));
if (expect_reset_) {
return;
}

ENVOY_CONN_LOG(debug, "connection/stream error health_flags={}", *client_,
HostUtility::healthFlagsToString(*host_));
if (client_ && !reuse_connection_) {
client_->close();
}

handleFailure(envoy::data::core::v3::NETWORK);
}

void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onGoAway(
Http::GoAwayErrorCode error_code) {
ENVOY_CONN_LOG(debug, "connection going away goaway_code={}, health_flags={}", *client_,
error_code, HostUtility::healthFlagsToString(*host_));

// Runtime guard around graceful handling of NO_ERROR GOAWAY handling. The old behavior is to
// ignore GOAWAY completely.
if (!parent_.runtime_.snapshot().runtimeFeatureEnabled(
"envoy.reloadable_features.health_check.graceful_goaway_handling")) {
return;
}

if (request_in_flight_ && error_code == Http::GoAwayErrorCode::NoError) {
// The server is starting a graceful shutdown. Allow the in flight request
// to finish without treating this as a health check error, and then
// reconnect.
reuse_connection_ = false;
return;
}

if (request_in_flight_) {
// Record this as a failed health check.
handleFailure(envoy::data::core::v3::NETWORK);
}

if (client_) {
expect_reset_ = true;
mpuncel marked this conversation as resolved.
Show resolved Hide resolved
client_->close();
}
}

HttpHealthCheckerImpl::HttpActiveHealthCheckSession::HealthCheckResult
HttpHealthCheckerImpl::HttpActiveHealthCheckSession::healthCheckResult() {
uint64_t response_code = Http::Utility::getResponseStatus(*response_headers_);
Expand Down Expand Up @@ -323,6 +362,8 @@ HttpHealthCheckerImpl::HttpActiveHealthCheckSession::healthCheckResult() {
}

void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onResponseComplete() {
request_in_flight_ = false;

switch (healthCheckResult()) {
case HealthCheckResult::Succeeded:
handleSuccess(false);
Expand Down Expand Up @@ -350,7 +391,7 @@ bool HttpHealthCheckerImpl::HttpActiveHealthCheckSession::shouldClose() const {
return false;
}

if (!parent_.reuse_connection_) {
if (!reuse_connection_) {
return true;
}

Expand Down Expand Up @@ -380,6 +421,7 @@ bool HttpHealthCheckerImpl::HttpActiveHealthCheckSession::shouldClose() const {
}

void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onTimeout() {
request_in_flight_ = false;
if (client_) {
host_->setActiveHealthFailureType(Host::ActiveHealthFailureType::TIMEOUT);
ENVOY_CONN_LOG(debug, "connection/stream timeout health_flags={}", *client_,
Expand Down
14 changes: 14 additions & 0 deletions source/common/upstream/health_checker_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,7 @@ class HttpHealthCheckerImpl : public HealthCheckerImplBase {
void onBelowWriteBufferLowWatermark() override {}

void onEvent(Network::ConnectionEvent event);
void onGoAway(Http::GoAwayErrorCode error_code);

class ConnectionCallbackImpl : public Network::ConnectionCallbacks {
public:
Expand All @@ -119,14 +120,27 @@ class HttpHealthCheckerImpl : public HealthCheckerImplBase {
HttpActiveHealthCheckSession& parent_;
};

class HttpConnectionCallbackImpl : public Http::ConnectionCallbacks {
public:
HttpConnectionCallbackImpl(HttpActiveHealthCheckSession& parent) : parent_(parent) {}
// Http::ConnectionCallbacks
void onGoAway(Http::GoAwayErrorCode error_code) override { parent_.onGoAway(error_code); }

private:
HttpActiveHealthCheckSession& parent_;
};

ConnectionCallbackImpl connection_callback_impl_{*this};
HttpConnectionCallbackImpl http_connection_callback_impl_{*this};
HttpHealthCheckerImpl& parent_;
Http::CodecClientPtr client_;
Http::ResponseHeaderMapPtr response_headers_;
const std::string& hostname_;
const Http::Protocol protocol_;
Network::SocketAddressProviderSharedPtr local_address_provider_;
bool expect_reset_{};
bool reuse_connection_ = false;
bool request_in_flight_ = false;
};

using HttpActiveHealthCheckSessionPtr = std::unique_ptr<HttpActiveHealthCheckSession>;
Expand Down
18 changes: 18 additions & 0 deletions test/common/http/http2/codec_impl_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -349,6 +349,24 @@ TEST_P(Http2CodecImplTest, ShutdownNotice) {
response_encoder_->encodeHeaders(response_headers, true);
}

TEST_P(Http2CodecImplTest, ProtocolErrorForTest) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you cover this now in the integration test? Do we need this? Then the hack with the dynamic cast is only in one place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed it because the code coverage for source/common/http/http2 dropped too low without it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see. Seems like you cover this in the integration test just the same but no big deal. Please put the same TODO below about deleting the legacy codec part.

initialize();
EXPECT_EQ(absl::nullopt, request_encoder_->http1StreamEncoderOptions());

TestRequestHeaderMapImpl request_headers;
HttpTestUtility::addDefaultHeaders(request_headers);
EXPECT_CALL(request_decoder_, decodeHeaders_(_, true));
EXPECT_TRUE(request_encoder_->encodeHeaders(request_headers, true).ok());

EXPECT_CALL(client_callbacks_, onGoAway(Http::GoAwayErrorCode::Other));

// We have to dynamic cast because protocolErrorForTest() is intentionally not on the
// Connection API.
ServerConnectionImpl* raw_server = dynamic_cast<ServerConnectionImpl*>(server_.get());
ASSERT(raw_server != nullptr);
EXPECT_EQ(StatusCode::CodecProtocolError, getStatusCode(raw_server->protocolErrorForTest()));
}

// 100 response followed by 200 results in a [decode100ContinueHeaders, decodeHeaders] sequence.
TEST_P(Http2CodecImplTest, ContinueHeaders) {
initialize();
Expand Down
3 changes: 2 additions & 1 deletion test/common/upstream/health_check_fuzz_test_utils.cc
Original file line number Diff line number Diff line change
Expand Up @@ -51,11 +51,12 @@ void HttpHealthCheckerImplTestBase::expectClientCreate(
std::shared_ptr<Upstream::MockClusterInfo> cluster{
new NiceMock<Upstream::MockClusterInfo>()};
Event::MockDispatcher dispatcher_;
return new CodecClientForTest(
test_session.codec_client_ = new CodecClientForTest(
Http::CodecClient::Type::HTTP1, std::move(conn_data.connection_),
test_session.codec_, nullptr,
Upstream::makeTestHost(cluster, "tcp://127.0.0.1:9000", dispatcher_.timeSource()),
dispatcher_);
return test_session.codec_client_;
}));
}

Expand Down
1 change: 1 addition & 0 deletions test/common/upstream/health_check_fuzz_test_utils.h
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ class HttpHealthCheckerImplTestBase : public HealthCheckerTestBase {
Network::MockClientConnection* client_connection_{};
NiceMock<Http::MockRequestEncoder> request_encoder_;
Http::ResponseDecoder* stream_response_callbacks_{};
CodecClientForTest* codec_client_{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these changes needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I have to capture the codec_client_ because that's what I call raiseGoAway(_) on

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see sorry didn't realize this was used elsewhere.

};

using TestSessionPtr = std::unique_ptr<TestSession>;
Expand Down
Loading