http: alpn upstream #13922

alyssawilk · 2020-11-05T22:21:21Z

Allows configuring both HTTP/1 and HTTP/2 options for a cluster, which triggers TLS alpn negotiation.

Risk Level: Medium (a bunch of little refactors)
Testing: integration tests. NEEDS UNIT TESTS
Docs Changes: n.a
Release Notes: inline
Fixes #3431

Signed-off-by: Alyssa Wilk <[email protected]>

repokitteh-read-only · 2020-11-05T22:21:29Z

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy[\w/]*/(v1alpha\d?|v1|v2alpha\d?|v2))|(api/envoy/type/(matcher/)?\w+.proto).
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #13922 was opened by alyssawilk.

see: more, trace.

alyssawilk · 2020-11-05T22:22:28Z

Unit tests are expected to fail - there's some mock clean up around timers and connection.connected() to do, but it's in pretty good shape stability-wise (beyond the new integration test, I've run the full http2_upstream_integration_test and http_integration_test with alpn forced true and they pass)

I wanted to get include-API thumbs up before I spent too much time reordering mocks and writing the new unit tests.

Signed-off-by: Alyssa Wilk <[email protected]>

alyssawilk · 2020-11-09T15:11:19Z

Whoops, meant to cc @mattklein123 and @snowp last week. Still got 1 of 2, not bad :-P

Signed-off-by: Alyssa Wilk <[email protected]>

mattklein123

Thanks for working on this long requested feature. I have some high level API/usability comments to get started. I have major regrets about how upstream h2 is configured and I fear this is only going to dig the hole deeper, so I want to discuss if there are other options we should consider.

/wait-any

api/envoy/config/cluster/v3/cluster.proto

alyssawilk · 2020-11-11T14:29:42Z

api/envoy/config/cluster/v3/cluster.proto

+    // configured, Envoy will attempt to do ALPN negotiation for TLS connections, failing
+    // over to HTTP/1.1 if ALPN negotiation fails.
+    // If only one protocol option is present it will be used as the hard-coded
+    // protocol. If neither is present, HTTP/1.1 will be used.


Somewhat orthogonal, while talking config though, I was of the mind to refuse to allow the new config unless TLS was explicitly configured - you can't do ALPN without TLS and given the ALPN pool "needs" to fail over to HTTP/1 I think it'd be easy to accidentally configure ALPN, forget TLS, and get locked into HTTP/1

We can't require TLS though, because there's other ALPN (say ALTS ALPN), which we use internally.
I was thinking we could make transport sockets register if they do ALPN, and reject config which enables H1/H2 without ALPN. That doesn't extend well to HTTP/3 which AFAIK requires TLS/ALTS but doesn't actually do ALPN.
Worst case we could just comment a warning, and increment a stat of ALPN fails, but I'm wondering if you have ideas to make borked configs more obvious here.

See my comment above. This is what I was going to suggest and I think we should do this.
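The selection rule described in the proto comment above can be sketched as follows. This is only an illustration under stated assumptions: `Http1Options`, `Http2Options`, `UpstreamMode`, and `selectUpstreamMode` are hypothetical names invented for the sketch and are not part of the Envoy API.

```cpp
#include <optional>

// Hypothetical stand-ins for the per-protocol option messages.
struct Http1Options {};
struct Http2Options {};

// Hypothetical summary of how the cluster will talk to the upstream.
enum class UpstreamMode {
  Http1Only,       // Hard-coded HTTP/1.1.
  Http2Only,       // Hard-coded HTTP/2.
  NegotiateByAlpn, // Attempt ALPN, failing over to HTTP/1.1.
};

// Both options present -> negotiate via ALPN.
// Exactly one present  -> that protocol is hard-coded.
// Neither present      -> HTTP/1.1.
inline UpstreamMode selectUpstreamMode(const std::optional<Http1Options>& h1,
                                       const std::optional<Http2Options>& h2) {
  if (h1.has_value() && h2.has_value()) {
    return UpstreamMode::NegotiateByAlpn;
  }
  if (h2.has_value()) {
    return UpstreamMode::Http2Only;
  }
  return UpstreamMode::Http1Only;
}
```

Note that this rule alone cannot tell a deliberate ALPN setup from one where TLS was forgotten, which is the misconfiguration concern raised in this thread.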

Signed-off-by: Alyssa Wilk <[email protected]>

mattklein123

Yeah this LGTM at a high level. It's going to need a ton of fixes around docs. We need to be super clear on how to configure things for the average user, so we will need to update all the sandboxes, probably ref link from the typed_protocol_config field, etc.

But if you are up for all of this, I think this will be a massive improvement. Thank you!

/wait

api/envoy/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto

mattklein123 · 2020-11-12T20:06:01Z

api/envoy/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto

+
+  config.core.v3.UpstreamHttpProtocolOptions upstream_http_protocol_options = 2;
+
+  oneof upstream_protocol_options {


I think this one can have a required validation.

alyssawilk · 2020-11-12T20:37:10Z

Ok, I'll go beat the config loading into shape, and see how much of the config the deprecated CI gets, then ping for another pass. So waiting w.r.t. Matt, but I would appreciate Snow's thoughts w.r.t. alpn decoration. I'm not sure of the motivation for explicitly configured transport ALPN vs cluster failover ALPN so I'm not sure how to address Matt's question there.

Signed-off-by: Alyssa Wilk <[email protected]>

alyssawilk · 2020-12-08T15:05:42Z

new conflicts are just two build file additions and two include additions. Will push another update but it should have no impact on tests.

Signed-off-by: Alyssa Wilk <[email protected]>

mattklein123

Very cool. Flushing out some comments to get started. Thank you!

/wait

mattklein123 · 2020-12-08T17:57:33Z

api/envoy/extensions/upstreams/http/v3/http_protocol_options.proto

+  // protocol is negotiated by ALPN with the upstream.
+  // Clusters configured with *AutoHttpConfig* will use the highest available
+  // protocol; HTTP/2 if supported, otherwise HTTP/1.
+  // If the upstream does not support ALPN, *AutoHttpConfig* will will fail over to HTTP/1.


Suggested change

// If the upstream does not support ALPN, *AutoHttpConfig* will will fail over to HTTP/1.

// If the upstream does not support ALPN, *AutoHttpConfig* will fail over to HTTP/1.
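As a rough illustration of the failover behaviour this comment documents, the negotiated ALPN token can be mapped to a codec choice like this. The names `Protocol` and `protocolFromAlpn` are hypothetical and chosen for the sketch; `"h2"` is the registered ALPN identifier for HTTP/2, and an empty string is assumed here to mean that no protocol was negotiated.

```cpp
#include <string>

// Hypothetical enum for the sketch.
enum class Protocol { Http11, Http2 };

// Maps the token agreed during the handshake to a protocol.
// Anything other than "h2", including an empty result from an upstream
// without ALPN support, fails over to HTTP/1.1.
inline Protocol protocolFromAlpn(const std::string& negotiated) {
  return negotiated == "h2" ? Protocol::Http2 : Protocol::Http11;
}
```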

mattklein123 · 2020-12-08T17:58:53Z

docs/root/version_history/current.rst

@@ -64,6 +64,7 @@ New Features
 * health_check: added option to use :ref:`no_traffic_healthy_interval <envoy_v3_api_field_config.core.v3.HealthCheck.no_traffic_healthy_interval>` which allows a different no traffic interval when the host is healthy.
 * http: added HCM :ref:`timeout config field <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.request_headers_timeout>` to control how long a downstream has to finish sending headers before the stream is cancelled.
 * http: added frame flood and abuse checks to the upstream HTTP/2 codec. This check is off by default and can be enabled by setting the `envoy.reloadable_features.upstream_http2_flood_checks` runtime key to true.
+* http: alpn is now supported upstream, configurable via `alpn_config <envoy_v3_api_field_extensions.upstreams.http.v3.HttpProtocolOptions.alpn_config>` in the :ref:`http_protocol_options <envoy_v3_api_msg_extensions.upstreams.http.v3.HttpProtocolOptions>` message.


nit: alpn was already supported upstream, just manually configured. Rephrase slightly? Also, I think the field names here are out of date so I think you are also missing a :ref.

mattklein123 · 2020-12-08T17:59:55Z

include/envoy/upstream/upstream.h

@@ -711,6 +711,10 @@ class ClusterInfo {
    static const uint64_t USE_DOWNSTREAM_PROTOCOL = 0x2;
    // Whether connections should be immediately closed upon health failure.
    static const uint64_t CLOSE_CONNECTIONS_ON_HOST_HEALTH_FAILURE = 0x4;
+    // If HTTP2 is true, the upstream protocol will be negotiated using ALPN.


Why is this true? Don't we still support H2 with prior knowledge explicitly configured?

source/common/conn_pool/conn_pool_base.h

source/common/http/codec_client.cc

mattklein123 · 2020-12-08T18:05:33Z

source/common/http/mixed_conn_pool.cc

+  // When we upgrade from a TCP client to non-TCP we get a spurious onConnected
+  // from the new client. Ignore that.


This seems broken. TODO somewhere to not raise onConnected() multiple times?

What happens here is that the client is subscribed to network callbacks. When under the call stack of the network raising the connection event, we detach the TCP client from the network callbacks, and attach the HTTP client.

I don't think it's safe to delay subscribing the HTTP client to the network callbacks until not under the stack of raising the connected event. We could avoid the detach and reattach by having a shim event handler in the connection class which originally passes onEvent to the TCP client, and is std::moved over to pass onEvent to the HTTP client. But the problem with that is that for TCP, the connection pool client subscribes directly to the network's callbacks. For HTTP, the connection pool client asks the codec to add it to network callbacks, and doesn't do so to the connection directly. Today that's the same code, but if we decide to do custom event work in the codec client, we could end up introducing some really weird bugs.

Long story short I actually think this is cleaner than any other option. If I can sell you on that I'd be happy to add inline comments on why it's safer than other options. Otherwise I'd prefer nailing down something which isn't uglier before I add a TODO :-)

Sorry are you saying that because we are under the same call stack the event gets added and then it gets iterated to and called? I'm surprised that actually works but I forget what data structure is used. I believe you that this is the cleanest way so yeah more comments would be good.

mattklein123 · 2020-12-08T18:06:45Z

source/common/http/mixed_conn_pool.cc

+  // If an old TLS stack does not negotiate alpn, it likely does not support
+  // HTTP/2. Fail over to HTTP/1.
+  protocol_ = Protocol::Http11;
+  auto tcp_client = static_cast<Tcp::ActiveTcpClient*>(&client);


Should there be a dynamic_cast ASSERT here? Is there any way to avoid the effective dynamic_cast? Seems like this could be an interface function of some type?

source/common/http/mixed_conn_pool.cc

source/common/upstream/cluster_manager_impl.cc

mattklein123 · 2020-12-08T18:10:53Z

source/common/upstream/upstream_impl.cc

+      !raw_factory_pointer->supportsAlpn()) {
+    throw EnvoyException(
+        fmt::format("ALPN configured for a cluster which has a non-ALPN transport socket: {}",
+                    cluster.DebugString()));


nit: I would probably print the cluster name here vs. debug string but up to you.
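A minimal sketch of the check in this hunk, using the cluster name in the message as the nit suggests. Only `supportsAlpn()` appears in the diff; the class shapes, `TlsLikeFactory`, `validateAlpnConfig`, and the use of `std::runtime_error` in place of `EnvoyException` are assumptions made to keep the example self-contained.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical minimal interface; the default is "no ALPN".
class TransportSocketFactory {
public:
  virtual ~TransportSocketFactory() = default;
  virtual bool supportsAlpn() const { return false; }
};

// Hypothetical factory for a transport that can negotiate ALPN.
class TlsLikeFactory : public TransportSocketFactory {
public:
  bool supportsAlpn() const override { return true; }
};

// Rejects a cluster that requests ALPN negotiation over a transport socket
// that cannot perform it. std::runtime_error stands in for EnvoyException.
inline void validateAlpnConfig(const std::string& cluster_name, bool alpn_configured,
                               const TransportSocketFactory& factory) {
  if (alpn_configured && !factory.supportsAlpn()) {
    throw std::runtime_error("ALPN configured for cluster '" + cluster_name +
                             "' which has a non-ALPN transport socket");
  }
}
```

Failing at config load time makes the "configured ALPN, forgot TLS" mistake visible immediately instead of silently pinning the cluster to HTTP/1.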

Signed-off-by: Alyssa Wilk <[email protected]>

alyssawilk · 2020-12-15T13:38:33Z

PTAL?

mattklein123

Thanks very cool. Some random comments but generally LGTM.

/wait

api/envoy/extensions/upstreams/http/v3/http_protocol_options.proto

include/envoy/upstream/upstream.h

source/common/conn_pool/conn_pool_base.cc

mattklein123 · 2020-12-15T17:44:45Z

source/common/http/mixed_conn_pool.h

+  Http::Protocol protocol() { return protocol_; }
+
+private:
+  bool connected_{};


Is this used? Do you maybe mean to guard re-checking protocol if we previously had a connection? If so should this be saw_first_connection_ or something like that?

mattklein123 · 2020-12-15T17:46:19Z

source/common/http/mixed_conn_pool.cc

+  // If an old TLS stack does not negotiate alpn, it likely does not support
+  // HTTP/2. Fail over to HTTP/1.
+  protocol_ = Protocol::Http11;
+  auto tcp_client = dynamic_cast<Tcp::ActiveTcpClient*>(&client);


For perf do you want to static cast this and then assert the dynamic cast? Also, I think I asked this before, but is it possible to not have casts here by just having an interface function that returns nextProtocol()?

I don't think so - we still need to move the connection to the new class. We could have an interface function for "detach and return the network::connection" if you prefer that to the cast?

We could have an interface function for "detach and return the network::connection" if you prefer that to the cast?

Personally yes but up to you.

Bah, checked this branch back out to start on that, but of course the HTTP and HTTP2 clients can't have a function that rips off the network::connection because it's owned by the codec. Given it was optional, I'm going to leave it as is since I think the cast is less ugly than a "hand off if possible but mostly it isn't".
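For reference, the "static cast, then assert the dynamic cast" pattern suggested in this thread can be sketched like this. It illustrates the reviewer's suggestion, not what the PR ended up doing. `ActiveClient`, `ActiveTcpClient`, and `checkedDowncast` are hypothetical stand-ins, and the standard `assert` is used where Envoy would use its own `ASSERT` macro.

```cpp
#include <cassert>

// Hypothetical stand-ins for the pool client types.
struct ActiveClient {
  virtual ~ActiveClient() = default;
};
struct ActiveTcpClient : ActiveClient {
  int fd{42};
};

// Checked downcast: the dynamic_cast runs only inside assert(), so it is
// compiled out when NDEBUG is defined, leaving just the cheap static_cast.
// The pointer is expected to be non-null and to really point at a Derived.
template <typename Derived, typename Base>
Derived* checkedDowncast(Base* base) {
  assert(dynamic_cast<Derived*>(base) != nullptr);
  return static_cast<Derived*>(base);
}
```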

mattklein123 · 2020-12-15T17:49:33Z

source/common/upstream/cluster_manager_impl.cc

+  if (protocols.size() == 2 &&
+      ((protocols[0] == Http::Protocol::Http2 && protocols[1] == Http::Protocol::Http11) ||
+       (protocols[1] == Http::Protocol::Http2 && protocols[0] == Http::Protocol::Http11))) {


This seems kind of fragile and also won't obviously support H3. Can we at least have a TODO about H3 here. I wonder also is there some way of pre-selecting the conn pool somewhere else and then passing an enum in here? I'm not sure what is possible.

This is limited by design. I don't think we know whether HTTP/3, when we add it, will go into the mixed pool - I suspect we'd have a wrapper pool around a mixed pool and a probing-h3 pool but not sure yet. Whoever adds that (which may yet be me) will surely be adding code and tests, so can add the relevant extra code then, no?

OK that's fine maybe add a TODO but up to you.
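One way to express the same condition without spelling out both orderings is sketched below. It is deliberately just as limited as the original check (exactly HTTP/1.1 plus HTTP/2) and is not the code in the PR. `HttpProtocol` and `isMixedH1H2` are hypothetical names for the sketch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical enum for the sketch.
enum class HttpProtocol { Http10, Http11, Http2, Http3 };

// True only when the list is exactly {HTTP/1.1, HTTP/2}, in either order.
inline bool isMixedH1H2(const std::vector<HttpProtocol>& protocols) {
  if (protocols.size() != 2) {
    return false;
  }
  const auto has = [&](HttpProtocol p) {
    return std::find(protocols.begin(), protocols.end(), p) != protocols.end();
  };
  // With exactly two entries, containing both means there is nothing else.
  return has(HttpProtocol::Http11) && has(HttpProtocol::Http2);
}
```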

mattklein123 · 2020-12-15T17:53:02Z

source/common/http/mixed_conn_pool.cc

+  // onConnected is called under the stack of the Network::Connection raising
+  // the Connected event. The first time it is called, it's called for a TCP
+  // client, the TCP client is detached from the connection and discarded, and an
+  // HTTP client is associated with that connection. When the first call returns, the
+  // Network::Connection will inform the new callback (the HTTP client) that it
+  // is connected. The early return is to ignore that second call.


OK this makes sense to me now. My only concern is this assumes that adding/removing callbacks is safe in the context of callbacks being called. This is true with the current implementation inside ConnectionImpl, but do you know if we have explicit unit tests there about that? I'm mainly concerned about this being somewhat fragile and we might break it in a non-obvious way. If you think we have good coverage that's fine, but maybe update this comment to mention the implementation assumption?

I don't think it's fragile given it's pretty deterministic and regression tested. The only bit that could change is if someone fixed the extra callback under the hood and that'd be a no-op (just make the comment obsolete). Added some unit tests of unregister/reregister mid-callback as that's totally a good thing to test.
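The behaviour discussed in this thread can be modelled with a toy example. This is a sketch under loud assumptions: `ToyConnection` and `ToyMixedPool` are hypothetical, and the list-based callback storage (appending is visible to an iteration in progress, removal nulls the slot) is an assumed model, not a copy of ConnectionImpl.

```cpp
#include <functional>
#include <list>
#include <utility>

// Toy connection. Callbacks live in a std::list so appending while iterating
// keeps iterators valid; removal clears the slot instead of erasing it.
class ToyConnection {
public:
  using Callback = std::function<void()>;
  using Handle = std::list<Callback>::iterator;

  Handle add(Callback cb) { return callbacks_.insert(callbacks_.end(), std::move(cb)); }
  void remove(Handle h) { *h = nullptr; }

  void raiseConnected() {
    for (auto it = callbacks_.begin(); it != callbacks_.end(); ++it) {
      Callback cb = *it; // Copy so a callback may safely clear its own slot.
      if (cb) {
        cb();
      }
    }
  }

private:
  std::list<Callback> callbacks_;
};

// Toy pool: swaps its "TCP client" for an "HTTP client" on the first event.
struct ToyMixedPool {
  ToyConnection& conn;
  ToyConnection::Handle tcp_handle{};
  bool connected{false};
  int upgrades{0};
  int ignored{0};

  explicit ToyMixedPool(ToyConnection& c) : conn(c) {
    tcp_handle = conn.add([this] { onConnected(); });
  }

  void onConnected() {
    if (connected) {
      ++ignored; // Second call, raised for the newly attached client.
      return;
    }
    connected = true;
    conn.remove(tcp_handle);             // Detach the TCP client.
    conn.add([this] { onConnected(); }); // Attach the HTTP client.
    ++upgrades;
  }
};
```

In this model a single `raiseConnected()` reaches the pool twice, once for each client, which is why the early return is needed.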

Signed-off-by: Alyssa Wilk <[email protected]>

mattklein123

LGTM modulo remaining optional comments.

Signed-off-by: Alyssa Wilk <[email protected]>

* master: (49 commits)
  sds: allow multiple init managers share sds target (envoyproxy#14357)
  [http] Remove legacy codecs (envoyproxy#14381)
  http2: Add integration tests for METADATA and RST_STREAM frame flood mitigation for upstream servers (envoyproxy#14365)
  test: start dissolving :printers_include rule. (envoyproxy#14429)
  integration tests: re-enable set_node_on_first_message_only (envoyproxy#14270)
  formatter: add a formatter that returns a google::protobuf::Struct rather than a string (envoyproxy#14258)
  ratelimit: support returning custom response bodies for non-OK responses from the external ratelimit service (envoyproxy#14189)
  deps: update protobuf to 3.14 (envoyproxy#14253)
  stream_info: add setResponseCode and update local_reply to take a normal StreamInfo (envoyproxy#14402)
  http: alpn upstream (envoyproxy#13922)
  Moved starttls integration test to test/extensions/transport_sockets/starttls. (envoyproxy#14425)
  generic conn pool: directly use thread local cluster (envoyproxy#14423)
  wasm: add mathetake to CODEOWNERS (envoyproxy#14427)
  wasm: clear route cache when modifying HTTP request headers. (envoyproxy#14318)
  tls: disable TLS inspector injection (envoyproxy#14404)
  aggregate cluster: cleanups (envoyproxy#14411)
  Mark starttls_integration_test flaky on Windows (envoyproxy#14419)
  tcp: improved unit testing (envoyproxy#14415)
  config: making protocol config explicit (envoyproxy#14362)
  wasm: dead code (envoyproxy#14407)
  ...

Signed-off-by: Michael Puncel <[email protected]>

http: alpn upstream

3be8bf6

Signed-off-by: Alyssa Wilk <[email protected]>

repokitteh-read-only bot added v2-freeze api labels Nov 5, 2020

mattklein123 self-assigned this Nov 6, 2020

alyssawilk removed the v2-freeze label Nov 9, 2020

unit test fix ups

dca2fe2

Signed-off-by: Alyssa Wilk <[email protected]>

alyssawilk marked this pull request as ready for review November 9, 2020 15:10

alyssawilk added 2 commits November 9, 2020 13:44

unit tests for new code

5accf51

Signed-off-by: Alyssa Wilk <[email protected]>

test fixups

fe7940d

Signed-off-by: Alyssa Wilk <[email protected]>

alyssawilk requested review from ggreenway and mattklein123 as code owners November 10, 2020 17:44

alyssawilk mentioned this pull request Nov 10, 2020

http: removing the http1 and http2 connection pools #13967

Merged

tidy

8d770c9

Signed-off-by: Alyssa Wilk <[email protected]>

yanavlasov self-assigned this Nov 10, 2020

mattklein123 reviewed Nov 11, 2020

repokitteh-read-only bot added waiting:any and removed waiting:any labels Nov 11, 2020

alyssawilk commented Nov 11, 2020

mattklein123 added the waiting label Nov 12, 2020

API pass

02099c2

Signed-off-by: Alyssa Wilk <[email protected]>

repokitteh-read-only bot removed the waiting label Nov 12, 2020

mattklein123 requested changes Nov 12, 2020

repokitteh-read-only bot added waiting and removed waiting labels Nov 12, 2020

typo fix

f62074a

Signed-off-by: Alyssa Wilk <[email protected]>

repokitteh-read-only bot removed the waiting label Dec 7, 2020

Merge branch 'master' into h1_h2

21833c4

Signed-off-by: Alyssa Wilk <[email protected]>

mattklein123 requested changes Dec 8, 2020

repokitteh-read-only bot added the waiting label Dec 8, 2020

comments

3c60453

Signed-off-by: Alyssa Wilk <[email protected]>

repokitteh-read-only bot removed the waiting label Dec 8, 2020

mattklein123 added the waiting label Dec 9, 2020

comment

052756e

Signed-off-by: Alyssa Wilk <[email protected]>

repokitteh-read-only bot removed the waiting label Dec 9, 2020

Merge branch 'master' into h1_h2

a4638a1

Signed-off-by: Alyssa Wilk <[email protected]>

mattklein123 requested changes Dec 15, 2020

repokitteh-read-only bot added the waiting label Dec 15, 2020

comments

212268a

Signed-off-by: Alyssa Wilk <[email protected]>

repokitteh-read-only bot removed the waiting label Dec 15, 2020

mattklein123 previously approved these changes Dec 15, 2020

repokitteh-read-only bot removed the api label Dec 15, 2020

configuation

c58cdea

Signed-off-by: Alyssa Wilk <[email protected]>

alyssawilk dismissed mattklein123’s stale review via c58cdea December 15, 2020 20:13

repokitteh-read-only bot added the api label Dec 15, 2020

mattklein123 approved these changes Dec 15, 2020

repokitteh-read-only bot removed the api label Dec 15, 2020

mattklein123 merged commit 93ee668 into envoyproxy:master Dec 15, 2020

ramaraochavali mentioned this pull request Dec 22, 2020

Set standard alpns as well in outbound traffic istio/istio#29529

Merged

roelfdutoit mentioned this pull request Apr 13, 2021

HttpConnPoolImplMixed could result in many unused upstream connections #15947

Closed

alyssawilk deleted the h1_h2 branch June 10, 2021 13:43

noah8713 mentioned this pull request Nov 30, 2021

envoyfilter: http1.1 case preserve; using auto_config HttpProtocolOptions causes xds push failures; gw CDS stale istio/istio#36299

Closed

oulman mentioned this pull request Sep 9, 2022

Configure Envoy alpn_protocols based on service protocol hashicorp/consul#14356

Merged
