-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
router: adding CONNECT support to upstreams (not prod-ready) #10623
Conversation
cc @lambdai |
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
d6d253f
to
7f134d1
Compare
Signed-off-by: Alyssa Wilk <[email protected]>
@snowp or @ggreenway would one of you be up for doing first pass on this? Kind of crosses the line between TCP and upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked at the tests yet, just some initial comments/qs on the impl
// [#not-implemented-hide:] | ||
// Configuration for sending data upstream as a raw data payload. This is used for | ||
// CONNECT requests, when translating HTTP body data to TCP payload. | ||
message ProxyingConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ProxyConfig
maybe? or maybe HttpConnectConfig
if this is specific to CONNECT?
include/envoy/router/router.h
Outdated
* @return configuration for TCP proxying, if present. | ||
*/ | ||
using ProxyingConfig = envoy::config::route::v3::VirtualHost::ProxyingConfig; | ||
virtual const absl::optional<ProxyingConfig> proxyingConfig() const PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return by const ref?
include/envoy/router/router.h
Outdated
* @return configuration for TCP proxying, if present. | ||
*/ | ||
using ProxyingConfig = envoy::config::route::v3::VirtualHost::ProxyingConfig; | ||
virtual const absl::optional<ProxyingConfig> proxyingConfig() const PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
} | ||
|
||
void TcpUpstream::onUpstreamData(Buffer::Instance& data, bool end_stream) { | ||
if (!upstream_request_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this ever happen? do we end up just dropping data in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah seems odd that we would ever get into this state?
Also when you get around to tests, note I could only unit test this one - integration tests are over in master...alyssawilk:shinkansen_next Enabling CONNECT has enough security impact and weird HTTP corner cases I wanted to split the upstream and downstream PRs out, and we can't integration test until both halves land |
Signed-off-by: Alyssa Wilk <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super awesome to see this getting close. This has been a very long requested feature. I have some API/config comments to get started with. Thank you!
/wait
Please merge master to pick up #10672. We no longer accept changes to v2 (without explicit exception), so any API modifications should happen in v3. If this PR is adding a new proto, please follow the updated instructions in https://github.com/envoyproxy/envoy/blob/master/api/STYLE.md#adding-an-extension-configuration-to-the-api. |
Signed-off-by: Alyssa Wilk <[email protected]>
ff118b5
to
21986a9
Compare
Signed-off-by: Alyssa Wilk <[email protected]>
So naively, I'd assume I could pull master, revert my changes to api/envoy/api/v2/route/route_components.proto / generated_api_shadow/envoy/api/v2/route/route_components.proto In practice I get multiprocessing.pool.RemoteTraceback: The above exception was the direct cause of the following exception: Traceback (most recent call last): If I just leave things as is, it builds happily, with the changes to v2 intact. |
Signed-off-by: Alyssa Wilk <[email protected]>
Sorry, having looked at this, do we want something more than the existing option of using a header matcher with |
Signed-off-by: Alyssa Wilk <[email protected]>
@htuch if I apply the diffs to api/envoy/config/route/v3/route_components.proto and run fix_proto from clean master, and bazel test //test/common/router:all the build seg-faults. Have the new tools for freezing been tested for field additions? Is there some state I should be removing? bazel clean --expunge doesn't do it and I'm not sure what else to do from a clean branch. |
@@ -34,7 +34,7 @@ option (udpa.annotations.file_status).package_version_status = FROZEN; | |||
// host header. This allows a single listener to service multiple top level domain path trees. Once | |||
// a virtual host is selected based on the domain, the routes are processed in order to see which | |||
// upstream cluster to route to or whether to perform a redirect. | |||
// [#next-free-field: 21] | |||
// [#next-free-field: 23] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No more changes allowed to v2, it's frozen :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that.
I reverted them and my build seg-faulted. I can't get it to build without the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pastebin of
- copying v3 changes into a clean build
- running fix_proto
- showing the expected files are changed
- building tests fails
@alyssawilk is your master at HEAD on GitHub? I landed some fixes late last week. If not, please push a branch somewhere and I can investigate. |
Signed-off-by: Alyssa Wilk <[email protected]>
sigh. clang_tidy caught a missing override but enough of the presubmits were fine I think this is ready for another look while CI churns |
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
Oh yeah, and what with #10720 landing I turned up the integration tests (modernized to the new config, and host:port requirement) |
Signed-off-by: Alyssa Wilk <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is epic. Super excited to see all of this land. Just some high level questions but looks great. Very clean and easy to follow.
/wait
source/common/router/config_impl.cc
Outdated
void ConnectRouteEntryImpl::rewritePathHeader(Http::RequestHeaderMap& headers, | ||
bool insert_envoy_original_path) const { | ||
const absl::string_view path = | ||
Http::PathUtil::removeQueryAndFragment(headers.Path()->value().getStringView()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to land the other PR for this to be safe, right? As an aside, any thoughts on making it programatically impossible to mess this up? Now that we split the header map APIs, should we special case Path so it returns an optional or something like that? WDYT? cc @asraa
It might also be good to split the config_impl changes/tests out from the rest of this PR, but up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't, because the HCM still adds the synthetic "/" path header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding impossible to mess up (which I could do with a follow-up after this and before removing the synthetic path header, I think if we wanted to do that we might want to do it with a bunch of the "expected but not guaranteed" headers. My preference would be to have a filter fuzzer / filter compliance suite where every filter is hit with pathless requests, but also other common unusual scenarios like hostless responses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this would be great, agreed. I think we are going to have a very common class of bugs where filters thing that certain headers will always be set which we are now breaking.
} | ||
|
||
void TcpUpstream::onUpstreamData(Buffer::Instance& data, bool end_stream) { | ||
if (!upstream_request_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah seems odd that we would ever get into this state?
.. not have a path, and can only be matched using a | ||
.. :ref:`connect_matcher <envoy_api_field_route.RouteMatch.connect_matcher>` | ||
.. | ||
.. Envoy can handle CONNECT in one of two ways, either proxying the CONNECT headers through as if they |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my own understanding, if we proxy CONNECT through, this is really no different from WebSocket, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, in that case it's just a blind upgrade where we proxy payload. Want me to comment something to that effect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From reading this it sounds to me like if we proxy CONNECT we don't do anything special and just treat it like any other request? I think it could be clearer about what happens after the CONNECT is proxied/established.
I think it might also be useful to explain whether we support terminating the CONNECT and dropping down to L4 proxying. I don't think we do (?) but I think it's a use case people might be interested in and it might be worth calling out what our support is there if we don't already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think some clarifications would be helpful to explain the differences between WS/CONNECT and termination vs. pass through. With this change terminating WS would be pretty trivial so that could be a future enhancement for someone (might be worth tracking in an issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I may have been steeped in the code too long but I'm not sure how I could make this more clear, but I think it's clear from Snow's comment that it's not as clear as I want it to be.
For connect we can basically treat it like normal L7 request, forwarding request headers and payload through, or we can terminate, strip connect headers, and forward the CONNECT payload on a raw TCP connection. I don't know how to explain that more clearly than I am doing so - I'll try for a little rewording in my next push but I'd totally appreciate any help making it clear what the options are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine and we can ship and iterate.
@@ -259,5 +294,29 @@ class HttpUpstream : public GenericUpstream, public Http::StreamCallbacks { | |||
Http::RequestEncoder* request_encoder_{}; | |||
}; | |||
|
|||
class TcpUpstream : public GenericUpstream, public Tcp::ConnectionPool::UpstreamCallbacks { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which timeouts come into play with a CONNECT/WS session? I guess the downstream idle timeouts? Anything else? I think we will need to make sure that #10531 works with non-pass through CONNECT also? WDYT? cc @Shikugawa
I haven't reviewed the tests yet but it would be great to make sure we have good integration coverage over all of the various timeout scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All HTTP timeouts may apply - we're 100% reusing the classes which do timeout logic.
Presumably one would want larger (or no) overall timeouts on WS / CONNECT routes.
Given the code reuse I did one timeout test to make sure that disconnects worked. I can do the full suite if you'd prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you have to do them all I just wanted to make sure we are covered. @Shikugawa can make sure to cover CONNECT when the linked PR is finished for idle timeouts.
source/common/router/router.cc
Outdated
|
||
if (http_pool) { | ||
host = http_pool->host(); | ||
if (!should_tcp_proxy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need any stats for the CONNECT cases, both termination and pass through? Any other missing stats? Fine to do this stuff in a follow up but might be worth a TODO for this type of stuff. I assume also that all of the circuit breakers are built into the TCP connection pool so we are good there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've got general upgrade stats, but I didn't split out CONNECT or differentiate connect upgrade from connect termination. Have thoughts on what folks would use and I can add them in a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately I think having some stats specific to CONNECT/WS might be useful, but I agree we can see what people want later.
Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
.. CONNECT support | ||
.. ^^^^^^^^^^^^^^^ | ||
|
||
.. Envoy CONNECT support is off by default (Envoy will send an internall generated 403 in response to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: internall
-> internally
.. not have a path, and can only be matched using a | ||
.. :ref:`connect_matcher <envoy_api_field_route.RouteMatch.connect_matcher>` | ||
.. | ||
.. Envoy can handle CONNECT in one of two ways, either proxying the CONNECT headers through as if they |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From reading this it sounds to me like if we proxy CONNECT we don't do anything special and just treat it like any other request? I think it could be clearer about what happens after the CONNECT is proxied/established.
I think it might also be useful to explain whether we support terminating the CONNECT and dropping down to L4 proxying. I don't think we do (?) but I think it's a use case people might be interested in and it might be worth calling out what our support is there if we don't already.
Signed-off-by: Alyssa Wilk <[email protected]>
Split out from #10623 Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, merged in the connect matcher, which makes this PR much simpler.
Still outstanding but TODO in follow-ups
- figure out what to do with pathless-connect for safety (current plan is improved fuzzing with Asra is on) and removing the synthetic path
- proxy proto (written and tested - will mail once this lands)
- pause-before sending upgrade (written and testing - ditto)
I think this PR has addressed all outstanding comments modulo the docs maybe not being clear enough on the two options (proxying vs terminating/L4-downgrade). If folks are amenable I'd love to land this and iterate on docs since they're hidden but we can do them in place too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, epic. LGTM to ship and iterate. Feel free to action on my comments in follow ups if you want.
.. not have a path, and can only be matched using a | ||
.. :ref:`connect_matcher <envoy_api_field_route.RouteMatch.connect_matcher>` | ||
.. | ||
.. Envoy can handle CONNECT in one of two ways, either proxying the CONNECT headers through as if they |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine and we can ship and iterate.
@@ -655,6 +662,38 @@ Http::FilterHeadersStatus Filter::decodeHeaders(Http::RequestHeaderMap& headers, | |||
return Http::FilterHeadersStatus::StopIteration; | |||
} | |||
|
|||
Filter::HttpOrTcpPool Filter::createConnPool(Upstream::HostDescriptionConstSharedPtr& host) { | |||
Filter::HttpOrTcpPool conn_pool; | |||
bool should_tcp_proxy = route_entry_->connectConfig().has_value() && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const
} | ||
return false; | ||
} | ||
absl::optional<Http::Protocol> protocol() const override { return absl::nullopt; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used for logging? Should we populate this? I guess there is the question of upstream access logging for terminated connect requests? Maybe a TODO for this? cc @kyessenov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it goes in the streaminfo. IMO if we're doing raw L4 proxying upstream the HTTP based protocol should not be set, since we're not doing HTTP/1 or HTTP/2. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend briefly chatting with @kyessenov about this. He has been trying to normalize all of this so we support other protocols including TCP, etc. so I think we should probably have some indication in the logs of what is happening here. Definitely fine to TODO/issue after you determine the best future course of action.
Sweet, will address comments in the next one :-) |
CONNECT won't be very useful without proxy protocol support, and needs some pausing for security reasons, both coming soon.
Risk Level: Medium (router refactor)
Testing: unit tests and integration tests
Docs Changes: some
Release Notes: n/a (still hidden)
Part of #1630 #1451