Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router: adding CONNECT support to upstreams (not prod-ready) #10623

Merged
merged 26 commits into from
Apr 28, 2020

Conversation

alyssawilk
Copy link
Contributor

@alyssawilk alyssawilk commented Apr 2, 2020

CONNECT won't be very useful without proxy protocol support, and needs some pausing for security reasons, both coming soon.

Risk Level: Medium (router refactor)
Testing: unit tests and integration tests
Docs Changes: some
Release Notes: n/a (still hidden)
Part of #1630 #1451

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #10623 was opened by alyssawilk.

see: more, trace.

@PiotrSikora
Copy link
Contributor

cc @lambdai

Signed-off-by: Alyssa Wilk <[email protected]>
@alyssawilk
Copy link
Contributor Author

@snowp or @ggreenway would one of you be up for doing first pass on this? Kind of crosses the line between TCP and upstream

Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked at the tests yet, just some initial comments/qs on the impl

// [#not-implemented-hide:]
// Configuration for sending data upstream as a raw data payload. This is used for
// CONNECT requests, when translating HTTP body data to TCP payload.
message ProxyingConfig {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProxyConfig maybe? or maybe HttpConnectConfig if this is specific to CONNECT?

api/envoy/api/v2/route/route_components.proto Outdated Show resolved Hide resolved
* @return configuration for TCP proxying, if present.
*/
using ProxyingConfig = envoy::config::route::v3::VirtualHost::ProxyingConfig;
virtual const absl::optional<ProxyingConfig> proxyingConfig() const PURE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return by const ref?

* @return configuration for TCP proxying, if present.
*/
using ProxyingConfig = envoy::config::route::v3::VirtualHost::ProxyingConfig;
virtual const absl::optional<ProxyingConfig> proxyingConfig() const PURE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

source/common/router/router.cc Outdated Show resolved Hide resolved
source/common/router/router.cc Outdated Show resolved Hide resolved
source/common/router/upstream_request.cc Show resolved Hide resolved
}

void TcpUpstream::onUpstreamData(Buffer::Instance& data, bool end_stream) {
if (!upstream_request_) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this ever happen? do we end up just dropping data in this case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah seems odd that we would ever get into this state?

@lizan lizan requested a review from lambdai April 3, 2020 09:19
@alyssawilk
Copy link
Contributor Author

Also when you get around to tests, note I could only unit test this one - integration tests are over in master...alyssawilk:shinkansen_next

Enabling CONNECT has enough security impact and weird HTTP corner cases I wanted to split the upstream and downstream PRs out, and we can't integration test until both halves land

Signed-off-by: Alyssa Wilk <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super awesome to see this getting close. This has been a very long requested feature. I have some API/config comments to get started with. Thank you!

/wait

api/envoy/api/v2/route/route_components.proto Outdated Show resolved Hide resolved
source/common/router/router.cc Outdated Show resolved Hide resolved
@htuch
Copy link
Member

htuch commented Apr 7, 2020

Please merge master to pick up #10672. We no longer accept changes to v2 (without explicit exception), so any API modifications should happen in v3. If this PR is adding a new proto, please follow the updated instructions in https://github.com/envoyproxy/envoy/blob/master/api/STYLE.md#adding-an-extension-configuration-to-the-api.

Signed-off-by: Alyssa Wilk <[email protected]>
@alyssawilk
Copy link
Contributor Author

So naively, I'd assume I could pull master, revert my changes to api/envoy/api/v2/route/route_components.proto / generated_api_shadow/envoy/api/v2/route/route_components.proto
run ./tools/proto_format/proto_format.sh fix
and things would work.

In practice I get

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/usr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "./tools/proto_format/proto_sync.py", line 175, in SyncProtoFile
MergeActiveShadow(active_src, shadow_srcs[0], f.name)
File "./tools/proto_format/proto_sync.py", line 141, in MergeActiveShadow
dst,
File "/usr/lib/python3.7/subprocess.py", line 411, in check_output
**kwargs).stdout
File "/usr/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bazel-bin/tools/protoxform/merge_active_shadow', 'bazel-bin/external/envoy_api_canonical/envoy/config/route/v3/pkg/envoy/config/route/v3/route_components.proto.active_or_frozen.proto', 'bazel-bin/external/envoy_api_canonical/envoy/api/v2/route/pkg/envoy/api/v2/route/route_components.proto.next_major_version_candidate.envoy_internal.proto', '/tmp/tmpycuwl_o4']' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "./tools/proto_format/proto_sync.py", line 402, in
Sync(args.api_shadow_root, args.mode, args.labels, True)
File "./tools/proto_format/proto_sync.py", line 344, in Sync
pkg_deps = p.map(SyncProtoFile, dst_src_paths.items())
File "/usr/lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
subprocess.CalledProcessError: Command '['bazel-bin/tools/protoxform/merge_active_shadow', 'bazel-bin/external/envoy_api_canonical/envoy/config/route/v3/pkg/envoy/config/route/v3/route_components.proto.active_or_frozen.proto', 'bazel-bin/external/envoy_api_canonical/envoy/api/v2/route/pkg/envoy/api/v2/route/route_components.proto.next_major_version_candidate.envoy_internal.proto', '/tmp/tmpycuwl_o4']' returned non-zero exit status 1.

If I just leave things as is, it builds happily, with the changes to v2 intact.
Suggestions?

@alyssawilk
Copy link
Contributor Author

Sorry, having looked at this, do we want something more than the existing option of using a header matcher with
{
"name": ":method",
"exact_match": "CONNECT"
}
As you say CONNECT is a special case, but given we already have this functionality I'd be inclined to call it out in the docs, rather than add a one off. Thoughts?

@alyssawilk
Copy link
Contributor Author

@htuch if I apply the diffs to api/envoy/config/route/v3/route_components.proto and run fix_proto from clean master, and bazel test //test/common/router:all the build seg-faults.

Have the new tools for freezing been tested for field additions? Is there some state I should be removing? bazel clean --expunge doesn't do it and I'm not sure what else to do from a clean branch.

@@ -34,7 +34,7 @@ option (udpa.annotations.file_status).package_version_status = FROZEN;
// host header. This allows a single listener to service multiple top level domain path trees. Once
// a virtual host is selected based on the domain, the routes are processed in order to see which
// upstream cluster to route to or whether to perform a redirect.
// [#next-free-field: 21]
// [#next-free-field: 23]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more changes allowed to v2, it's frozen :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that.
I reverted them and my build seg-faulted. I can't get it to build without the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pastebin of

  1. copying v3 changes into a clean build
  2. running fix_proto
  3. showing the expected files are changed
  4. building tests fails

https://pastebin.com/N8vSXHQM

@htuch
Copy link
Member

htuch commented Apr 13, 2020

@alyssawilk is your master at HEAD on GitHub? I landed some fixes late last week. If not, please push a branch somewhere and I can investigate.

Signed-off-by: Alyssa Wilk <[email protected]>
@alyssawilk
Copy link
Contributor Author

sigh. clang_tidy caught a missing override but enough of the presubmits were fine I think this is ready for another look while CI churns

Signed-off-by: Alyssa Wilk <[email protected]>
Signed-off-by: Alyssa Wilk <[email protected]>
@alyssawilk alyssawilk changed the title router: adding CONNECT support to upstreams (not yet usable) router: adding CONNECT support to upstreams (not prod-ready) Apr 21, 2020
@alyssawilk
Copy link
Contributor Author

Oh yeah, and what with #10720 landing I turned up the integration tests (modernized to the new config, and host:port requirement)

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is epic. Super excited to see all of this land. Just some high level questions but looks great. Very clean and easy to follow.

/wait

void ConnectRouteEntryImpl::rewritePathHeader(Http::RequestHeaderMap& headers,
bool insert_envoy_original_path) const {
const absl::string_view path =
Http::PathUtil::removeQueryAndFragment(headers.Path()->value().getStringView());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to land the other PR for this to be safe, right? As an aside, any thoughts on making it programatically impossible to mess this up? Now that we split the header map APIs, should we special case Path so it returns an optional or something like that? WDYT? cc @asraa

It might also be good to split the config_impl changes/tests out from the rest of this PR, but up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't, because the HCM still adds the synthetic "/" path header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding impossible to mess up (which I could do with a follow-up after this and before removing the synthetic path header, I think if we wanted to do that we might want to do it with a bunch of the "expected but not guaranteed" headers. My preference would be to have a filter fuzzer / filter compliance suite where every filter is hit with pathless requests, but also other common unusual scenarios like hostless responses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this would be great, agreed. I think we are going to have a very common class of bugs where filters thing that certain headers will always be set which we are now breaking.

source/common/router/router.cc Outdated Show resolved Hide resolved
}

void TcpUpstream::onUpstreamData(Buffer::Instance& data, bool end_stream) {
if (!upstream_request_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah seems odd that we would ever get into this state?

.. not have a path, and can only be matched using a
.. :ref:`connect_matcher <envoy_api_field_route.RouteMatch.connect_matcher>`
..
.. Envoy can handle CONNECT in one of two ways, either proxying the CONNECT headers through as if they
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own understanding, if we proxy CONNECT through, this is really no different from WebSocket, is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in that case it's just a blind upgrade where we proxy payload. Want me to comment something to that effect?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading this it sounds to me like if we proxy CONNECT we don't do anything special and just treat it like any other request? I think it could be clearer about what happens after the CONNECT is proxied/established.

I think it might also be useful to explain whether we support terminating the CONNECT and dropping down to L4 proxying. I don't think we do (?) but I think it's a use case people might be interested in and it might be worth calling out what our support is there if we don't already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think some clarifications would be helpful to explain the differences between WS/CONNECT and termination vs. pass through. With this change terminating WS would be pretty trivial so that could be a future enhancement for someone (might be worth tracking in an issue).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I may have been steeped in the code too long but I'm not sure how I could make this more clear, but I think it's clear from Snow's comment that it's not as clear as I want it to be.

For connect we can basically treat it like normal L7 request, forwarding request headers and payload through, or we can terminate, strip connect headers, and forward the CONNECT payload on a raw TCP connection. I don't know how to explain that more clearly than I am doing so - I'll try for a little rewording in my next push but I'd totally appreciate any help making it clear what the options are.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine and we can ship and iterate.

@@ -259,5 +294,29 @@ class HttpUpstream : public GenericUpstream, public Http::StreamCallbacks {
Http::RequestEncoder* request_encoder_{};
};

class TcpUpstream : public GenericUpstream, public Tcp::ConnectionPool::UpstreamCallbacks {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which timeouts come into play with a CONNECT/WS session? I guess the downstream idle timeouts? Anything else? I think we will need to make sure that #10531 works with non-pass through CONNECT also? WDYT? cc @Shikugawa

I haven't reviewed the tests yet but it would be great to make sure we have good integration coverage over all of the various timeout scenarios.

Copy link
Contributor Author

@alyssawilk alyssawilk Apr 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All HTTP timeouts may apply - we're 100% reusing the classes which do timeout logic.
Presumably one would want larger (or no) overall timeouts on WS / CONNECT routes.
Given the code reuse I did one timeout test to make sure that disconnects worked. I can do the full suite if you'd prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you have to do them all I just wanted to make sure we are covered. @Shikugawa can make sure to cover CONNECT when the linked PR is finished for idle timeouts.


if (http_pool) {
host = http_pool->host();
if (!should_tcp_proxy) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need any stats for the CONNECT cases, both termination and pass through? Any other missing stats? Fine to do this stuff in a follow up but might be worth a TODO for this type of stuff. I assume also that all of the circuit breakers are built into the TCP connection pool so we are good there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got general upgrade stats, but I didn't split out CONNECT or differentiate connect upgrade from connect termination. Have thoughts on what folks would use and I can add them in a follow up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately I think having some stats specific to CONNECT/WS might be useful, but I agree we can see what people want later.

Signed-off-by: Alyssa Wilk <[email protected]>
snowp
snowp previously requested changes Apr 22, 2020
source/common/router/router.cc Outdated Show resolved Hide resolved
.. CONNECT support
.. ^^^^^^^^^^^^^^^

.. Envoy CONNECT support is off by default (Envoy will send an internall generated 403 in response to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: internall -> internally

.. not have a path, and can only be matched using a
.. :ref:`connect_matcher <envoy_api_field_route.RouteMatch.connect_matcher>`
..
.. Envoy can handle CONNECT in one of two ways, either proxying the CONNECT headers through as if they
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading this it sounds to me like if we proxy CONNECT we don't do anything special and just treat it like any other request? I think it could be clearer about what happens after the CONNECT is proxied/established.

I think it might also be useful to explain whether we support terminating the CONNECT and dropping down to L4 proxying. I don't think we do (?) but I think it's a use case people might be interested in and it might be worth calling out what our support is there if we don't already.

Signed-off-by: Alyssa Wilk <[email protected]>
Copy link
Contributor Author

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, merged in the connect matcher, which makes this PR much simpler.
Still outstanding but TODO in follow-ups

  • figure out what to do with pathless-connect for safety (current plan is improved fuzzing with Asra is on) and removing the synthetic path
  • proxy proto (written and tested - will mail once this lands)
  • pause-before sending upgrade (written and testing - ditto)

I think this PR has addressed all outstanding comments modulo the docs maybe not being clear enough on the two options (proxying vs terminating/L4-downgrade). If folks are amenable I'd love to land this and iterate on docs since they're hidden but we can do them in place too.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, epic. LGTM to ship and iterate. Feel free to action on my comments in follow ups if you want.

.. not have a path, and can only be matched using a
.. :ref:`connect_matcher <envoy_api_field_route.RouteMatch.connect_matcher>`
..
.. Envoy can handle CONNECT in one of two ways, either proxying the CONNECT headers through as if they
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine and we can ship and iterate.

@@ -655,6 +662,38 @@ Http::FilterHeadersStatus Filter::decodeHeaders(Http::RequestHeaderMap& headers,
return Http::FilterHeadersStatus::StopIteration;
}

Filter::HttpOrTcpPool Filter::createConnPool(Upstream::HostDescriptionConstSharedPtr& host) {
Filter::HttpOrTcpPool conn_pool;
bool should_tcp_proxy = route_entry_->connectConfig().has_value() &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const

}
return false;
}
absl::optional<Http::Protocol> protocol() const override { return absl::nullopt; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used for logging? Should we populate this? I guess there is the question of upstream access logging for terminated connect requests? Maybe a TODO for this? cc @kyessenov

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it goes in the streaminfo. IMO if we're doing raw L4 proxying upstream the HTTP based protocol should not be set, since we're not doing HTTP/1 or HTTP/2. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend briefly chatting with @kyessenov about this. He has been trying to normalize all of this so we support other protocols including TCP, etc. so I think we should probably have some indication in the logs of what is happening here. Definitely fine to TODO/issue after you determine the best future course of action.

@alyssawilk alyssawilk dismissed snowp’s stale review April 28, 2020 13:39

addressed I think

@alyssawilk
Copy link
Contributor Author

Sweet, will address comments in the next one :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants