-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http: Add ability to merge slashes #7621
Conversation
Signed-off-by: Ruslan Nigmatullin <[email protected]>
Signed-off-by: Ruslan Nigmatullin <[email protected]>
/retest |
🔨 rebuilding |
/assign @jmarantz could you have a look? |
🙀 Error while processing event:
|
Signed-off-by: Ruslan Nigmatullin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
api/envoy/config/filter/network/http_connection_manager/v2/http_connection_manager.proto
Outdated
Show resolved
Hide resolved
} | ||
return true; | ||
if (config.shouldMergeSlashes()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why's the path validity checked on the original value i.e. prior to merging slashes? I'd add a comment explaining that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why bother to merge slashes if is_valid_path is false?
Signed-off-by: Ruslan Nigmatullin <[email protected]>
Signed-off-by: Ruslan Nigmatullin <[email protected]>
Signed-off-by: Ruslan Nigmatullin <[email protected]>
@venilnoronha Please have another look, and let me know if there is any outstanding feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
Signed-off-by: Ruslan Nigmatullin <[email protected]>
Can this change be merged, please? |
@euroelessar a maintainer has to review it. Hopefully @jmarantz can take a look soon. |
} | ||
return true; | ||
if (config.shouldMergeSlashes()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why bother to merge slashes if is_valid_path is false?
source/common/http/path_utility.cc
Outdated
@@ -52,5 +52,43 @@ bool PathUtil::canonicalPath(HeaderEntry& path_header) { | |||
return true; | |||
} | |||
|
|||
/* static */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we use this style of commenting that a method is static in Envoy.
source/common/http/path_utility.cc
Outdated
for (size_t i = 1; i < original_path.size(); ++i) { | ||
if (original_path[i] == '/' && original_path[i - 1] == '/') { | ||
has_adjacent_slashes = true; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capture start-index here to avoid re-searching the same text.
Better yet, use more absl strings magic, to simplify the code
absl::string_view::size_type query_start = orignal_path.find('?');
absl::string_view query;
if (query_start != absl::string_view::npos) {
query = original_path.substr(query_start);
original_path = original_path.substr(0, query_start);
}
if (original_path.find("//") != absl::string_view::npos) {
path_header.value(absl::StrCat(absl::StrReplace(original_path, {{"//", "/"}}), query));
}
WDYT? I'm not 100% sure about all the corner cases like an input of "///" or "////" and what they should do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately absl::StrReplace
-based approach does not handle triple (or more) slashes, but I've simplified the rest of the code based on your recommendation.
const std::vector<std::pair<std::string, std::string>> non_normal_pairs{ | ||
{"", ""}, // empty | ||
{"/a", "/a"}, // no-op | ||
{"//a/b/c", "/a/b/c"}, // double / start |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include some triple and quadruple slashes in your test suite.
auto& path_header = pathHeaderEntry(path_pair.first); | ||
PathUtil::mergeSlashes(path_header); | ||
EXPECT_EQ(path_header.value().getStringView(), path_pair.second) | ||
<< "original path: " << path_pair.first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT of turning the body of the loop into a lambda or test-helper method std::string normalizePath(absl::string_view path)
and then writing:
EXPECT_EQ("/a/b/c", normalizePath("//a/b/c"));
what you have is idiomatic for golang but most C++ tests I see look more like my suggestion. I'm OK with what you have; just an idea.
EXPECT_EQ(path_header.value().getStringView(), path_pair.second) | ||
<< "original path: " << path_pair.first; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include a test method that also ensures that no slash-merging takes place when the option is off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's covered by ConnectionManagerUtilityTest::MergeSlashesDefaultOff
test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, ok, that makes sense. I think handling of 3 or more slashes should be covered here, though, as this is the functional unit-test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test for handling of 3 or more slashes was already added in the latest revision
Thanks for doing this, and thanks @venilnoronha for the first review. Just a few nits/cleanups and we should be good to go. |
Signed-off-by: Ruslan Nigmatullin <[email protected]>
source/common/http/path_utility.cc
Outdated
} | ||
if (!has_adjacent_slashes) { | ||
// Only operate on path component in URL. | ||
const size_t query_start = original_path.find('?'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/size_t/absl::string_view::size_type
source/common/http/path_utility.cc
Outdated
// Only operate on path component in URL. | ||
const size_t query_start = original_path.find('?'); | ||
const auto path = original_path.substr(0, query_start); | ||
const auto query = absl::ClippedSubstr(original_path, query_start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling the npos case for query_start does not appear to be a documented use of substr etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point 24.4.2. of C++17 standard specifies that std::base_string_view::substr
lets rlen be the smaller of n and size() - pos
, which covers npos
.
absl::string_view
is API-compatible with std::string_view
so I assume it inherits the spec behavior even if it's not called out in the documentation explicitly.
source/common/http/path_utility.cc
Outdated
if (!has_adjacent_slashes) { | ||
// Only operate on path component in URL. | ||
const size_t query_start = original_path.find('?'); | ||
const auto path = original_path.substr(0, query_start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use explicit types in this context as type is not obvious from context unless you know string_view well.
source/common/http/path_utility.cc
Outdated
return; | ||
} | ||
|
||
std::string simplified_path; | ||
simplified_path.reserve(original_path.size()); | ||
for (size_t i = 0; i < original_path.size(); ++i) { | ||
if (i > 0 && original_path[i] == '/' && original_path[i - 1] == '/') { | ||
for (size_t i = 0; i < path.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still prefer using the absl functions for this to avoid having to look carefully at string-hacking code for common patterns. But absl::Substitute is actually preferred for strings known at compile-time. Just call it in a loop till there are no more occurrences of "//" (change the 'if (path.find" to a while-loop, reversing the polarity of the condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this further as my 'while' suggestion would be n^2 on a string with 100 consecutive slashes.
Let's do the initial scan for "//" via if, but then we can just use split/join to remove duplicate slashes. WDYT?
return absl::StrCat(absl::StrJoin(absl::StrSplit(original_path, "/"), absl::SkipEmpty), "/"), query);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case we would also need to either special-case /
prefix or assume that all paths are always absolute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah also trailing slash, so cases to handle are:
/a//b
/a//b/
a//b
a//b/
It's still probably simpler than the character-at-a-time scan but maybe there's a better absl util to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably good to add those cases anyway to the test-suite regardless of the algo we use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, as well as added extra tests
<< "original path: " << path_pair.first; | ||
} | ||
auto sanitized_path_value = path_header.value().getStringView(); | ||
return std::string(sanitized_path_value.begin(), sanitized_path_value.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return std::string(path_header.value().getStringView());
EXPECT_EQ(path_header.value().getStringView(), path_pair.second) | ||
<< "original path: " << path_pair.first; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, ok, that makes sense. I think handling of 3 or more slashes should be covered here, though, as this is the functional unit-test.
// Determines if adjacent slashes in the path are merged into one before any processing of | ||
// requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without | ||
// setting this option incoming requests with path `//dir/file` will not match against route with | ||
// `prefix` match set to `/dir`. Defaults to `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you point to the HTTP spec indicating that multiple slashes should be merged? I suspect such a spec for canonicalized URL paths exists -- same place where it discusses handling of "..".
This should referenced in the comments or maybe even in this doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTTP spec does not specify that slashes should be merged, moreover it explicitly states that behavior around slashes is resource-specific (and this is one of the reasons to make it optional, and not enable by default).
However it's useful in real world applications to provide a safe-guard against user errors, when we know for sure that the intended behavior is similar to filesystem one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it doesn't. It does discuss treatment of ".." which Envoy does, but this is AFAICT not a specified behavior. https://tools.ietf.org/html/rfc3986#section-6
WDYT of making this a filter rather than putting it in the core, since it's really an optional extension? I think it's possible for a filter to adjust the path. I don't feel too strongly either way, but with @mattklein123 we've been talking about building a bare-bones Envoy with only the core functionality and if this were a filter in test/extensions/filters/http/... it would be easier to compile it out. Not that it's that expensive, but it might make sense to keep the core in-spec and as small as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong preference here and can follow a maintainers' decision.
One thing to consider is that until #6602 is resolved invalidating routing decision is somehow expensive. (and we have to redo routing on a path change)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it's fine to have this type of thing be built in as an option. @alyssawilk @PiotrSikora any objection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is the slippery slope of too many knobs.
I'm going to claim this passes the two company test, as we elide at least initial multiple slashes ourselves and my guess is we can get away with eliding all but we should either make sure things pass the multi-user test or consider making them pluggable.
Signed-off-by: Ruslan Nigmatullin <[email protected]>
Signed-off-by: Ruslan Nigmatullin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically lgtm modulo the question about whether this should be a filter or core functionality, which I think can be part of the @envoyproxy/senior-maintainers review.
Thanks for all the updates :)
@@ -7,6 +7,7 @@ Version history | |||
* config: async data access for local and remote data source. | |||
* config: changed the default value of :ref:`initial_fetch_timeout <envoy_api_field_core.ConfigSource.initial_fetch_timeout>` from 0s to 15s. This is a change in behaviour in the sense that Envoy will move to the next initialization phase, even if the first config is not delivered in 15s. Refer to :ref:`initialization process <arch_overview_initialization>` for more details. | |||
* http: added the ability to reject HTTP/1.1 requests with invalid HTTP header values, using the runtime feature `envoy.reloadable_features.strict_header_validation`. | |||
* http: added the ability to :ref:`merge adjacent slashes<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.merge_slashes>` in the path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's probably worth noting here that this canonicalization is not mentioned in any HTTP spec, but is offered as an opt-in convenience to upstreams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on the general approach with 1 comment. Please also merge master to pick up the new CI config. Thank you!
/wait
// requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without | ||
// setting this option incoming requests with path `//dir/file` will not match against route with | ||
// `prefix` match set to `/dir`. Defaults to `false`. | ||
google.protobuf.BoolValue merge_slashes = 33; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a simple bool
type as it defaults to false.
// Determines if adjacent slashes in the path are merged into one before any processing of | ||
// requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without | ||
// setting this option incoming requests with path `//dir/file` will not match against route with | ||
// `prefix` match set to `/dir`. Defaults to `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it's fine to have this type of thing be built in as an option. @alyssawilk @PiotrSikora any objection?
Signed-off-by: Ruslan Nigmatullin <[email protected]>
* use bool Signed-off-by: Ruslan Nigmatullin <[email protected]>
Signed-off-by: Ruslan Nigmatullin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this LGTM but will wait until Monday when @alyssawilk is back in the office to see if she or @PiotrSikora have any comments. Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! A few extra thoughts from my pass
@@ -52,5 +54,20 @@ bool PathUtil::canonicalPath(HeaderEntry& path_header) { | |||
return true; | |||
} | |||
|
|||
void PathUtil::mergeSlashes(HeaderEntry& path_header) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit twitchy about full qualified URLs here. Granted, Envoy currently handles fully qualified urls in the H1 codec by splitting them up, so if the incoming request is GET http://foo.com//bar we'll have :authority foo.com and :path /bar so this is fine today (we won't end up with http:/foo.com/bar).
It might be worth an assert that the path is relative, just in case someone does some lua or use defined headers trying to take advantage of Path == url in firstline and having their absolute URL messed up.
api/envoy/config/filter/network/http_connection_manager/v2/http_connection_manager.proto
Outdated
Show resolved
Hide resolved
// Determines if adjacent slashes in the path are merged into one before any processing of | ||
// requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without | ||
// setting this option incoming requests with path `//dir/file` will not match against route with | ||
// `prefix` match set to `/dir`. Defaults to `false`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is the slippery slope of too many knobs.
I'm going to claim this passes the two company test, as we elide at least initial multiple slashes ourselves and my guess is we can get away with eliding all but we should either make sure things pass the multi-user test or consider making them pluggable.
} | ||
const absl::string_view prefix = absl::StartsWith(path, "/") ? "/" : absl::string_view(); | ||
const absl::string_view suffix = absl::EndsWith(path, "/") ? "/" : absl::string_view(); | ||
path_header.value(absl::StrCat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is O(n) if we get something whacky like a path which is 16k worth of / yeah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything explicit in the absl documentation, but it's O(n) based on the code.
* update example in docs Signed-off-by: Ruslan Nigmatullin <[email protected]>
16c08c4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/azp run envoy-macos |
Azure Pipelines successfully started running 1 pipeline(s). |
Description: Add an optional ability to merge slashes in the request's path
Risk Level: LOW (opt-in)
Testing: unit tests
Docs Changes: inline protobuf documentation
Release Notes: updated
Fixes #7611