Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: Add ability to merge slashes #7621

Merged
merged 14 commits into from
Jul 30, 2019

Conversation

euroelessar
Copy link
Contributor

Description: Add an optional ability to merge slashes in the request's path
Risk Level: LOW (opt-in)
Testing: unit tests
Docs Changes: inline protobuf documentation
Release Notes: updated
Fixes #7611

Ruslan Nigmatullin added 2 commits July 17, 2019 15:01
Signed-off-by: Ruslan Nigmatullin <[email protected]>
Signed-off-by: Ruslan Nigmatullin <[email protected]>
@zuercher
Copy link
Member

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: asan (failed build)

🐱

Caused by: a #7621 (comment) was created by @zuercher.

see: more, trace.

@zuercher
Copy link
Member

/assign @jmarantz could you have a look?

@repokitteh-read-only
Copy link

🙀 Error while processing event:

evaluation error
error: function _rk_error error: path contains forbidden characters
🐱

Caused by: a #7621 (comment) was created by @zuercher.

see: more, trace.

Signed-off-by: Ruslan Nigmatullin <[email protected]>
Copy link
Member

@venilnoronha venilnoronha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

}
return true;
if (config.shouldMergeSlashes()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why's the path validity checked on the original value i.e. prior to merging slashes? I'd add a comment explaining that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why bother to merge slashes if is_valid_path is false?

source/common/http/path_utility.cc Outdated Show resolved Hide resolved
source/common/http/path_utility.cc Outdated Show resolved Hide resolved
source/common/http/path_utility.h Outdated Show resolved Hide resolved
test/common/http/path_utility_test.cc Outdated Show resolved Hide resolved
Ruslan Nigmatullin added 3 commits July 18, 2019 17:19
Signed-off-by: Ruslan Nigmatullin <[email protected]>
Signed-off-by: Ruslan Nigmatullin <[email protected]>
@euroelessar
Copy link
Contributor Author

@venilnoronha Please have another look, and let me know if there is any outstanding feedback.

venilnoronha
venilnoronha previously approved these changes Jul 23, 2019
Copy link
Member

@venilnoronha venilnoronha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@euroelessar
Copy link
Contributor Author

Can this change be merged, please?

@mattklein123
Copy link
Member

@euroelessar a maintainer has to review it. Hopefully @jmarantz can take a look soon.

}
return true;
if (config.shouldMergeSlashes()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why bother to merge slashes if is_valid_path is false?

@@ -52,5 +52,43 @@ bool PathUtil::canonicalPath(HeaderEntry& path_header) {
return true;
}

/* static */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we use this style of commenting that a method is static in Envoy.

for (size_t i = 1; i < original_path.size(); ++i) {
if (original_path[i] == '/' && original_path[i - 1] == '/') {
has_adjacent_slashes = true;
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capture start-index here to avoid re-searching the same text.

Better yet, use more absl strings magic, to simplify the code

absl::string_view::size_type query_start = orignal_path.find('?');
absl::string_view query;
if (query_start != absl::string_view::npos) {
  query = original_path.substr(query_start);
  original_path = original_path.substr(0, query_start);
}
if (original_path.find("//") != absl::string_view::npos) {
  path_header.value(absl::StrCat(absl::StrReplace(original_path, {{"//", "/"}}), query));
}

WDYT? I'm not 100% sure about all the corner cases like an input of "///" or "////" and what they should do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately absl::StrReplace-based approach does not handle triple (or more) slashes, but I've simplified the rest of the code based on your recommendation.

const std::vector<std::pair<std::string, std::string>> non_normal_pairs{
{"", ""}, // empty
{"/a", "/a"}, // no-op
{"//a/b/c", "/a/b/c"}, // double / start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include some triple and quadruple slashes in your test suite.

auto& path_header = pathHeaderEntry(path_pair.first);
PathUtil::mergeSlashes(path_header);
EXPECT_EQ(path_header.value().getStringView(), path_pair.second)
<< "original path: " << path_pair.first;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT of turning the body of the loop into a lambda or test-helper method std::string normalizePath(absl::string_view path) and then writing:

  EXPECT_EQ("/a/b/c", normalizePath("//a/b/c"));

what you have is idiomatic for golang but most C++ tests I see look more like my suggestion. I'm OK with what you have; just an idea.

EXPECT_EQ(path_header.value().getStringView(), path_pair.second)
<< "original path: " << path_pair.first;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include a test method that also ensures that no slash-merging takes place when the option is off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's covered by ConnectionManagerUtilityTest::MergeSlashesDefaultOff test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, ok, that makes sense. I think handling of 3 or more slashes should be covered here, though, as this is the functional unit-test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test for handling of 3 or more slashes was already added in the latest revision

@jmarantz
Copy link
Contributor

Thanks for doing this, and thanks @venilnoronha for the first review. Just a few nits/cleanups and we should be good to go.

cr
Signed-off-by: Ruslan Nigmatullin <[email protected]>
}
if (!has_adjacent_slashes) {
// Only operate on path component in URL.
const size_t query_start = original_path.find('?');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/size_t/absl::string_view::size_type

// Only operate on path component in URL.
const size_t query_start = original_path.find('?');
const auto path = original_path.substr(0, query_start);
const auto query = absl::ClippedSubstr(original_path, query_start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling the npos case for query_start does not appear to be a documented use of substr etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point 24.4.2. of C++17 standard specifies that std::base_string_view::substr lets rlen be the smaller of n and size() - pos, which covers npos.
absl::string_view is API-compatible with std::string_view so I assume it inherits the spec behavior even if it's not called out in the documentation explicitly.

if (!has_adjacent_slashes) {
// Only operate on path component in URL.
const size_t query_start = original_path.find('?');
const auto path = original_path.substr(0, query_start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use explicit types in this context as type is not obvious from context unless you know string_view well.

https://google.github.io/styleguide/cppguide.html#auto

return;
}

std::string simplified_path;
simplified_path.reserve(original_path.size());
for (size_t i = 0; i < original_path.size(); ++i) {
if (i > 0 && original_path[i] == '/' && original_path[i - 1] == '/') {
for (size_t i = 0; i < path.size(); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still prefer using the absl functions for this to avoid having to look carefully at string-hacking code for common patterns. But absl::Substitute is actually preferred for strings known at compile-time. Just call it in a loop till there are no more occurrences of "//" (change the 'if (path.find" to a while-loop, reversing the polarity of the condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this further as my 'while' suggestion would be n^2 on a string with 100 consecutive slashes.

Let's do the initial scan for "//" via if, but then we can just use split/join to remove duplicate slashes. WDYT?

return absl::StrCat(absl::StrJoin(absl::StrSplit(original_path, "/"), absl::SkipEmpty), "/"), query);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we would also need to either special-case / prefix or assume that all paths are always absolute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah also trailing slash, so cases to handle are:

/a//b
/a//b/
a//b
a//b/

It's still probably simpler than the character-at-a-time scan but maybe there's a better absl util to use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably good to add those cases anyway to the test-suite regardless of the algo we use

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, as well as added extra tests

<< "original path: " << path_pair.first;
}
auto sanitized_path_value = path_header.value().getStringView();
return std::string(sanitized_path_value.begin(), sanitized_path_value.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return std::string(path_header.value().getStringView());

EXPECT_EQ(path_header.value().getStringView(), path_pair.second)
<< "original path: " << path_pair.first;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, ok, that makes sense. I think handling of 3 or more slashes should be covered here, though, as this is the functional unit-test.

// Determines if adjacent slashes in the path are merged into one before any processing of
// requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without
// setting this option incoming requests with path `//dir/file` will not match against route with
// `prefix` match set to `/dir`. Defaults to `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you point to the HTTP spec indicating that multiple slashes should be merged? I suspect such a spec for canonicalized URL paths exists -- same place where it discusses handling of "..".

This should referenced in the comments or maybe even in this doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP spec does not specify that slashes should be merged, moreover it explicitly states that behavior around slashes is resource-specific (and this is one of the reasons to make it optional, and not enable by default).
However it's useful in real world applications to provide a safe-guard against user errors, when we know for sure that the intended behavior is similar to filesystem one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it doesn't. It does discuss treatment of ".." which Envoy does, but this is AFAICT not a specified behavior. https://tools.ietf.org/html/rfc3986#section-6

WDYT of making this a filter rather than putting it in the core, since it's really an optional extension? I think it's possible for a filter to adjust the path. I don't feel too strongly either way, but with @mattklein123 we've been talking about building a bare-bones Envoy with only the core functionality and if this were a filter in test/extensions/filters/http/... it would be easier to compile it out. Not that it's that expensive, but it might make sense to keep the core in-spec and as small as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference here and can follow a maintainers' decision.
One thing to consider is that until #6602 is resolved invalidating routing decision is somehow expensive. (and we have to redo routing on a path change)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's fine to have this type of thing be built in as an option. @alyssawilk @PiotrSikora any objection?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is the slippery slope of too many knobs.
I'm going to claim this passes the two company test, as we elide at least initial multiple slashes ourselves and my guess is we can get away with eliding all but we should either make sure things pass the multi-user test or consider making them pluggable.

Ruslan Nigmatullin added 2 commits July 25, 2019 12:32
Signed-off-by: Ruslan Nigmatullin <[email protected]>
Signed-off-by: Ruslan Nigmatullin <[email protected]>
jmarantz
jmarantz previously approved these changes Jul 25, 2019
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically lgtm modulo the question about whether this should be a filter or core functionality, which I think can be part of the @envoyproxy/senior-maintainers review.

Thanks for all the updates :)

@@ -7,6 +7,7 @@ Version history
* config: async data access for local and remote data source.
* config: changed the default value of :ref:`initial_fetch_timeout <envoy_api_field_core.ConfigSource.initial_fetch_timeout>` from 0s to 15s. This is a change in behaviour in the sense that Envoy will move to the next initialization phase, even if the first config is not delivered in 15s. Refer to :ref:`initialization process <arch_overview_initialization>` for more details.
* http: added the ability to reject HTTP/1.1 requests with invalid HTTP header values, using the runtime feature `envoy.reloadable_features.strict_header_validation`.
* http: added the ability to :ref:`merge adjacent slashes<envoy_api_field_config.filter.network.http_connection_manager.v2.HttpConnectionManager.merge_slashes>` in the path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's probably worth noting here that this canonicalization is not mentioned in any HTTP spec, but is offered as an opt-in convenience to upstreams.

@mattklein123 mattklein123 self-assigned this Jul 26, 2019
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on the general approach with 1 comment. Please also merge master to pick up the new CI config. Thank you!

/wait

// requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without
// setting this option incoming requests with path `//dir/file` will not match against route with
// `prefix` match set to `/dir`. Defaults to `false`.
google.protobuf.BoolValue merge_slashes = 33;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a simple bool type as it defaults to false.

// Determines if adjacent slashes in the path are merged into one before any processing of
// requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without
// setting this option incoming requests with path `//dir/file` will not match against route with
// `prefix` match set to `/dir`. Defaults to `false`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's fine to have this type of thing be built in as an option. @alyssawilk @PiotrSikora any objection?

Ruslan Nigmatullin added 2 commits July 26, 2019 09:46
* use bool

Signed-off-by: Ruslan Nigmatullin <[email protected]>
Signed-off-by: Ruslan Nigmatullin <[email protected]>
mattklein123
mattklein123 previously approved these changes Jul 26, 2019
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this LGTM but will wait until Monday when @alyssawilk is back in the office to see if she or @PiotrSikora have any comments. Nice work!

PiotrSikora
PiotrSikora previously approved these changes Jul 28, 2019
Copy link
Contributor

@PiotrSikora PiotrSikora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Copy link
Contributor

@alyssawilk alyssawilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! A few extra thoughts from my pass

@@ -52,5 +54,20 @@ bool PathUtil::canonicalPath(HeaderEntry& path_header) {
return true;
}

void PathUtil::mergeSlashes(HeaderEntry& path_header) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit twitchy about full qualified URLs here. Granted, Envoy currently handles fully qualified urls in the H1 codec by splitting them up, so if the incoming request is GET http://foo.com//bar we'll have :authority foo.com and :path /bar so this is fine today (we won't end up with http:/foo.com/bar).

It might be worth an assert that the path is relative, just in case someone does some lua or use defined headers trying to take advantage of Path == url in firstline and having their absolute URL messed up.

// Determines if adjacent slashes in the path are merged into one before any processing of
// requests by HTTP filters or routing. This affects the upstream *:path* header as well. Without
// setting this option incoming requests with path `//dir/file` will not match against route with
// `prefix` match set to `/dir`. Defaults to `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is the slippery slope of too many knobs.
I'm going to claim this passes the two company test, as we elide at least initial multiple slashes ourselves and my guess is we can get away with eliding all but we should either make sure things pass the multi-user test or consider making them pluggable.

}
const absl::string_view prefix = absl::StartsWith(path, "/") ? "/" : absl::string_view();
const absl::string_view suffix = absl::EndsWith(path, "/") ? "/" : absl::string_view();
path_header.value(absl::StrCat(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is O(n) if we get something whacky like a path which is 16k worth of / yeah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything explicit in the absl documentation, but it's O(n) based on the code.

test/common/http/path_utility_test.cc Show resolved Hide resolved
* update example in docs

Signed-off-by: Ruslan Nigmatullin <[email protected]>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #7621 was synchronize by euroelessar.

see: more, trace.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mattklein123
Copy link
Member

/azp run envoy-macos

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mattklein123 mattklein123 merged commit deccb6e into envoyproxy:master Jul 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

http: Duplicate slashes in the path
7 participants