Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: Add ability to merge slashes #7621

Merged
merged 14 commits into from
Jul 30, 2019
2 changes: 1 addition & 1 deletion source/common/http/conn_manager_utility.cc
Original file line number Diff line number Diff line change
Expand Up @@ -397,7 +397,7 @@ bool ConnectionManagerUtility::maybeNormalizePath(HeaderMap& request_headers,
is_valid_path = PathUtil::canonicalPath(*request_headers.Path());
}
// Merge slashes after path normalization to catch potential edge cases with percent encoding.
if (config.shouldMergeSlashes()) {
if (is_valid_path && config.shouldMergeSlashes()) {
PathUtil::mergeSlashes(*request_headers.Path());
}
return is_valid_path;
Expand Down
34 changes: 9 additions & 25 deletions source/common/http/path_utility.cc
Original file line number Diff line number Diff line change
Expand Up @@ -52,41 +52,25 @@ bool PathUtil::canonicalPath(HeaderEntry& path_header) {
return true;
}

/* static */
void PathUtil::mergeSlashes(HeaderEntry& path_header) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit twitchy about full qualified URLs here. Granted, Envoy currently handles fully qualified urls in the H1 codec by splitting them up, so if the incoming request is GET http://foo.com//bar we'll have :authority foo.com and :path /bar so this is fine today (we won't end up with http:/foo.com/bar).

It might be worth an assert that the path is relative, just in case someone does some lua or use defined headers trying to take advantage of Path == url in firstline and having their absolute URL messed up.

const auto original_path = path_header.value().getStringView();
if (original_path.empty()) {
return;
}

bool has_adjacent_slashes = false;
for (size_t i = 1; i < original_path.size(); ++i) {
if (original_path[i] == '/' && original_path[i - 1] == '/') {
has_adjacent_slashes = true;
break;
}
// Only operate on path component in URL.
if (original_path[i] == '?') {
break;
}
}
if (!has_adjacent_slashes) {
// Only operate on path component in URL.
const size_t query_start = original_path.find('?');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/size_t/absl::string_view::size_type

const auto path = original_path.substr(0, query_start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use explicit types in this context as type is not obvious from context unless you know string_view well.

https://google.github.io/styleguide/cppguide.html#auto

const auto query = absl::ClippedSubstr(original_path, query_start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling the npos case for query_start does not appear to be a documented use of substr etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point 24.4.2. of C++17 standard specifies that std::base_string_view::substr lets rlen be the smaller of n and size() - pos, which covers npos.
absl::string_view is API-compatible with std::string_view so I assume it inherits the spec behavior even if it's not called out in the documentation explicitly.

if (path.find("//") == absl::string_view::npos) {
return;
}

std::string simplified_path;
simplified_path.reserve(original_path.size());
for (size_t i = 0; i < original_path.size(); ++i) {
if (i > 0 && original_path[i] == '/' && original_path[i - 1] == '/') {
for (size_t i = 0; i < path.size(); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still prefer using the absl functions for this to avoid having to look carefully at string-hacking code for common patterns. But absl::Substitute is actually preferred for strings known at compile-time. Just call it in a loop till there are no more occurrences of "//" (change the 'if (path.find" to a while-loop, reversing the polarity of the condition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this further as my 'while' suggestion would be n^2 on a string with 100 consecutive slashes.

Let's do the initial scan for "//" via if, but then we can just use split/join to remove duplicate slashes. WDYT?

return absl::StrCat(absl::StrJoin(absl::StrSplit(original_path, "/"), absl::SkipEmpty), "/"), query);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we would also need to either special-case / prefix or assume that all paths are always absolute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah also trailing slash, so cases to handle are:

/a//b
/a//b/
a//b
a//b/

It's still probably simpler than the character-at-a-time scan but maybe there's a better absl util to use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably good to add those cases anyway to the test-suite regardless of the algo we use

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, as well as added extra tests

if (i > 0 && path[i] == '/' && path[i - 1] == '/') {
continue;
}
// Only operate on path component in URL.
if (original_path[i] == '?') {
simplified_path.insert(simplified_path.end(), original_path.begin() + i, original_path.end());
break;
}
simplified_path.push_back(original_path[i]);
simplified_path.push_back(path[i]);
}
simplified_path.insert(simplified_path.end(), query.begin(), query.end());
path_header.value(simplified_path);
}

Expand Down
27 changes: 13 additions & 14 deletions test/common/http/path_utility_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -87,21 +87,20 @@ TEST_F(PathUtilityTest, NormalizeCasePath) {

// Paths that are valid get normalized.
TEST_F(PathUtilityTest, MergeSlashes) {
const std::vector<std::pair<std::string, std::string>> non_normal_pairs{
{"", ""}, // empty
{"/a", "/a"}, // no-op
{"//a/b/c", "/a/b/c"}, // double / start
{"/a//b/c", "/a/b/c"}, // double / in the middle
{"/a//b?a=///c", "/a/b?a=///c"}, // slashes in the query are ignored
{"/a//b?", "/a/b?"}, // empty query
};

for (const auto& path_pair : non_normal_pairs) {
auto& path_header = pathHeaderEntry(path_pair.first);
auto mergeSlashes = [this](const std::string& path_value) {
auto& path_header = pathHeaderEntry(path_value);
PathUtil::mergeSlashes(path_header);
EXPECT_EQ(path_header.value().getStringView(), path_pair.second)
<< "original path: " << path_pair.first;
}
auto sanitized_path_value = path_header.value().getStringView();
return std::string(sanitized_path_value.begin(), sanitized_path_value.end());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return std::string(path_header.value().getStringView());

};
EXPECT_EQ("", mergeSlashes("")); // empty
EXPECT_EQ("/a", mergeSlashes("/a")); // no-op
EXPECT_EQ("/a/b/c", mergeSlashes("//a/b/c")); // double / start
euroelessar marked this conversation as resolved.
Show resolved Hide resolved
EXPECT_EQ("/a/b/c", mergeSlashes("/a//b/c")); // double / in the middle
EXPECT_EQ("/a/b/c", mergeSlashes("/a///b/c")); // triple / in the middle
EXPECT_EQ("/a/b/c", mergeSlashes("/a////b/c")); // quadruple / in the middle
EXPECT_EQ("/a/b?a=///c", mergeSlashes("/a//b?a=///c")); // slashes in the query are ignored
EXPECT_EQ("/a/b?", mergeSlashes("/a//b?")); // empty query
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include a test method that also ensures that no slash-merging takes place when the option is off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's covered by ConnectionManagerUtilityTest::MergeSlashesDefaultOff test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, ok, that makes sense. I think handling of 3 or more slashes should be covered here, though, as this is the functional unit-test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test for handling of 3 or more slashes was already added in the latest revision


} // namespace Http
Expand Down