-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http: Add ability to merge slashes #7621
Changes from 1 commit
8af987f
5d4a821
43bd6be
eb0c731
d0430b2
a917d92
7a42b3c
ed6b4f3
bb05f02
09018a7
57fa3b4
958a55a
164cd4b
16c08c4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,41 +52,25 @@ bool PathUtil::canonicalPath(HeaderEntry& path_header) { | |
return true; | ||
} | ||
|
||
/* static */ | ||
void PathUtil::mergeSlashes(HeaderEntry& path_header) { | ||
const auto original_path = path_header.value().getStringView(); | ||
if (original_path.empty()) { | ||
return; | ||
} | ||
|
||
bool has_adjacent_slashes = false; | ||
for (size_t i = 1; i < original_path.size(); ++i) { | ||
if (original_path[i] == '/' && original_path[i - 1] == '/') { | ||
has_adjacent_slashes = true; | ||
break; | ||
} | ||
// Only operate on path component in URL. | ||
if (original_path[i] == '?') { | ||
break; | ||
} | ||
} | ||
if (!has_adjacent_slashes) { | ||
// Only operate on path component in URL. | ||
const size_t query_start = original_path.find('?'); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. s/size_t/absl::string_view::size_type |
||
const auto path = original_path.substr(0, query_start); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. use explicit types in this context as type is not obvious from context unless you know string_view well. |
||
const auto query = absl::ClippedSubstr(original_path, query_start); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Handling the npos case for query_start does not appear to be a documented use of substr etc. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Point 24.4.2. of C++17 standard specifies that |
||
if (path.find("//") == absl::string_view::npos) { | ||
return; | ||
} | ||
|
||
std::string simplified_path; | ||
simplified_path.reserve(original_path.size()); | ||
for (size_t i = 0; i < original_path.size(); ++i) { | ||
if (i > 0 && original_path[i] == '/' && original_path[i - 1] == '/') { | ||
for (size_t i = 0; i < path.size(); ++i) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd still prefer using the absl functions for this to avoid having to look carefully at string-hacking code for common patterns. But absl::Substitute is actually preferred for strings known at compile-time. Just call it in a loop till there are no more occurrences of "//" (change the 'if (path.find" to a while-loop, reversing the polarity of the condition. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought about this further as my 'while' suggestion would be n^2 on a string with 100 consecutive slashes. Let's do the initial scan for "//" via if, but then we can just use split/join to remove duplicate slashes. WDYT?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In this case we would also need to either special-case There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah also trailing slash, so cases to handle are:
It's still probably simpler than the character-at-a-time scan but maybe there's a better absl util to use? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. probably good to add those cases anyway to the test-suite regardless of the algo we use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. updated, as well as added extra tests |
||
if (i > 0 && path[i] == '/' && path[i - 1] == '/') { | ||
continue; | ||
} | ||
// Only operate on path component in URL. | ||
if (original_path[i] == '?') { | ||
simplified_path.insert(simplified_path.end(), original_path.begin() + i, original_path.end()); | ||
break; | ||
} | ||
simplified_path.push_back(original_path[i]); | ||
simplified_path.push_back(path[i]); | ||
} | ||
simplified_path.insert(simplified_path.end(), query.begin(), query.end()); | ||
path_header.value(simplified_path); | ||
} | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -87,21 +87,20 @@ TEST_F(PathUtilityTest, NormalizeCasePath) { | |
|
||
// Paths that are valid get normalized. | ||
TEST_F(PathUtilityTest, MergeSlashes) { | ||
const std::vector<std::pair<std::string, std::string>> non_normal_pairs{ | ||
{"", ""}, // empty | ||
{"/a", "/a"}, // no-op | ||
{"//a/b/c", "/a/b/c"}, // double / start | ||
{"/a//b/c", "/a/b/c"}, // double / in the middle | ||
{"/a//b?a=///c", "/a/b?a=///c"}, // slashes in the query are ignored | ||
{"/a//b?", "/a/b?"}, // empty query | ||
}; | ||
|
||
for (const auto& path_pair : non_normal_pairs) { | ||
auto& path_header = pathHeaderEntry(path_pair.first); | ||
auto mergeSlashes = [this](const std::string& path_value) { | ||
auto& path_header = pathHeaderEntry(path_value); | ||
PathUtil::mergeSlashes(path_header); | ||
EXPECT_EQ(path_header.value().getStringView(), path_pair.second) | ||
<< "original path: " << path_pair.first; | ||
} | ||
auto sanitized_path_value = path_header.value().getStringView(); | ||
return std::string(sanitized_path_value.begin(), sanitized_path_value.end()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. return std::string(path_header.value().getStringView()); |
||
}; | ||
EXPECT_EQ("", mergeSlashes("")); // empty | ||
EXPECT_EQ("/a", mergeSlashes("/a")); // no-op | ||
EXPECT_EQ("/a/b/c", mergeSlashes("//a/b/c")); // double / start | ||
euroelessar marked this conversation as resolved.
Show resolved
Hide resolved
|
||
EXPECT_EQ("/a/b/c", mergeSlashes("/a//b/c")); // double / in the middle | ||
EXPECT_EQ("/a/b/c", mergeSlashes("/a///b/c")); // triple / in the middle | ||
EXPECT_EQ("/a/b/c", mergeSlashes("/a////b/c")); // quadruple / in the middle | ||
EXPECT_EQ("/a/b?a=///c", mergeSlashes("/a//b?a=///c")); // slashes in the query are ignored | ||
EXPECT_EQ("/a/b?", mergeSlashes("/a//b?")); // empty query | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. include a test method that also ensures that no slash-merging takes place when the option is off. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's covered by There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see, ok, that makes sense. I think handling of 3 or more slashes should be covered here, though, as this is the functional unit-test. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Test for handling of 3 or more slashes was already added in the latest revision |
||
|
||
} // namespace Http | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit twitchy about full qualified URLs here. Granted, Envoy currently handles fully qualified urls in the H1 codec by splitting them up, so if the incoming request is GET http://foo.com//bar we'll have :authority foo.com and :path /bar so this is fine today (we won't end up with http:/foo.com/bar).
It might be worth an assert that the path is relative, just in case someone does some lua or use defined headers trying to take advantage of Path == url in firstline and having their absolute URL messed up.