Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upstream: fix subset_lb crash when host configured with no metadata #15171

Merged
merged 5 commits into from
Mar 10, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 15 additions & 13 deletions source/common/upstream/subset_lb.cc
Original file line number Diff line number Diff line change
Expand Up @@ -128,19 +128,21 @@ void SubsetLoadBalancer::rebuildSingle() {
for (const auto& host_set : original_priority_set_.hostSetsPerPriority()) {
for (const auto& host : host_set->hosts()) {
MetadataConstSharedPtr metadata = host->metadata();
const auto& filter_metadata = metadata->filter_metadata();
auto filter_it = filter_metadata.find(Config::MetadataFilters::get().ENVOY_LB);
if (filter_it != filter_metadata.end()) {
const auto& fields = filter_it->second.fields();
auto fields_it = fields.find(single_key_);
if (fields_it != fields.end()) {
auto [iterator, did_insert] =
single_host_per_subset_map_.try_emplace(fields_it->second, host);
UNREFERENCED_PARAMETER(iterator);
if (!did_insert) {
// Two hosts with the same metadata value were found. Ignore all but one of them, and
// set a metric for how many times this happened.
collision_count++;
if (metadata != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe do

if (metadata == nullptr) {
  continue;
}

to reduce the nesting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will do.

const auto& filter_metadata = metadata->filter_metadata();
auto filter_it = filter_metadata.find(Config::MetadataFilters::get().ENVOY_LB);
if (filter_it != filter_metadata.end()) {
const auto& fields = filter_it->second.fields();
auto fields_it = fields.find(single_key_);
if (fields_it != fields.end()) {
auto [iterator, did_insert] =
single_host_per_subset_map_.try_emplace(fields_it->second, host);
UNREFERENCED_PARAMETER(iterator);
if (!did_insert) {
// Two hosts with the same metadata value were found. Ignore all but one of them, and
// set a metric for how many times this happened.
collision_count++;
}
}
}
}
Expand Down
31 changes: 31 additions & 0 deletions test/integration/http_subset_lb_integration_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -216,4 +216,35 @@ TEST_P(HttpSubsetLbIntegrationTest, SubsetLoadBalancer) {
runTest(type_b_request_headers_, "b");
}

// Tests subset-compatible load balancer policy without metadata does not crash on initialization.
TEST_P(HttpSubsetLbIntegrationTest, SubsetLoadBalancerNoMetadata) {
config_helper_.addConfigModifier([&](envoy::config::bootstrap::v3::Bootstrap& bootstrap) {
auto* static_resources = bootstrap.mutable_static_resources();
auto* cluster = static_resources->mutable_clusters(0);

// Set single_host_per_subset to be true
auto* subset_selector = cluster->mutable_lb_subset_config()->mutable_subset_selectors(0);
subset_selector->set_single_host_per_subset(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this crash is limited to when this option is enabled, could we be more explicit about this in the comments or test name? Maybe even add a test case where this is disabled for completeness sake.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this crash only happens when single_host_per_subset is set to be true. Without it subset_lb.cc:rebuildSingle() will bailout early hence no crash. I will add some comments here to cover this.


// Clear the metadata for each host
auto* load_assignment = cluster->mutable_load_assignment();
auto* endpoints = load_assignment->mutable_endpoints(0);
for (uint32_t i = 0; i < num_hosts_; i++) {
auto* lb_endpoint = endpoints->mutable_lb_endpoints(i);
lb_endpoint->clear_metadata();
}
});

initialize();
codec_client_ = makeHttpConnection(lookupPort("http"));
auto response = codec_client_->makeHeaderOnlyRequest(
Http::TestRequestHeaderMapImpl{{":method", "GET"},
{":path", "/test/long/url"},
{":scheme", "http"},
{":authority", "host"}});
response->waitForEndStream();
EXPECT_TRUE(response->complete());
EXPECT_THAT(response->headers(), Http::HttpStatusIs("404"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know why this is giving a 404 instead of 503?
It might be configurable, but the default is 503

enum ClusterNotFoundResponseCode {

Copy link
Contributor Author

@yanjunxiang-google yanjunxiang-google Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration for this test is below: cluster and endpoints are configured. My understand is that with this configuration service is available, but page is not found, hence 404 is returned.

static_resources:
clusters:
- name: cluster_0
connect_timeout: 5s
lb_subset_config:
subset_selectors:
- keys:
- type
single_host_per_subset: true
load_assignment:
cluster_name: cluster_0
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: 127.0.0.1
port_value: 46423
- endpoint:
address:
socket_address:
address: 127.0.0.1
port_value: 40245
- endpoint:
address:
socket_address:
address: 127.0.0.1
port_value: 39563
- endpoint:
address:
socket_address:
address: 127.0.0.1
port_value: 37041

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually now that I think about it I think we're probably hitting a NR here and not actually hitting the bad code. I think the test file sets up a requirement for the header x-type to be set to either a or b in order to actually route to the upstream. Can you take a look and make sure we're actually attempting to route via the subset lb?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Let me take a look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The bad code in subset_lb.cc:SubsetLoadBalancer::rebuildSingle() is hit by configuration update, not by traffic. Which means in the newly added test code:
    TEST_P(HttpSubsetLbIntegrationTest, SubsetLoadBalancerSingleHostPerSubsetNoMetadata)

initialize(), which eventually call SubsetLoadBalancer::rebuildSingle() can trigger crash. We reproduced with this, also verified the fix with it.

  1. The original test function: TEST_P(HttpSubsetLbIntegrationTest, SubsetLoadBalancer) setup the config with metadata as
    ...
    - lb_endpoints:
    - endpoint:
    address:
    socket_address:
    address: 127.0.0.1
    port_value: 34033
    metadata:
    filter_metadata:
    envoy.lb:
    type: a
    - endpoint:
    address:
    socket_address:
    address: 127.0.0.1
    port_value: 35437
    metadata:
    filter_metadata:
    envoy.lb:
    type: b

...
also call runTest() with type_a and type_b as in TEST_P(..., SubsetLoadBalancer), in this test, Envoy is expected to route via subset lb with this test.

However, the newly added test function doesn't have type a/b metadata config, neither call runTest with the two different request types, it probably won't route via the subset lb.

  1. In the newly added test function, the code after initialize(), i.e, creating HTTP connection and sending a HTTP request and verify the response is not specially testing the code change in SubsetLoadBalancer::rebuildSingle(), but a general purpose test, i.e, just to send some traffic, and verify response as expected.

Please let me know whether this addressed your question.

Thanks!

}
asraa marked this conversation as resolved.
Show resolved Hide resolved

} // namespace Envoy

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.