Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xDS: add version gauge #1805

Merged
merged 6 commits into from
Oct 4, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/configuration/cluster_manager/cds.rst
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,8 @@ CDS has a statistics tree rooted at *cluster_manager.cds.* with the following st
:header: Name, Type, Description
:widths: 1, 1, 2

config_reload, Counter, Total API fetches that resulted in a config reload due to a different config
update_attempt, Counter, Total API fetches attempted
update_success, Counter, Total API fetches completed successfully
update_failure, Counter, Total API fetches that failed (either network or schema errors)
version, Gauge, Hash of the contents from the last successful API fetch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we also want to report the actual version_info as provided by v2 xDS?

Copy link
Member Author

@junr03 junr03 Oct 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can provide that via a gauge, given that version_info is a string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could attempt atoul on the first 6 chars similar to the server's version info and fallback to hashing. i'm not sure that it's worth it though.

2 changes: 2 additions & 0 deletions docs/configuration/cluster_manager/cluster_stats.rst
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,11 @@ Every cluster has a statistics tree rooted at *cluster.<name>.* with the followi
membership_healthy, Gauge, Current cluster healthy total (inclusive of both health checking and outlier detection)
membership_total, Gauge, Current cluster membership total
retry_or_shadow_abandoned, Counter, Total number of times shadowing or retry buffering was canceled due to buffer limits.
config_reload, Counter, Total API fetches that resulted in a config reload due to a different config
update_attempt, Counter, Total cluster membership update attempts
update_success, Counter, Total cluster membership update successes
update_failure, Counter, Total cluster membership update failures
version, Gauge, Hash of the contents from the last successful API fetch
max_host_weight, Gauge, Maximum weight of any host in the cluster
bind_errors, Counter, Total errors binding the socket to the configured source address.

Expand Down
5 changes: 4 additions & 1 deletion docs/configuration/http_conn_man/rds.rst
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,9 @@ will only perform a full reload if the hash value changes.
Statistics
----------

RDS has a statistics tree rooted at *http.<stat_prefix>.rds.* with the following statistics:
RDS has a statistics tree rooted at *http.<stat_prefix>.rds.<route_config_name>.*.
Any ``:`` character in the ``route_config_name`` name gets replaced with ``_`` in the
stats tree. The stats tree contains the following statistics:

.. csv-table::
:header: Name, Type, Description
Expand All @@ -79,3 +81,4 @@ RDS has a statistics tree rooted at *http.<stat_prefix>.rds.* with the following
update_attempt, Counter, Total API fetches attempted
update_success, Counter, Total API fetches completed successfully
update_failure, Counter, Total API fetches that failed (either network or schema errors)
version, Gauge, Hash of the contents from the last successful API fetch
2 changes: 2 additions & 0 deletions docs/configuration/listeners/lds.rst
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,8 @@ LDS has a statistics tree rooted at *listener_manager.lds.* with the following s
:header: Name, Type, Description
:widths: 1, 1, 2

config_reload, Counter, Total API fetches that resulted in a config reload due to a different config
update_attempt, Counter, Total API fetches attempted
update_success, Counter, Total API fetches completed successfully
update_failure, Counter, Total API fetches that failed (either network or schema errors)
version, Gauge, Hash of the contents from the last successful API fetch
13 changes: 7 additions & 6 deletions include/envoy/config/subscription.h
Original file line number Diff line number Diff line change
Expand Up @@ -75,18 +75,19 @@ template <class ResourceType> class Subscription {
* Per subscription stats. @see stats_macros.h
*/
// clang-format off
#define ALL_SUBSCRIPTION_STATS(COUNTER) \
COUNTER(update_attempt) \
COUNTER(update_success) \
COUNTER(update_failure) \
COUNTER(update_rejected)
#define ALL_SUBSCRIPTION_STATS(COUNTER, GAUGE) \
COUNTER(update_attempt) \
COUNTER(update_success) \
COUNTER(update_failure) \
COUNTER(update_rejected) \
GAUGE(version)
// clang-format on

/**
* Struct definition for per subscription stats. @see stats_macros.h
*/
struct SubscriptionStats {
ALL_SUBSCRIPTION_STATS(GENERATE_COUNTER_STRUCT)
ALL_SUBSCRIPTION_STATS(GENERATE_COUNTER_STRUCT, GENERATE_GAUGE_STRUCT)
};

} // namespace Config
Expand Down
3 changes: 2 additions & 1 deletion include/envoy/upstream/upstream.h
Original file line number Diff line number Diff line change
Expand Up @@ -235,7 +235,8 @@ class HostSet {
COUNTER(update_attempt) \
COUNTER(update_success) \
COUNTER(update_failure) \
COUNTER(update_empty)
COUNTER(update_empty) \
GAUGE(version)

// clang-format on

Expand Down
1 change: 1 addition & 0 deletions source/common/config/filesystem_subscription_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,7 @@ class FilesystemSubscriptionImpl : public Config::Subscription<ResourceType>,
config_update_available = true;
callbacks_->onConfigUpdate(typed_resources);
version_info_ = message.version_info();
stats_.version_.set(HashUtil::xxHash64(version_info_));
stats_.update_success_.inc();
ENVOY_LOG(debug, "Filesystem config update accepted for {}: {}", path_,
message.DebugString());
Expand Down
1 change: 1 addition & 0 deletions source/common/config/grpc_mux_subscription_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ class GrpcMuxSubscriptionImpl : public Subscription<ResourceType>,
stats_.update_success_.inc();
stats_.update_attempt_.inc();
version_info_ = version_info;
stats_.version_.set(HashUtil::xxHash64(version_info_));
ENVOY_LOG(debug, "gRPC config for {} accepted with {} resources", type_url_, resources.size());

#ifndef NVLOG
Expand Down
1 change: 1 addition & 0 deletions source/common/config/http_subscription_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@ class HttpSubscriptionImpl : public Http::RestApiFetcher,
try {
callbacks_->onConfigUpdate(typed_resources);
request_.set_version_info(message.version_info());
stats_.version_.set(HashUtil::xxHash64(request_.version_info()));
stats_.update_success_.inc();
} catch (const EnvoyException& e) {
ENVOY_LOG(warn, "REST config update rejected: {}", e.what());
Expand Down
15 changes: 10 additions & 5 deletions source/common/config/utility.h
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
#include "envoy/json/json_object.h"
#include "envoy/local_info/local_info.h"
#include "envoy/registry/registry.h"
#include "envoy/stats/stats.h"
#include "envoy/upstream/cluster_manager.h"

#include "common/common/assert.h"
Expand Down Expand Up @@ -59,11 +60,15 @@ class Utility {
}

/**
* Legacy APIs uses JSON and do not have an explicit version. Hash the body and append
* a user-friendly prefix.
* Legacy APIs uses JSON and do not have an explicit version.
* @param input the input to hash.
* @return std::pair<std::string, uint64_t> the string is the hash converted into
* a hex string, pre-pended by a user friendly prefix. The uint64_t is the
* raw hash.
*/
static std::string computeHashedVersion(const std::string& input) {
return "hash_" + Hex::uint64ToHex(HashUtil::xxHash64(input));
static std::pair<std::string, uint64_t> computeHashedVersion(const std::string& input) {
uint64_t hash = HashUtil::xxHash64(input);
return std::make_pair("hash_" + Hex::uint64ToHex(hash), hash);
}

/**
Expand Down Expand Up @@ -149,7 +154,7 @@ class Utility {
* @return SubscriptionStats for scope.
*/
static SubscriptionStats generateStats(Stats::Scope& scope) {
return {ALL_SUBSCRIPTION_STATS(POOL_COUNTER(scope))};
return {ALL_SUBSCRIPTION_STATS(POOL_COUNTER(scope), POOL_GAUGE(scope))};
}

/**
Expand Down
3 changes: 2 additions & 1 deletion source/common/router/rds_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,8 @@ RdsRouteConfigProviderImpl::RdsRouteConfigProviderImpl(
const std::string& stat_prefix, ThreadLocal::SlotAllocator& tls,
RouteConfigProviderManagerImpl& route_config_provider_manager)
: runtime_(runtime), cm_(cm), tls_(tls.allocateSlot()),
route_config_name_(rds.route_config_name()), scope_(scope.createScope(stat_prefix + "rds.")),
route_config_name_(rds.route_config_name()),
scope_(scope.createScope(stat_prefix + "rds." + route_config_name_ + ".")),
stats_({ALL_RDS_STATS(POOL_COUNTER(*scope_))}),
route_config_provider_manager_(route_config_provider_manager),
manager_identifier_(manager_identifier) {
Expand Down
5 changes: 4 additions & 1 deletion source/common/router/rds_subscription.cc
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,10 @@ void RdsSubscription::parseResponse(const Http::Message& response) {
Envoy::Config::RdsJson::translateRouteConfiguration(*response_json, *resources.Add());
resources[0].set_name(route_config_name_);
callbacks_->onConfigUpdate(resources);
version_info_ = Envoy::Config::Utility::computeHashedVersion(response_body);
std::pair<std::string, uint64_t> hash =
Envoy::Config::Utility::computeHashedVersion(response_body);
version_info_ = hash.first;
stats_.version_.set(hash.second);
stats_.update_success_.inc();
}

Expand Down
5 changes: 4 additions & 1 deletion source/common/upstream/cds_subscription.cc
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,10 @@ void CdsSubscription::parseResponse(const Http::Message& response) {
}

callbacks_->onConfigUpdate(resources);
version_info_ = Config::Utility::computeHashedVersion(response_body);
std::pair<std::string, uint64_t> hash =
Envoy::Config::Utility::computeHashedVersion(response_body);
version_info_ = hash.first;
stats_.version_.set(hash.second);
stats_.update_success_.inc();
}

Expand Down
5 changes: 4 additions & 1 deletion source/common/upstream/sds_subscription.cc
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,10 @@ void SdsSubscription::parseResponse(const Http::Message& response) {
}

callbacks_->onConfigUpdate(resources);
version_info_ = Config::Utility::computeHashedVersion(response_body);
std::pair<std::string, uint64_t> hash =
Envoy::Config::Utility::computeHashedVersion(response_body);
version_info_ = hash.first;
stats_.version_.set(hash.second);
stats_.update_success_.inc();
}

Expand Down
5 changes: 4 additions & 1 deletion source/server/lds_subscription.cc
Original file line number Diff line number Diff line change
Expand Up @@ -52,7 +52,10 @@ void LdsSubscription::parseResponse(const Http::Message& response) {
}

callbacks_->onConfigUpdate(resources);
version_info_ = Config::Utility::computeHashedVersion(response_body);
std::pair<std::string, uint64_t> hash =
Envoy::Config::Utility::computeHashedVersion(response_body);
version_info_ = hash.first;
stats_.version_.set(hash.second);
stats_.update_success_.inc();
}

Expand Down
8 changes: 4 additions & 4 deletions test/common/config/filesystem_subscription_impl_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -12,19 +12,19 @@ class FilesystemSubscriptionImplTest : public FilesystemSubscriptionTestHarness,
// Validate that the client can recover from bad JSON responses.
TEST_F(FilesystemSubscriptionImplTest, BadJsonRecovery) {
startSubscription({"cluster0", "cluster1"});
verifyStats(1, 0, 0, 0);
verifyStats(1, 0, 0, 0, 0);
EXPECT_CALL(callbacks_, onConfigUpdateFailed(_));
updateFile(";!@#badjso n");
verifyStats(2, 0, 0, 1);
verifyStats(2, 0, 0, 1, 0);
deliverConfigUpdate({"cluster0", "cluster1"}, "0", true);
verifyStats(3, 1, 0, 1);
verifyStats(3, 1, 0, 1, 7148434200721666028);
}

// Validate that a file that is initially available results in a successful update.
TEST_F(FilesystemSubscriptionImplTest, InitialFile) {
updateFile("{\"versionInfo\": \"0\", \"resources\": []}", false);
startSubscription({"cluster0", "cluster1"});
verifyStats(1, 1, 0, 0);
verifyStats(1, 1, 0, 0, 7148434200721666028);
}

} // namespace
Expand Down
6 changes: 3 additions & 3 deletions test/common/config/filesystem_subscription_test_harness.h
Original file line number Diff line number Diff line change
Expand Up @@ -83,11 +83,11 @@ class FilesystemSubscriptionTestHarness : public SubscriptionTestHarness {
EXPECT_EQ(version_, subscription_.versionInfo());
}

void verifyStats(uint32_t attempt, uint32_t success, uint32_t rejected,
uint32_t failure) override {
void verifyStats(uint32_t attempt, uint32_t success, uint32_t rejected, uint32_t failure,
uint64_t version) override {
// The first attempt always fail unless there was a file there to begin with.
SubscriptionTestHarness::verifyStats(attempt, success, rejected,
failure + (file_at_start_ ? 0 : 1));
failure + (file_at_start_ ? 0 : 1), version);
}

const std::string path_;
Expand Down
24 changes: 12 additions & 12 deletions test/common/config/grpc_subscription_impl_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -17,54 +17,54 @@ TEST_F(GrpcSubscriptionImplTest, StreamCreationFailure) {
EXPECT_CALL(callbacks_, onConfigUpdateFailed(_));
EXPECT_CALL(*timer_, enableTimer(_));
subscription_->start({"cluster0", "cluster1"}, callbacks_);
verifyStats(2, 0, 0, 1);
verifyStats(2, 0, 0, 1, 0);
// Ensure this doesn't cause an issue by sending a request, since we don't
// have a gRPC stream.
subscription_->updateResources({"cluster2"});
// Retry and succeed.
EXPECT_CALL(*async_client_, start(_, _)).WillOnce(Return(&async_stream_));
expectSendMessage({"cluster2"}, "");
timer_cb_();
verifyStats(3, 0, 0, 1);
verifyStats(3, 0, 0, 1, 0);
}

// Validate that the client can recover from a remote stream closure via retry.
TEST_F(GrpcSubscriptionImplTest, RemoteStreamClose) {
startSubscription({"cluster0", "cluster1"});
verifyStats(1, 0, 0, 0);
verifyStats(1, 0, 0, 0, 0);
Http::HeaderMapPtr trailers{new Http::TestHeaderMapImpl{}};
subscription_->grpcMux().onReceiveTrailingMetadata(std::move(trailers));
EXPECT_CALL(*timer_, enableTimer(_));
EXPECT_CALL(callbacks_, onConfigUpdateFailed(_));
subscription_->grpcMux().onRemoteClose(Grpc::Status::GrpcStatus::Canceled, "");
verifyStats(2, 0, 0, 1);
verifyStats(2, 0, 0, 1, 0);
// Retry and succeed.
EXPECT_CALL(*async_client_, start(_, _)).WillOnce(Return(&async_stream_));
expectSendMessage({"cluster0", "cluster1"}, "");
timer_cb_();
verifyStats(2, 0, 0, 1);
verifyStats(2, 0, 0, 1, 0);
}

// Validate that When the management server gets multiple requests for the same version, it can
// ignore later ones. This allows the nonce to be used.
TEST_F(GrpcSubscriptionImplTest, RepeatedNonce) {
InSequence s;
startSubscription({"cluster0", "cluster1"});
verifyStats(1, 0, 0, 0);
verifyStats(1, 0, 0, 0, 0);
// First with the initial, empty version update to "0".
updateResources({"cluster2"});
verifyStats(2, 0, 0, 0);
verifyStats(2, 0, 0, 0, 0);
deliverConfigUpdate({"cluster0", "cluster2"}, "0", false);
verifyStats(3, 0, 1, 0);
verifyStats(3, 0, 1, 0, 0);
deliverConfigUpdate({"cluster0", "cluster2"}, "0", true);
verifyStats(4, 1, 1, 0);
verifyStats(4, 1, 1, 0, 7148434200721666028);
// Now with version "0" update to "1".
updateResources({"cluster3"});
verifyStats(5, 1, 1, 0);
verifyStats(5, 1, 1, 0, 7148434200721666028);
deliverConfigUpdate({"cluster3"}, "1", false);
verifyStats(6, 1, 2, 0);
verifyStats(6, 1, 2, 0, 7148434200721666028);
deliverConfigUpdate({"cluster3"}, "1", true);
verifyStats(7, 2, 2, 0);
verifyStats(7, 2, 2, 0, 13237225503670494420U);
}

} // namespace
Expand Down
12 changes: 6 additions & 6 deletions test/common/config/http_subscription_impl_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -15,11 +15,11 @@ TEST_F(HttpSubscriptionImplTest, OnRequestReset) {
EXPECT_CALL(*timer_, enableTimer(_));
EXPECT_CALL(callbacks_, onConfigUpdateFailed(_));
http_callbacks_->onFailure(Http::AsyncClient::FailureReason::Reset);
verifyStats(1, 0, 0, 1);
verifyStats(1, 0, 0, 1, 0);
timerTick();
verifyStats(2, 0, 0, 1);
verifyStats(2, 0, 0, 1, 0);
deliverConfigUpdate({"cluster0", "cluster1"}, "0", true);
verifyStats(3, 1, 0, 1);
verifyStats(3, 1, 0, 1, 7148434200721666028);
}

// Validate that the client can recover from bad JSON responses.
Expand All @@ -32,12 +32,12 @@ TEST_F(HttpSubscriptionImplTest, BadJsonRecovery) {
EXPECT_CALL(*timer_, enableTimer(_));
EXPECT_CALL(callbacks_, onConfigUpdateFailed(_));
http_callbacks_->onSuccess(std::move(message));
verifyStats(1, 0, 0, 1);
verifyStats(1, 0, 0, 1, 0);
request_in_progress_ = false;
timerTick();
verifyStats(2, 0, 0, 1);
verifyStats(2, 0, 0, 1, 0);
deliverConfigUpdate({"cluster0", "cluster1"}, "0", true);
verifyStats(3, 1, 0, 1);
verifyStats(3, 1, 0, 1, 7148434200721666028);
}

} // namespace
Expand Down
Loading