Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: allow specifying tags when creating metric objects #9743

Merged
merged 46 commits into from
Feb 28, 2020

Conversation

snowp
Copy link
Contributor

@snowp snowp commented Jan 19, 2020

Signed-off-by: Snow Pettersen [email protected]

Risk Level: Low
Testing: UTs
Docs Changes: n/a
Release Notes: n/a

#9194

@snowp
Copy link
Contributor Author

snowp commented Jan 19, 2020

@jmarantz Let me know if this looks like the right approach and I'll do the same for the other metric objects. One question: the std::string metric functions on the scope (Scope::histogram) are noted as being deprecated. Should I be adding additional overloads to them or just stick to the _fromStatName?

@@ -113,6 +113,11 @@ class IsolatedStoreImpl : public StoreImpl {
Histogram& histogram = histograms_.get(name, unit);
return histogram;
}
Histogram& histogramFromStatName(StatName name, const std::vector<Tag>&,
Histogram::Unit unit) override {
Histogram& histogram = histograms_.get(name, unit);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is obviously not right, might need to key the metrics off a pair of name + tag?

Copy link
Contributor

@jmarantz jmarantz Jan 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends. is 'name' here the tag-extracted name? Or does it have all the tags in it?

If it's the former, then I would expect that I could have two histograms which were identical in tag-extracted name but differ in tag-values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is that so far is that just using the original name of the histogram is sufficient to uniquely identify it: since the tag extracted name + tags is a function of original metric name + bootstrap config, tags can be ignored when looking up cached metric objects. It seems like the stat stores are using the full stat name as the hash key.

I'll play around with some ways of encoding the static tags into the hash key

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the easiest mechanism would be to synthesize a StatName the incorporates the tag names & values, e.g.

scope.subsystem.counter1 [tag1=value1, tag2=value2]
could be encoded to have a StatName of:
scope.subsystem.counter1.TAGS.tag1.value1.tag2.value2
this would be a memory-efficient encoding because the tags and values would all be references to the same symbols.

Then you don't have to change the hashing at all, and they can be stored in the same table as the existing stats.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side comment: it would be cool if we could preserve the positions of the tags in the synthesized string. That way we can optimize away 1) the regex to extract tags back for prometheus, 2) linear number of regexes for each tag name.

snowp added 2 commits January 19, 2020 11:04
Signed-off-by: Snow Pettersen <[email protected]>
Signed-off-by: Snow Pettersen <[email protected]>
include/envoy/stats/tag.h Show resolved Hide resolved
@@ -113,6 +113,11 @@ class IsolatedStoreImpl : public StoreImpl {
Histogram& histogram = histograms_.get(name, unit);
return histogram;
}
Histogram& histogramFromStatName(StatName name, const std::vector<Tag>&,
Histogram::Unit unit) override {
Histogram& histogram = histograms_.get(name, unit);
Copy link
Contributor

@jmarantz jmarantz Jan 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends. is 'name' here the tag-extracted name? Or does it have all the tags in it?

If it's the former, then I would expect that I could have two histograms which were identical in tag-extracted name but differ in tag-values.

RefcountPtr<ParentHistogramImpl> stat(new ParentHistogramImpl(
final_stat_name, unit, parent_, *this, extraction.tagExtractedName(), extraction.tags()));
final_stat_name, unit, parent_, *this, extraction.tagExtractedName(), tags));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need the central & TLS caches to look at the tags as part of their key, if they are not part of the passed-in StatName.

Signed-off-by: Snow Pettersen <[email protected]>
@stale
Copy link

stale bot commented Jan 28, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jan 28, 2020
@jmarantz
Copy link
Contributor

/wait

@snowp lmk when you want me to pick this back up.

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Jan 31, 2020
@stale
Copy link

stale bot commented Feb 7, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Feb 7, 2020
@snowp
Copy link
Contributor Author

snowp commented Feb 10, 2020

Back from vacation now and will try to get this into a reviewable state

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Feb 10, 2020
Signed-off-by: Snow Pettersen <[email protected]>
Signed-off-by: Snow Pettersen <[email protected]>
Signed-off-by: Snow Pettersen <[email protected]>
Signed-off-by: Snow Pettersen <[email protected]>
@@ -457,7 +457,24 @@ Gauge& ThreadLocalStoreImpl::ScopeImpl::gaugeFromStatName(StatName name,
return gauge;
}

SymbolTable::StoragePtr
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmarantz the way I'm constructing the full stat name here seems to work at the surface, but when I try to fetch the stats back from the store it fails to find the existing stat, and it looks like the stat names in the store are garbage (a symbolTable().toString ends up printing out of bounds). Is there anything obviously wrong that I'm doing here?

I kinda suspect that the storage isn't living long enough to be valid after I exit this scope, but not sure how else to create the symbol names.


auto stat_name_no_tags = StatName(symbolTable().join({prefix_.statName(), name}).get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store the return StoragePtr in a temp that outlives stat_name_no_tags.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend also not using 'auto' here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more thing here, just for your intuition: StatName is a lot like string_view; you would probably not want to do:

absl::string_view view(std::string("xx"));

because the std::string would be destructed before you could use the view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I think my confusion comes from the fact that it seems like it's okay to destruct the Storage after the stat name is handed to the store/scope, but I'm guessing the store takes ownership over that in some sense, or perhaps copies it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copies it.

@snowp
Copy link
Contributor Author

snowp commented Feb 21, 2020

Going through and updating call sites made me realize that this won't be compatible with hot restart stats transfer in its current form due to the stats transfer just sending the name/value of the stats. Presumably we could pack the tag names into the name and parse them out on the receiving end, but that can probably wait until a follow up PR

Signed-off-by: Snow Pettersen <[email protected]>
Signed-off-by: Snow Pettersen <[email protected]>
Signed-off-by: Snow Pettersen <[email protected]>
Signed-off-by: Snow Pettersen <[email protected]>
@@ -49,7 +49,9 @@ void ContextImpl::chargeStat(const Upstream::ClusterInfo& cluster, Protocol prot
symbol_table_.join({protocolStatName(protocol), request_names.service_, request_names.method_,
status_stat_name});

cluster.statsScope().counterFromStatName(Stats::StatName(stat_name_storage.get())).inc();
cluster.statsScope()
.counterFromStatName(Stats::StatName(stat_name_storage.get()), absl::nullopt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does make the PR a lot bigger (and was probably annoying to do; sorry about that). On the plus side, having only a single implementation in the various stores is a big win.

I think I agree with the rationale in the Google Style Guide pre-C++11.

https://google.github.io/styleguide/cppguide.html#Default_Arguments

However, with C++11 I feel that the argument against default-args on virtual functions misses the distinction between PURE interfaces and overrides, which are more explicit in C++11. i.e. if we define the default value only at the interface, and disallow changing the default for overrides, I don't see how what scenario doesn't behave correctly.

Moreover there is precedent in the Envoy code base for this pattern of default-args in interfaces. A simple grep finds:

./envoy/json/json_object.h:80:  virtual ObjectSharedPtr getObject(const std::string& name, bool allow_empty = false) const PURE;
./envoy/json/json_object.h:107:                                                      bool allow_empty = false) const PURE;
./envoy/json/json_object.h:132:                                                  bool allow_empty = false) const PURE;
./envoy/common/scope_tracker.h:26:  virtual void dumpState(std::ostream& os, int indent_level = 0) const PURE;
./envoy/common/pure.h:7:#define PURE = 0
./envoy/stream_info/filter_state.h:78:                       StateType state_type, LifeSpan life_span = LifeSpan::FilterChain) PURE;
./envoy/event/timer.h:43:                           const ScopeTrackedObject* object = nullptr) PURE;
./envoy/event/timer.h:53:                             const ScopeTrackedObject* object = nullptr) PURE;
./envoy/http/api_listener.h:27:                                    bool is_internally_created = false) PURE;
./envoy/http/header_map.h:602:  virtual void dumpState(std::ostream& os, int indent_level = 0) const PURE;
./envoy/http/codec.h:489:                                    bool is_internally_created = false) PURE;

WDYT?

The other alternative is to just use a different interface name for the methods that take tag-vectors.

I'm also OK with what you have if you strongly prefer but I'd be interested in @mattklein123 's opinion on this. There are other cases where we deviate from the Google style guide, and document that fact in https://github.com/envoyproxy/envoy/blob/master/STYLE.md .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with allowing default values on interface functions. I honestly don't know the history of why they are generically blocked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found out why this is disallowed; it doesn't work as I thought it ought to:

struct Base {
  virtual void f(int i = 42) = 0;
};

struct Derived : Base {
  void f(int) override;
};

int main() {
  Derived d;
  Base& b = d;  // Use the same object via a different name and static type.
  b.f();  // OK, equivalent to d.Derived::f(42).
  d.f();  // Error.  Derived::f requires one argument.
}

I think we either have to explicitly give the nullopt arg at every counterFromStatName() call, or we'll have to use a non-virtual helper. I'm OK either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem I ran into with the non-virtual helper is that StatName isn't a complete type at this point so I can't actually define the non-virtual helper on the top level Scope class. We could define StatName in include/envoy/common/stats/symbol_table.h, but that puts a fair bit of implementation logic into the include/ which seems undesirable. The other way to make this possible would be to use a virtual interface class for it, but I'm guessing we don't want that for the sake of keeping StatName simple?

Using a default value on the top level definition would be helpful for most of the use cases since most things are invoking the functions from a Scope reference, so it might still be useful even if the default value overload doesn't inherit? It might end up being confusing, but if we're just trying to solve for common case it does make the API a bit simpler.

That said I'm happy to keep the explicit value if we think that's the way to go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define the non-virtual helper if it's passing through const StatName& rather than StatName?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to get it working by using const StatName& and StatNameTagVectorOptRef everywhere, ptal

* @param tags optionally specified tags.
* @return a counter within the scope's namespace.
*/
virtual Counter& counterFromStatName(const StatName& name, StatNameTagVectorOptRef tags) PURE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename this method so we don't have a virtual and non-virtual method with the same name?

@@ -16,6 +16,7 @@ namespace Stats {

class Allocator;
struct Tag;
using TagVector = std::vector<Tag>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we can we just include tag.h here rather than predeclaring Tag and repeating the using?

@@ -136,15 +145,15 @@ class IsolatedStoreImpl : public StoreImpl {

Counter& counter(const std::string& name) override {
StatNameManagedStorage storage(name, symbolTable());
return counterFromStatName(storage.statName());
return counterFromStatName(storage.statName(), absl::nullopt);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can drop these absl::nullopt args now right?

@@ -45,7 +45,7 @@ class ThreadLocalStorePerf {

void accessCounters() {
for (auto& stat_name_storage : stat_names_) {
store_.counterFromStatName(stat_name_storage->statName());
store_.Stats::Store::counterFromStatName(stat_name_storage->statName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this hack is due to having virtual/non-virtual overloads of the same name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, ill rename one of them to get rid of this.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting much closer; rename the methods that take the tag-vector (e.g. append withTags) and a few other minor nits and we are good to go.

Thank you so much for sticking with this; this is great.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you!

@mattklein123 I assume you'll want to take the next pass.

@mattklein123 mattklein123 self-assigned this Feb 26, 2020
@mattklein123 mattklein123 merged commit 64384d0 into envoyproxy:master Feb 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants