[deprecated] stats: do not canonicalize StatNames when storing in allocator, using a special syntax in tests for identifying dynamic stat names. #9774

jmarantz · 2020-01-22T14:27:03Z

Description: Resolves a problem that was discovered when experimentally switching the default SymbolTable implementation from FakeSymbolTableImpl to RealSymbolTableImpl, and then running all tests. The problem is that there are now distinct StatName representations for the same string, and hash lookups using the StatName hash/equal functions will consider them distinct.

This manifested in two problems:

1. There was an accidental canonicalization that was taking place in the stat allocation process, where a StatName was being rendered as a string and then re-encoded. This meant that the StatName passed into the allocator would not be equivalent to counter.statName() coming out of it. This behaved badly in maps, such as the one in IsolatedStatStore, and likely others. In this PR we resolve this problem by removing that serialize/re-parse phase during allocation. This should make things faster too.
2. Tests often check stat values by looking up via string, and there's no easy way to figure out which parts of the string were supposed to be dynamic. The leanest and most robust way is to let the test encode this information in the string itself, and a helper class, MixedStatNames, was added to stat_test_utility.h to make this easy and terse. You just wrap the dynamic portions in backquotes. There are tests for this new class and its string hacking of course.
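As a rough illustration of the backquote convention described above, here is a minimal sketch of how a test string might be split into symbolic and dynamic pieces. It is not the MixedStatNames class from this PR: `Segment` and `splitMixedName` are hypothetical names, and the choice to leave dots attached to the neighboring segment is an assumption made only to keep the sketch short.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical illustration only; not the MixedStatNames class from this PR.
struct Segment {
  std::string text;
  bool dynamic; // true if the text was wrapped in backquotes
};

// Splits a name such as "cluster.`name`.upstream_rq" into segments.
// Each backquote toggles between symbolic and dynamic; empty segments are dropped.
std::vector<Segment> splitMixedName(std::string_view name) {
  std::vector<Segment> segments;
  std::string current;
  bool dynamic = false;
  for (char c : name) {
    if (c == '`') {
      if (!current.empty()) {
        segments.push_back({current, dynamic});
        current.clear();
      }
      dynamic = !dynamic;
    } else {
      current.push_back(c);
    }
  }
  assert(!dynamic && "unbalanced backquote in stat name");
  if (!current.empty()) {
    segments.push_back({current, dynamic});
  }
  return segments;
}
```

With this sketch, "cluster.`name`.upstream_rq" yields three segments, and only the middle one ("name") is marked dynamic.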

This PR as it stands flips the default-bit for SymbolTable construction to use real symbol tables, and we'll need to flip it back before submitting.

Risk Level: medium -- this re-enables dynamic symbol tables.
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Fixes: #9768

Signed-off-by: Joshua Marantz <[email protected]>


jmarantz · 2020-01-22T15:00:43Z

test/common/router/router_test.cc

-                    .value());
-  EXPECT_EQ(1U, cm_.thread_local_cluster_.cluster_.info_->stats_store_
-                    .counter("alt_stat.zone.zone_name.to_az.upstream_rq_200")
-                    .value());


note: in the existing code, this exact stanza appears twice. I am not sure whether the second copy was intended to test a different stat name.

jmarantz · 2020-01-22T19:30:13Z

source/common/stats/symbol_table_impl.cc

@@ -41,19 +42,19 @@ uint64_t StatName::dataSize() const {
 #ifndef ENVOY_CONFIG_COVERAGE
 void StatName::debugPrint() {
  if (size_and_data_ == nullptr) {
-    ENVOY_LOG_MISC(info, "Null StatName");
+    std::cerr << "Null StatName" << std::endl;


I prefer to use std::cerr so that I can call this from the debugger and see the results without restarting the binary with a differing logging level.

nit: should this possibly be wrapped in a different LOG macro? Don't feel strongly about this.

Can I leave a TODO here? The logging code is currently in flux from Jose's PR, so there is no need to create a merge conflict.

Sure that's fine.


jmarantz · 2020-01-22T20:34:52Z

/azp run envoy-linux

azure-pipelines · 2020-01-22T20:35:04Z

Azure Pipelines successfully started running 1 pipeline(s).

jmarantz · 2020-01-22T20:49:54Z

Added @snowp as this is touching code you were looking at in #9743


jmarantz · 2020-01-23T01:27:14Z

source/common/stats/metric_impl.h

  MetricImpl(StatName name, absl::string_view tag_extracted_name, const std::vector<Tag>& tags,
             SymbolTable& symbol_table)
-      : MetricImpl(symbol_table.toString(name), tag_extracted_name, tags, symbol_table) {}


FYI, this is where the bug was: rendering the name to a string lost the structure of the original StatName, including which of its components were dynamic, and that structure matters for hash tables.
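To make the failure mode concrete, here is a toy model of it. Nothing here is Envoy's real encoding: `ToyStatName`, `render`, `reencode`, and `lookupMissesAfterRoundTrip` are hypothetical names, each token simply carries an is-dynamic flag, and an ordered `std::map` stands in for the hash maps mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model: each token is (is_dynamic, text). Not Envoy's encoding.
using Token = std::pair<bool, std::string>;
using ToyStatName = std::vector<Token>;

// Renders a name as dot-joined text, discarding the dynamic/symbolic flags.
std::string render(const ToyStatName& name) {
  std::string out;
  for (std::size_t i = 0; i < name.size(); ++i) {
    if (i > 0) {
      out += '.';
    }
    out += name[i].second;
  }
  return out;
}

// Re-encodes text with every token symbolic: the accidental canonicalization.
ToyStatName reencode(const std::string& text) {
  ToyStatName name;
  std::size_t start = 0;
  while (start <= text.size()) {
    std::size_t dot = text.find('.', start);
    if (dot == std::string::npos) {
      dot = text.size();
    }
    name.push_back({false, text.substr(start, dot - start)});
    start = dot + 1;
  }
  return name;
}

// Returns true if a map keyed by the re-encoded name cannot find the original.
bool lookupMissesAfterRoundTrip(const ToyStatName& original) {
  const ToyStatName stored = reencode(render(original));
  assert(render(stored) == render(original)); // the text survives the round trip
  std::map<ToyStatName, int> counters;
  counters[stored] = 1;
  return counters.find(original) == counters.end();
}
```

In this model a name with a dynamic token misses after the round trip even though both forms render to the same string, while an all-symbolic name is still found.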

jmarantz · 2020-01-23T01:31:28Z

test/common/stats/isolated_store_impl_test.cc

  ~StatsIsolatedStoreImplTest() override {
    pool_.clear();
-    EXPECT_EQ(0, store_.symbolTable().numSymbols());


This file had to change because store_ needed to become a unique_ptr so that it can be reset, which clears all the symbols from the symbol table. This only matters with real symbol tables.
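A small sketch of why a resettable store helps when checking that no symbols remain. All types here are toy stand-ins with hypothetical names, and the sketch assumes the symbol table outlives the store, which may not match the real fixture's ownership.

```cpp
#include <cassert>
#include <memory>

// Toy stand-in for a symbol table that only counts live symbols.
struct ToySymbolTable {
  int num_symbols = 0;
};

// Toy store that holds one symbol for its whole lifetime.
class ToyStore {
public:
  explicit ToyStore(ToySymbolTable& table) : table_(table) { ++table_.num_symbols; }
  ~ToyStore() {
    assert(table_.num_symbols > 0);
    --table_.num_symbols;
  }
  ToyStore(const ToyStore&) = delete;
  ToyStore& operator=(const ToyStore&) = delete;

private:
  ToySymbolTable& table_;
};

// Fixture shape assumed for this sketch: the table is declared first, so it is
// constructed before the store and destroyed after it.
struct ToyFixture {
  ToySymbolTable table;
  std::unique_ptr<ToyStore> store = std::make_unique<ToyStore>(table);
};
```

Calling `store.reset()` on the fixture releases the store's symbol early, so a zero-symbol check can run before the fixture itself is torn down.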


jmarantz · 2020-01-23T15:19:00Z

test/common/stats/thread_local_store_test.cc

@@ -921,39 +921,39 @@ TEST_F(StatsThreadLocalStoreTestNoFixture, MemoryWithoutTlsFakeSymbolTable) {
  init(true);
  TestUtil::MemoryTest memory_test;
  TestUtil::forEachSampleStat(
-      1000, [this](absl::string_view name) { store_->counter(std::string(name)); });


note: the changes in this file reduce the scale of these stat memory checks, in order to avoid timeouts in tsan in CI.

mattklein123

Makes sense to me with some small questions/comments. Also needs a master merge. Thank you!

/wait

mattklein123 · 2020-01-23T17:58:28Z

source/common/common/utility.h

+  static absl::string_view ltrim(absl::string_view source,
+                                 absl::string_view delims = WhitespaceChars);


nit: update doc comments here and below
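For readers unfamiliar with the new `delims` parameter, here is a sketch of the behavior such a signature suggests, written against `std::string_view` so it compiles without Abseil. `ltrimSketch` and `kWhitespace` are hypothetical names, and the contents of the whitespace set are an assumption; the real `WhitespaceChars` constant may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Assumed whitespace set for this sketch; the real WhitespaceChars may differ.
constexpr std::string_view kWhitespace = " \t\f\v\n\r";

// Returns `source` with any leading characters found in `delims` removed.
std::string_view ltrimSketch(std::string_view source, std::string_view delims = kWhitespace) {
  const std::size_t pos = source.find_first_not_of(delims);
  if (pos == std::string_view::npos) {
    return std::string_view(); // empty input, or nothing but delimiters
  }
  source.remove_prefix(pos);
  return source;
}
```

Passing a custom delimiter set such as "`" strips leading backquotes instead of whitespace.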


mattklein123 · 2020-01-23T18:02:17Z

source/common/stats/symbol_table_impl.cc

@@ -41,19 +42,19 @@ uint64_t StatName::dataSize() const {
 #ifndef ENVOY_CONFIG_COVERAGE
 void StatName::debugPrint() {
  if (size_and_data_ == nullptr) {
-    ENVOY_LOG_MISC(info, "Null StatName");
+    std::cerr << "Null StatName" << std::endl;


nit: should this possibly be wrapped in a different LOG macro? Don't feel strongly about this.

mattklein123 · 2020-01-23T18:06:34Z

test/common/stats/stat_test_utility.h

@@ -72,6 +74,45 @@ class MemoryTest {
  const size_t memory_at_construction_;
 };

+// Helps tests construct StatName with symbolic and dynamic components.
+class MixedStatNames {


My main concern with this type of construct is whether test writers are going to understand that they need to do this. Is there any way that we could make this fail in a more obvious way if a test writer is not doing the right thing? Possibly by making the test stat stores use a slightly different interface? I'm not sure what the best option is.

I will address the other comments a little later, but this is the most strategic question.

After this PR, a dev writing a unit-test that looks for a counter in a map, and has used a Dynamic stat name, will encounter a test failure, and will have to discover how to use this solution. More hints could definitely be dropped in the doc for StatNameDynamicPool and stats.md. That would clearly be in-scope for this PR.

Another alternative, which maybe I should've considered more carefully, is to add a string-based map to a test override for IsolatedStatsStore. A warning could be placed in IsolatedStatsStore doc to use that instead. The advantage of this approach is that no one has to go crazy with the backquotes :)

The disadvantage is that if someone writes an integration test for a subsystem that uses dynamic stats, using the production stats stores, they will still need to do something manual to make stat lookups by name work.

WDYT of having a separate effort in parallel to migrate all tests away from the pattern of using counter("foo").value() and toward instantiating a new class, say StatTestContext, which could build its own string->stat map on demand? Once the string-map is populated, no distinction is needed between dynamic and static counters.

There are O(1500) references that pull counters and gauges by name in test files, so it would be an annoying PR, but it wouldn't be too bad.

Then we could remove the deprecated counter() and gauge() methods from the Stats interface.

WDYT also of doing that as a follow-up to this PR, which is needed to fix a bug?

+1 to your last comment, I think that is the right long-term solution and we should just bite the bullet and do the cleanup. Otherwise I fear that this is going to confuse people in very hard-to-understand ways. Is there anything further we can do in this PR, doc-wise, to help people until that is done?

/wait-any

Yes, will update doc for StatNameDynamicStorage and StatNameDynamicPool to warn about lookups for testing, and also stats.md. SG?

I'll also revert the change in symbol_table_creator.cc so that fakes remain the default for testing.

In parallel, in a PR branched from this one, I'll try to start the slog of changing all tests to reference a helper class for stat lookup by name.

OK I have an alternative to this one in #9836 -- no backquotes needed. A test-specific structure is created that wraps a Store&, tries to keep string-maps lazily, and thus provides test-specific counter lookups that are independent of how the StatName was constructed.

I'm not happy with the new class name in that PR yet, and there are 60+ more files that need to be converted over. But they don't need to be done all at once. I could also add a format-checker that whitelists the current usage until we can remove the calls entirely. One other complication to getting all that done is how to handle mocks.
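The idea of a test-only wrapper that keeps a string map lazily can be sketched as follows. This is not the class from #9836: `ToyCounter`, `ToyStore`, and `ToyStatLookup` are hypothetical stand-ins, and counters are modeled as already carrying their rendered name.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <map>
#include <string>

// Toy stand-ins; none of these are Envoy's real types.
struct ToyCounter {
  std::string name; // rendered name, regardless of how it was encoded
  uint64_t value;
};

struct ToyStore {
  std::deque<ToyCounter> counters; // deque: push_back keeps element addresses stable
};

// Hypothetical test helper: indexes counters by rendered string on demand.
class ToyStatLookup {
public:
  explicit ToyStatLookup(const ToyStore& store) : store_(store) {}

  // Returns the counter with this rendered name, or nullptr if there is none.
  const ToyCounter* findCounter(const std::string& name) {
    assert(!name.empty());
    auto it = by_name_.find(name);
    if (it == by_name_.end()) {
      reindex(); // the store may have gained counters since the last lookup
      it = by_name_.find(name);
    }
    return it == by_name_.end() ? nullptr : it->second;
  }

private:
  void reindex() {
    by_name_.clear();
    for (const ToyCounter& counter : store_.counters) {
      by_name_[counter.name] = &counter;
    }
  }

  const ToyStore& store_;
  std::map<std::string, const ToyCounter*> by_name_;
};
```

Because the lookup key is the rendered string, a test using such a helper would not need to know which parts of a name were dynamic. The sketch re-indexes on every miss, which is simple but would be slow for repeated lookups of absent names.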


jmarantz · 2020-01-24T13:48:33Z

/retest

repokitteh-read-only · 2020-01-24T13:48:38Z

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #9774 (comment) was created by @jmarantz.



jmarantz · 2020-01-24T21:09:39Z

/azp envoy-linux run

azure-pipelines · 2020-01-24T21:09:44Z

Command 'envoy-linux' is not supported by Azure Pipelines.

Supported commands:

- help: Get descriptions, examples and documentation about supported commands. Example: help "command_name"
- list: List all pipelines for this repository using a comment. Example: "list"
- run: Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run. Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
- where: Report back the Azure DevOps orgs that are related to this repository and org. Example: "where"

See additional documentation.

jmarantz · 2020-01-24T21:10:04Z

/azp run envoy-linux

azure-pipelines · 2020-01-24T21:10:13Z

Azure Pipelines successfully started running 1 pipeline(s).


jmarantz · 2020-01-28T02:23:22Z

closing in favor of #9836

jmarantz added 10 commits January 21, 2020 20:15

retain the StatName in original form when storing

dc52bf3

Signed-off-by: Joshua Marantz <[email protected]>

fix 3 tests

f404467

Signed-off-by: Joshua Marantz <[email protected]>

all tests working.

4ff0cc2

Signed-off-by: Joshua Marantz <[email protected]>

format

e0a2621

Signed-off-by: Joshua Marantz <[email protected]>

add comments and TODOs.

15b4c4d

Signed-off-by: Joshua Marantz <[email protected]>

Move further down the path of using StatName for tag_extracted_names …

cf20b1b

…in args. Signed-off-by: Joshua Marantz <[email protected]>

Revert most of the interface changes around passing tag_extracted_name …

33487ad

…by string_view vs StatName, as they distract from this PR. Signed-off-by: Joshua Marantz <[email protected]>

remove dead code.

2e12464

Signed-off-by: Joshua Marantz <[email protected]>

convert fault_filter_test to use the new mechanism to inject dynamics.

6b14938

Signed-off-by: Joshua Marantz <[email protected]>

update router_test and proxy_test to the new dynamic injection mechan…

bf85244

…ism. Signed-off-by: Joshua Marantz <[email protected]>

jmarantz mentioned this pull request Jan 22, 2020

stats: disable real symbol tables till #9798 is resolved #9770

Merged

jmarantz added 4 commits January 22, 2020 14:04

add more targeted testing & cleanup.

c96d934

Signed-off-by: Joshua Marantz <[email protected]>

Merge branch 'master' into dynamic-stats-fix1

2a2bc25

Signed-off-by: Joshua Marantz <[email protected]>

include a rollback for envoyproxy#9770 to re-enable use of real symbo…

7f034e2

…l tables. Signed-off-by: Joshua Marantz <[email protected]>

renames and formatting

71add0f

Signed-off-by: Joshua Marantz <[email protected]>

jmarantz changed the title ~~WiP stats: do not canonicalize StatNames when storing in allocator.~~ stats: do not canonicalize StatNames when storing in allocator. Jan 22, 2020

jmarantz marked this pull request as ready for review January 22, 2020 19:28

jmarantz requested review from alyssawilk and zuercher as code owners January 22, 2020 19:28


jmarantz assigned mattklein123 Jan 22, 2020


jmarantz assigned snowp Jan 22, 2020

jmarantz mentioned this pull request Jan 22, 2020

vhds: tsan failure in test/integration/vhds_integration_test #9784

Closed

jmarantz added 2 commits January 22, 2020 20:15

minor cleanups

e28e81a

Signed-off-by: Joshua Marantz <[email protected]>

Merge branch 'master' into dynamic-stats-fix1

3ca7717

Signed-off-by: Joshua Marantz <[email protected]>


reduce cluster-counts in unit tests to allow them to finish more quic…

ff93c40

…kly in tsan. Signed-off-by: Joshua Marantz <[email protected]>


mattklein123 requested changes Jan 23, 2020


repokitteh-read-only bot added the waiting label Jan 23, 2020

jmarantz added 2 commits January 23, 2020 14:27

Merge branch 'master' into dynamic-stats-fix1

ab0edd5

Signed-off-by: Joshua Marantz <[email protected]>

Add TODO to add logging macros, and fix comments for ltrim/rtrim/trim

b0e0c32

Signed-off-by: Joshua Marantz <[email protected]>

repokitteh-read-only bot removed the waiting label Jan 24, 2020

Merge branch 'master' into dynamic-stats-fix1

c9129c3

Signed-off-by: Joshua Marantz <[email protected]>

repokitteh-read-only bot added waiting:any and removed waiting:any labels Jan 24, 2020

jmarantz added a commit to jmarantz/envoy that referenced this pull request Jan 27, 2020

clone of envoyproxy#9774

b4f60f4

Signed-off-by: Joshua Marantz <[email protected]>

jmarantz mentioned this pull request Jan 27, 2020

stats: do not canonicalize StatNames when storing in allocator #9836

Merged

mattklein123 added the waiting label Jan 27, 2020

jmarantz closed this Jan 28, 2020

jmarantz changed the title ~~stats: do not canonicalize StatNames when storing in allocator.~~ [deprecated] stats: do not canonicalize StatNames when storing in allocator, using a special syntax in tests for identifying dynamic stat names. Jan 28, 2020
