Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[deprecated] stats: do not canonicalize StatNames when storing in allocator, using a special syntax in tests for identifying dynamic stat names. #9774

Closed
wants to merge 20 commits into from

Conversation

jmarantz
Copy link
Contributor

@jmarantz jmarantz commented Jan 22, 2020

Description: Resolves a problem that was discovered when experimentally switching the default SymbolTable implementation from FakeSymbolTableImpl to RealSymbolTableImpl, and then running all tests. The problem is that there are now distinct StatName representations for the same string, and hash lookups using the StatName hash/equal functions will consider them distinct.

This manifested in two problems:

  1. There was an accidental canonicalization that was taking place in the stat allocation process, where a StatName was being rendered as a string and then re-encoded. This meant that the StatName passed into the allocator would not be equivalent to counter.statName() coming out of it. This behaved badly in maps, such as the one in IsolatedStatStore, and likely others. In this PR we resolve this problem by removing that serialize/re-parse phase during allocation. This should make things faster too.

  2. Tests often check stat values by looking up via string, and there's no easy way to figure out which parts of the string were supposed to be dynamic. The leanest and most robust way is to let the test encode this information in the string itself, and a helper class, MixedStatNames, was added to stat_test_utility.h to make this easy and terse. You just wrap the dynamic portions in backquotes. There are tests for this new class and its string hacking of course.

This PR as it stands flips the default-bit for SymbolTable construction to use real symbol tables, and we'll need to flip it back before submitting.

Risk Level: medium -- this re-enables dynamic symbol tables.
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Fixes: #9768

Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
…by string_view vs StatName, as they distract from this PR.

Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
@jmarantz jmarantz changed the title WiP stats: do not canonicalize StatNames when storing in allocator. stats: do not canonicalize StatNames when storing in allocator. Jan 22, 2020
@jmarantz jmarantz marked this pull request as ready for review January 22, 2020 19:28
.value());
EXPECT_EQ(1U, cm_.thread_local_cluster_.cluster_.info_->stats_store_
.counter("alt_stat.zone.zone_name.to_az.upstream_rq_200")
.value());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: in the existing code, this same exact stanza is repeated twice. I am not sure if the intent was to test a different stat-name in the second stanza.

@@ -41,19 +42,19 @@ uint64_t StatName::dataSize() const {
#ifndef ENVOY_CONFIG_COVERAGE
void StatName::debugPrint() {
if (size_and_data_ == nullptr) {
ENVOY_LOG_MISC(info, "Null StatName");
std::cerr << "Null StatName" << std::endl;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use std::cerr so that I can call this from the debugger and see the results without restarting the binary with a differing logging level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should this possibly be wrapped in a different LOG macro? Don't feel strongly about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I leave a TODO here? the logging code is currently in flux from Jose's PR; no need to create a merge conflict.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that's fine.

@jmarantz
Copy link
Contributor Author

/azp run envoy-linux

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jmarantz
Copy link
Contributor Author

Added @snowp as this is touching code you were looking at in #9743

Signed-off-by: Joshua Marantz <[email protected]>
MetricImpl(StatName name, absl::string_view tag_extracted_name, const std::vector<Tag>& tags,
SymbolTable& symbol_table)
: MetricImpl(symbol_table.toString(name), tag_extracted_name, tags, symbol_table) {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this is where the bug was, because we lost the structure of the original StatName, and which components of it were dynamic, which matters for hash-tables.

~StatsIsolatedStoreImplTest() override {
pool_.clear();
EXPECT_EQ(0, store_.symbolTable().numSymbols());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason that this file had to be changed is that we had to make the store_ be a unique_ptr so it could be reset, to clear all the symbols from the symbol table. This only matters with real symbol tables.

@@ -921,39 +921,39 @@ TEST_F(StatsThreadLocalStoreTestNoFixture, MemoryWithoutTlsFakeSymbolTable) {
init(true);
TestUtil::MemoryTest memory_test;
TestUtil::forEachSampleStat(
1000, [this](absl::string_view name) { store_->counter(std::string(name)); });
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: the changes in this file reduce the scale of these stat memory checks, in order to avoid timeouts in tsan in CI.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me with some small questions/comments. Also needs a master merge. Thank you!

/wait

Comment on lines +208 to +209
static absl::string_view ltrim(absl::string_view source,
absl::string_view delims = WhitespaceChars);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: update doc comments here and below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

source/common/stats/symbol_table_creator.cc Show resolved Hide resolved
@@ -41,19 +42,19 @@ uint64_t StatName::dataSize() const {
#ifndef ENVOY_CONFIG_COVERAGE
void StatName::debugPrint() {
if (size_and_data_ == nullptr) {
ENVOY_LOG_MISC(info, "Null StatName");
std::cerr << "Null StatName" << std::endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should this possibly be wrapped in a different LOG macro? Don't feel strongly about this.

@@ -72,6 +74,45 @@ class MemoryTest {
const size_t memory_at_construction_;
};

// Helps tests construct StatName with symbolic and dynamic components.
class MixedStatNames {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern with this type of construct is whether test writers are going to understand that they need to do this. Is there any way that we could make this fail in a more obvious way if a test writer is not doing the right thing? Possibly by making the test stat stores use a slightly different interface? I'm not sure what the best option is.

Copy link
Contributor Author

@jmarantz jmarantz Jan 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will address the other comments a little later, but this is the most strategic question.

After this PR, a dev writing a unit-test that looks for a counter in a map, and has used a Dynamic stat name, will encounter a test failure, and will have to discover how to use this solution. More hints could definitely be dropped in the doc for StatNameDynamicPool and stats.md. That would clearly be in-scope for this PR.

Another alternative, which maybe I should've considered more carefully, is to add a string-based map to a test override for IsolatedStatsStore. A warning could be placed in IsolatedStatsStore doc to use that instead. The advantage of this approach is that no one has to go crazy with the backquotes :)

The disadvantage is that if someone writes an integration test for a subsystem that uses dynamic stats, using the production stats stores, they will still need to do something manual to make stat lookups by name work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT of having a separate effort in parallel to migrate all tests away from the pattern of using counter("foo").value() and toward one of instantiating something a new class StatTestContext, which could build its own string->stat map on demand. Once the string-map is populated, no distinction is needed between dynamic and static counters.

There are O(1500) references to pull counters and gauges from test files, so it would be an annoying PR but it wouldn't be too bad.

Then we could remove the deprecated counter() and gauge() methods from the Stats interface.

WDYT also of doing that as a follow-up to this PR, which is needed to to fix a bug?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to your last comment, I think that is the right long term solution and we should just bite the bullet and do the cleanup. Otherwise I fear that this is going to confuse people in very hard to understand ways. Is there anything we can do further in this PR doc wise that we can do to help people until that is done?

/wait-any

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will update doc for StatNameDynamicStorage and StatNameDynamicPool to warn about lookups for testing, and also stats.md. SG?

I'll also revert the change in symbol_table_creator.cc so that fakes remain the default for testing.

In parallel, in a PR branched from this one, I'll try to start the slog of changing all tests to reference a helper class for stat lookup by name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I have an alternative to this one in #9836 -- no backquotes needed. A test-specific structure is created that wraps a Store&, tries to keep string-maps lazily, and thus provides test-specific counter lookups that are independent of how the StatName was constructed.

I'm not happy with the new class name in that PR yet, and there's about 60+ more files that need to be converted over. But they don't need to be done all at once. I could also add a format-checker that whitelists the current usage until we can remove the calls entirely. One other complication to getting all that done is how to handle mocks.

@jmarantz
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #9774 (comment) was created by @jmarantz.

see: more, trace.

@jmarantz
Copy link
Contributor Author

/azp envoy-linux run

@azure-pipelines
Copy link

Command 'envoy-linux' is not supported by Azure Pipelines.

Supported commands
  • help:
    • Get descriptions, examples and documentation about supported commands
    • Example: help "command_name"
  • list:
    • List all pipelines for this repository using a comment.
    • Example: "list"
  • run:
    • Run all pipelines or specific pipelines for this repository using a comment. Use this command by itself to trigger all related pipelines, or specify specific pipelines to run.
    • Example: "run" or "run pipeline_name, pipeline_name, pipeline_name"
  • where:
    • Report back the Azure DevOps orgs that are related to this repository and org
    • Example: "where"

See additional documentation.

@jmarantz
Copy link
Contributor Author

/azp run envoy-linux

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

jmarantz added a commit to jmarantz/envoy that referenced this pull request Jan 27, 2020
Signed-off-by: Joshua Marantz <[email protected]>
@jmarantz
Copy link
Contributor Author

closing in favor of #9836

@jmarantz jmarantz closed this Jan 28, 2020
@jmarantz jmarantz changed the title stats: do not canonicalize StatNames when storing in allocator. [deprecated] stats: do not canonicalize StatNames when storing in allocator, using a special syntax in tests for identifying dynamic stat names. Jan 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stats: new dynamic stat functionality has an issue with aliasing for StatNames, and its interaction with maps.
3 participants