-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[deprecated] stats: do not canonicalize StatNames when storing in allocator, using a special syntax in tests for identifying dynamic stat names. #9774
Conversation
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
…in args. Signed-off-by: Joshua Marantz <[email protected]>
…by string_view vs StatName, as they distract from this PR. Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
…ism. Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
…l tables. Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
.value()); | ||
EXPECT_EQ(1U, cm_.thread_local_cluster_.cluster_.info_->stats_store_ | ||
.counter("alt_stat.zone.zone_name.to_az.upstream_rq_200") | ||
.value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: in the existing code, this same exact stanza is repeated twice. I am not sure if the intent was to test a different stat-name in the second stanza.
@@ -41,19 +42,19 @@ uint64_t StatName::dataSize() const { | |||
#ifndef ENVOY_CONFIG_COVERAGE | |||
void StatName::debugPrint() { | |||
if (size_and_data_ == nullptr) { | |||
ENVOY_LOG_MISC(info, "Null StatName"); | |||
std::cerr << "Null StatName" << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to use std::cerr so that I can call this from the debugger and see the results without restarting the binary with a differing logging level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should this possibly be wrapped in a different LOG macro? Don't feel strongly about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I leave a TODO here? the logging code is currently in flux from Jose's PR; no need to create a merge conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that's fine.
/azp run envoy-linux |
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
MetricImpl(StatName name, absl::string_view tag_extracted_name, const std::vector<Tag>& tags, | ||
SymbolTable& symbol_table) | ||
: MetricImpl(symbol_table.toString(name), tag_extracted_name, tags, symbol_table) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this is where the bug was, because we lost the structure of the original StatName, and which components of it were dynamic, which matters for hash-tables.
~StatsIsolatedStoreImplTest() override { | ||
pool_.clear(); | ||
EXPECT_EQ(0, store_.symbolTable().numSymbols()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason that this file had to be changed is that we had to make the store_ be a unique_ptr so it could be reset, to clear all the symbols from the symbol table. This only matters with real symbol tables.
…kly in tsan. Signed-off-by: Joshua Marantz <[email protected]>
@@ -921,39 +921,39 @@ TEST_F(StatsThreadLocalStoreTestNoFixture, MemoryWithoutTlsFakeSymbolTable) { | |||
init(true); | |||
TestUtil::MemoryTest memory_test; | |||
TestUtil::forEachSampleStat( | |||
1000, [this](absl::string_view name) { store_->counter(std::string(name)); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: the changes in this file reduce the scale of these stat memory checks, in order to avoid timeouts in tsan in CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me with some small questions/comments. Also needs a master merge. Thank you!
/wait
static absl::string_view ltrim(absl::string_view source, | ||
absl::string_view delims = WhitespaceChars); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: update doc comments here and below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -41,19 +42,19 @@ uint64_t StatName::dataSize() const { | |||
#ifndef ENVOY_CONFIG_COVERAGE | |||
void StatName::debugPrint() { | |||
if (size_and_data_ == nullptr) { | |||
ENVOY_LOG_MISC(info, "Null StatName"); | |||
std::cerr << "Null StatName" << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should this possibly be wrapped in a different LOG macro? Don't feel strongly about this.
@@ -72,6 +74,45 @@ class MemoryTest { | |||
const size_t memory_at_construction_; | |||
}; | |||
|
|||
// Helps tests construct StatName with symbolic and dynamic components. | |||
class MixedStatNames { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern with this type of construct is whether test writers are going to understand that they need to do this. Is there any way that we could make this fail in a more obvious way if a test writer is not doing the right thing? Possibly by making the test stat stores use a slightly different interface? I'm not sure what the best option is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will address the other comments a little later, but this is the most strategic question.
After this PR, a dev writing a unit-test that looks for a counter in a map, and has used a Dynamic stat name, will encounter a test failure, and will have to discover how to use this solution. More hints could definitely be dropped in the doc for StatNameDynamicPool and stats.md. That would clearly be in-scope for this PR.
Another alternative, which maybe I should've considered more carefully, is to add a string-based map to a test override for IsolatedStatsStore. A warning could be placed in IsolatedStatsStore doc to use that instead. The advantage of this approach is that no one has to go crazy with the backquotes :)
The disadvantage is that if someone writes an integration test for a subsystem that uses dynamic stats, using the production stats stores, they will still need to do something manual to make stat lookups by name work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT of having a separate effort in parallel to migrate all tests away from the pattern of using counter("foo").value() and toward one of instantiating something a new class StatTestContext, which could build its own string->stat map on demand. Once the string-map is populated, no distinction is needed between dynamic and static counters.
There are O(1500) references to pull counters and gauges from test files, so it would be an annoying PR but it wouldn't be too bad.
Then we could remove the deprecated counter() and gauge() methods from the Stats interface.
WDYT also of doing that as a follow-up to this PR, which is needed to to fix a bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to your last comment, I think that is the right long term solution and we should just bite the bullet and do the cleanup. Otherwise I fear that this is going to confuse people in very hard to understand ways. Is there anything we can do further in this PR doc wise that we can do to help people until that is done?
/wait-any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will update doc for StatNameDynamicStorage and StatNameDynamicPool to warn about lookups for testing, and also stats.md. SG?
I'll also revert the change in symbol_table_creator.cc so that fakes remain the default for testing.
In parallel, in a PR branched from this one, I'll try to start the slog of changing all tests to reference a helper class for stat lookup by name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I have an alternative to this one in #9836 -- no backquotes needed. A test-specific structure is created that wraps a Store&, tries to keep string-maps lazily, and thus provides test-specific counter lookups that are independent of how the StatName was constructed.
I'm not happy with the new class name in that PR yet, and there's about 60+ more files that need to be converted over. But they don't need to be done all at once. I could also add a format-checker that whitelists the current usage until we can remove the calls entirely. One other complication to getting all that done is how to handle mocks.
Signed-off-by: Joshua Marantz <[email protected]>
Signed-off-by: Joshua Marantz <[email protected]>
/retest |
🔨 rebuilding |
Signed-off-by: Joshua Marantz <[email protected]>
/azp envoy-linux run |
Command 'envoy-linux' is not supported by Azure Pipelines. Supported commands
See additional documentation. |
/azp run envoy-linux |
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Joshua Marantz <[email protected]>
closing in favor of #9836 |
Description: Resolves a problem that was discovered when experimentally switching the default SymbolTable implementation from FakeSymbolTableImpl to RealSymbolTableImpl, and then running all tests. The problem is that there are now distinct StatName representations for the same string, and hash lookups using the StatName hash/equal functions will consider them distinct.
This manifested in two problems:
There was an accidental canonicalization that was taking place in the stat allocation process, where a StatName was being rendered as a string and then re-encoded. This meant that the StatName passed into the allocator would not be equivalent to counter.statName() coming out of it. This behaved badly in maps, such as the one in IsolatedStatStore, and likely others. In this PR we resolve this problem by removing that serialize/re-parse phase during allocation. This should make things faster too.
Tests often check stat values by looking up via string, and there's no easy way to figure out which parts of the string were supposed to be dynamic. The leanest and most robust way is to let the test encode this information in the string itself, and a helper class, MixedStatNames, was added to stat_test_utility.h to make this easy and terse. You just wrap the dynamic portions in backquotes. There are tests for this new class and its string hacking of course.
This PR as it stands flips the default-bit for SymbolTable construction to use real symbol tables, and we'll need to flip it back before submitting.
Risk Level: medium -- this re-enables dynamic symbol tables.
Testing: //test/...
Docs Changes: n/a
Release Notes: n/a
Fixes: #9768