-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats: drop dimension #5809
stats: drop dimension #5809
Conversation
Signed-off-by: Sugu Sougoumarane <[email protected]>
Also, the old code was badly written. Signed-off-by: Sugu Sougoumarane <[email protected]>
This feature implements the drop dimension part of vitessio#5791. The dropping of dimensions is not applicable to all stats vars. We drop dimensions for counters and timings, but not gauges. This also doesn't work for func based vars, but those are now unused for counters and timings. The feature doesn't actually drop the dimension. Instead, it consolidates all the different values into a single "all" category. This approach is more backward compatible and allows for going back and forth, because the label itself continues to exist in both scenarios. Signed-off-by: Sugu Sougoumarane <[email protected]>
queryCounts.Add(keys, queryCount) | ||
queryTimes.Add(keys, int64(duration)) | ||
queryRowCounts.Add(keys, rowCount) | ||
queryErrorCounts.Add(keys, errorCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's also worth noting in the PR this is a behavior change. Previously the stats would get forgotten whenever the query plan cache expired entries. Now we'll keep these per-table stats for the lifetime of the tablet server.
IMO this is a net positive change. We had some very confusing metrics results before when things would get purged from the plan cache and counters could go down, but this is still a change that might negatively affect some users that have lots of tables that cycle in and out of the plan cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this is not what you think this is. This is still a fully backward compatible change.
You're probably thinking of TabletServer.QueryStats, which reports values under "Queries". That behavior is a bad user experience. We have to fix it.
var ( | ||
// Global stats vars. | ||
// TODO(sougou): unglobalize after componentizing TabletServer. | ||
queryCounts, queryTimes, queryRowCounts, queryErrorCounts *stats.CountersWithMultiLabels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't these just stay as part of the QueryEngine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting them inside QueryEngine causes panics in tests when code tries to update these vars, which get initialized only for the first instance.
We can move these in once we componentize it, which is what the TODO is for.
Signed-off-by: Sugu Sougoumarane <[email protected]>
Signed-off-by: Sugu Sougoumarane <[email protected]>
It turns out that combining dimensions doesn't address all use cases. Gauges are also susceptible to cause huge /debug/vars. This additional feature allows you to drop specific variables from being exported. Ideally, they should also not be tracked. But we can iterate on this improvement later. Signed-off-by: Sugu Sougoumarane <[email protected]>
queryCounts.Add(keys, queryCount) | ||
queryTimes.Add(keys, int64(duration)) | ||
queryRowCounts.Add(keys, rowCount) | ||
queryErrorCounts.Add(keys, errorCount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to mysqlTime now? Do we no longer record it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mysqlTime was unused. It wasn't being reported anywhere.
go/stats/counters_test.go
Outdated
c4 := NewCountersWithMultiLabels("counter_combine_dim4", "help", []string{"a", "b", "c"}) | ||
c4.Add([]string{"c1", "c2", "c3"}, 1) | ||
assert.Equal(t, `{"all.c2.all": 1}`, c4.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain how this test works? What happens to c1 and c3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment, and improved the test:
// Anything under "a" and "c" should get reported under a consolidated "all" value
// instead of the specific supplied values.
Signed-off-by: Sugu Sougoumarane <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This feature implements the drop dimension part of #5791.
The dropping of dimensions is not applicable to all stats vars.
We drop dimensions for counters and timings, but not gauges.
This also doesn't work for func based vars, but those are now
unused for counters and timings.
The feature doesn't actually drop the dimension. Instead, it
consolidates all the different values into a single "all"
category. This approach is more backward compatible and allows
for going back and forth, because the label itself continues
to exist in both scenarios.
The PR contains some prerequisite work: