-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[querier] Impl aggr label rewrite capability #127
Conversation
58aa7b3
to
23023b0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add unit tests as well as some metrics
cmd/thanos/query.go
Outdated
@@ -245,6 +245,8 @@ func registerQuery(app *extkingpin.App) { | |||
enforceTenancy := cmd.Flag("query.enforce-tenancy", "Enforce tenancy on Query APIs. Responses are returned only if the label value of the configured tenant-label-name and the value of the tenant header matches.").Default("false").Bool() | |||
tenantLabel := cmd.Flag("query.tenant-label-name", "Label name to use when enforcing tenancy (if --query.enforce-tenancy is enabled).").Default(tenancy.DefaultTenantLabel).String() | |||
|
|||
rewriteAggregationLabelTo := cmd.Flag("query.rewrite-aggregation-label-to", "Rewrite the aggregation label to the provided value for metric with name ending in standard aggregation suffixes.").Default("").String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the cmd flag name is a bit confusing as well as the helper message, might try something like query.aggregation-rewrite-filter
, also it is good to call out the default behavior in helper message. I am not very good at naming too, i'd suggest you can also seek review from @hczhu-db @yuchen-db who don't have sufficient content to be able to understand what this flag means by purely reading it.
4852ba2
to
cee296e
Compare
9f7370a
to
a396609
Compare
a396609
to
7666479
Compare
@yulong-db Can you test its performance impact on a large dev shard? I suppose it should be minimum but it's better to confirm it with data points. |
go.mod
Outdated
@@ -114,6 +114,7 @@ require ( | |||
require ( | |||
capnproto.org/go/capnp/v3 v3.0.1-alpha.2.0.20240830165715-46ccd63a72af | |||
github.com/cortexproject/promqlsmith v0.0.0-20240506042652-6cfdd9739a5e | |||
github.com/gobwas/glob v0.2.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few comments:
- glob is not efficient for just suffix matching, I'd prefer to simply split metric_name by ":" and match exact strings on the last part (suffix)
- if using glob, you can use golang naive one:
filepath.Match()
instead of importing additional mod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done some benchtest,
using plain hasSuffix
joined together by OR seems to be fastest, followed by glob, followed by split.
Will go with the hasSuffix
route.
yu.long@HH9DQ02X5W suffix_bench % go test -bench=.
goos: darwin
goarch: arm64
pkg: suffix
cpu: Apple M3 Max
BenchmarkHasSuffixMethod-16 188687617 6.281 ns/op
BenchmarkHasSuffixLoop-16 170946948 7.045 ns/op
BenchmarkSplitAndCheck-16 35680746 33.71 ns/op
BenchmarkRegexCheck-16 17390138 69.83 ns/op
BenchmarkGlobCheck-16 59952289 20.18 ns/op
PASS
ok suffix 7.838s
yu.long@HH9DQ02X5W suffix_bench % go test -bench=.
goos: darwin
goarch: arm64
pkg: suffix
cpu: Apple M3 Max
BenchmarkHasSuffixMethod/example:aggr-16 303715452 3.809 ns/op
BenchmarkHasSuffixMethod/example:avg-16 195510680 6.093 ns/op
BenchmarkHasSuffixMethod/example:count-16 152105118 7.909 ns/op
BenchmarkHasSuffixMethod/example:sum-16 123796609 9.665 ns/op
BenchmarkHasSuffixMethod/example:min-16 100000000 11.60 ns/op
BenchmarkHasSuffixMethod/example:max-16 86561088 13.18 ns/op
BenchmarkHasSuffixMethod/example:unknown-16 93237375 12.66 ns/op
BenchmarkHasSuffixMethod/example-16 92503970 12.59 ns/op
BenchmarkHasSuffixMethod/:aggr-16 398362672 3.020 ns/op
BenchmarkHasSuffixMethod/aggr:-16 100000000 11.09 ns/op
BenchmarkHasSuffixMethod/example:aggr:extra-16 93556914 12.68 ns/op
BenchmarkHasSuffixLoop/example:aggr-16 271168821 4.413 ns/op
BenchmarkHasSuffixLoop/example:avg-16 162088345 7.367 ns/op
BenchmarkHasSuffixLoop/example:count-16 100000000 10.71 ns/op
BenchmarkHasSuffixLoop/example:sum-16 91479680 13.33 ns/op
BenchmarkHasSuffixLoop/example:min-16 70518752 16.37 ns/op
BenchmarkHasSuffixLoop/example:max-16 66025968 18.37 ns/op
BenchmarkHasSuffixLoop/example:unknown-16 60520857 19.95 ns/op
BenchmarkHasSuffixLoop/example-16 59855971 19.85 ns/op
BenchmarkHasSuffixLoop/:aggr-16 281107880 4.283 ns/op
BenchmarkHasSuffixLoop/aggr:-16 67457257 17.64 ns/op
BenchmarkHasSuffixLoop/example:aggr:extra-16 59150574 20.04 ns/op
BenchmarkSplitAndCheck/example:aggr-16 38315959 30.22 ns/op
BenchmarkSplitAndCheck/example:avg-16 39383161 30.45 ns/op
BenchmarkSplitAndCheck/example:count-16 36222586 31.61 ns/op
BenchmarkSplitAndCheck/example:sum-16 34211833 35.03 ns/op
BenchmarkSplitAndCheck/example:min-16 30136020 37.26 ns/op
BenchmarkSplitAndCheck/example:max-16 27334514 41.07 ns/op
BenchmarkSplitAndCheck/example:unknown-16 36580300 32.55 ns/op
BenchmarkSplitAndCheck/example-16 60206082 19.80 ns/op
BenchmarkSplitAndCheck/:aggr-16 42874752 27.78 ns/op
BenchmarkSplitAndCheck/aggr:-16 39029203 30.27 ns/op
BenchmarkSplitAndCheck/example:aggr:extra-16 33315325 34.88 ns/op
BenchmarkRegexCheck/example:aggr-16 20046482 60.38 ns/op
BenchmarkRegexCheck/example:avg-16 17586540 63.64 ns/op
BenchmarkRegexCheck/example:count-16 18019990 65.57 ns/op
BenchmarkRegexCheck/example:sum-16 17630983 67.52 ns/op
BenchmarkRegexCheck/example:min-16 15944292 75.79 ns/op
BenchmarkRegexCheck/example:max-16 14956338 80.25 ns/op
BenchmarkRegexCheck/example:unknown-16 20117041 59.68 ns/op
BenchmarkRegexCheck/example-16 51713364 21.79 ns/op
BenchmarkRegexCheck/:aggr-16 19919767 60.52 ns/op
BenchmarkRegexCheck/aggr:-16 19887936 60.44 ns/op
BenchmarkRegexCheck/example:aggr:extra-16 9326644 129.3 ns/op
BenchmarkGlobCheck/example:aggr-16 298770154 4.030 ns/op
BenchmarkGlobCheck/example:avg-16 61437639 19.55 ns/op
BenchmarkGlobCheck/example:count-16 59729350 19.45 ns/op
BenchmarkGlobCheck/example:sum-16 61394812 19.45 ns/op
BenchmarkGlobCheck/example:min-16 61882252 19.43 ns/op
BenchmarkGlobCheck/example:max-16 60936904 19.48 ns/op
BenchmarkGlobCheck/example:unknown-16 60923112 19.39 ns/op
BenchmarkGlobCheck/example-16 62624896 19.48 ns/op
BenchmarkGlobCheck/:aggr-16 294966518 4.037 ns/op
BenchmarkGlobCheck/aggr:-16 61968932 19.60 ns/op
BenchmarkGlobCheck/example:aggr:extra-16 61155585 19.52 ns/op
source code:
package main
import (
"regexp"
"strings"
"testing"
"github.com/gobwas/glob"
)
// 1. HasSuffix OR HasSuffix
func hasSuffixMethod(s string) bool {
return strings.HasSuffix(s, ":aggr") ||
strings.HasSuffix(s, ":avg") ||
strings.HasSuffix(s, ":count") ||
strings.HasSuffix(s, ":sum") ||
strings.HasSuffix(s, ":min") ||
strings.HasSuffix(s, ":max")
}
// 1.5 HasSuffix in a loop
func hasSuffixLoop(s string) bool {
for _, suffix := range []string{":aggr", ":avg", ":count", ":sum", ":min", ":max"} {
if strings.HasSuffix(s, suffix) {
return true
}
}
return false
}
// 2. Split by ":" and check in a set
var set = map[string]struct{}{
"aggr": {},
"avg": {},
"count": {},
"sum": {},
"min": {},
"max": {},
}
func splitAndCheck(s string) bool {
parts := strings.Split(s, ":")
if len(parts) != 2 {
return false
}
_, exists := set[parts[1]]
return exists
}
// 3. Regex Match
var regex = regexp.MustCompile(`:(aggr|avg|count|sum|min|max)$`)
func regexCheck(s string) bool {
return regex.MatchString(s)
}
// 4. Glob Pattern Matching
var globPattern = glob.MustCompile("{*:aggr, *:avg, *:count, *:sum, *:min, *:max}")
func globCheck(s string) bool {
return globPattern.Match(s)
}
// benchmark
func BenchmarkHasSuffixMethod(b *testing.B) {
for i := 0; i < b.N; i++ {
hasSuffixMethod("example:avg")
}
}
func BenchmarkHasSuffixLoop(b *testing.B) {
for i := 0; i < b.N; i++ {
hasSuffixLoop("example:avg")
}
}
func BenchmarkSplitAndCheck(b *testing.B) {
for i := 0; i < b.N; i++ {
splitAndCheck("example:avg")
}
}
func BenchmarkRegexCheck(b *testing.B) {
for i := 0; i < b.N; i++ {
regexCheck("example:avg")
}
}
func BenchmarkGlobCheck(b *testing.B) {
for i := 0; i < b.N; i++ {
globCheck("example:avg")
}
}
Help: "Total number of times aggregation-label-rewriter added the aggregation label", | ||
ConstLabels: prometheus.Labels{"new_value": desiredLabelValue}, | ||
}) | ||
m.aggregationLabelRewriterRuntimeSeconds = promauto.With(reg).NewHistogramVec(prometheus.HistogramOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we define histogram buckets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, It makes sense to use smaller buckets instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
ty! will add dashboard panels in a followup pr |
deployed to dev-aws-us-east-1-obs-1 with
--query.aggregation-label-value-override=true
and--query.aggregation-label-value-override=v1
no resource impact observed ( spikes are the two timestamps when I restarted deployment)
Queries still work when
--query.aggregation-label-value-override=true
, which is expectedand queries will not return anything when
--query.aggregation-label-value-override=v1
, as expected, because the corresponding time series with label__rollup__=v1
is not created yet.emitted metrics are visible from monitoring prom: https://m3-aws-us-east-1-obs1-monitoring-prom.dev.databricks.com/graph?g0.expr=%7B__name__%3D~%22thanos_query_aggregation_label_rewriter.%2B%22%7D&g0.tab=1&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=1h
p999 extra runtime this operations adds to a query is lower than 5ms, since that's the smallest bucket. Maybe 1-2ms ish. so minimal performance impact.
Changes
Verification