[querier] Impl aggr label rewrite capability #127

yulong-db · 2025-02-13T21:00:15Z

deployed to dev-aws-us-east-1-obs-1 with --query.aggregation-label-value-override=true and --query.aggregation-label-value-override=v1

no resource impact observed ( spikes are the two timestamps when I restarted deployment)

Queries still work when --query.aggregation-label-value-override=true, which is expected
and queries will not return anything when --query.aggregation-label-value-override=v1, as expected, because the corresponding time series with label __rollup__=v1 is not created yet.

emitted metrics are visible from monitoring prom: https://m3-aws-us-east-1-obs1-monitoring-prom.dev.databricks.com/graph?g0.expr=%7B__name__%3D~%22thanos_query_aggregation_label_rewriter.%2B%22%7D&g0.tab=1&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=1h

p999 extra runtime this operations adds to a query is lower than 5ms, since that's the smallest bucket. Maybe 1-2ms ish. so minimal performance impact.

I added CHANGELOG entry for this change.
Change is not relevant to the end user.

Changes

Verification

jnyi

please add unit tests as well as some metrics

jnyi · 2025-02-14T18:15:05Z

cmd/thanos/query.go

@@ -245,6 +245,8 @@ func registerQuery(app *extkingpin.App) {
 	enforceTenancy := cmd.Flag("query.enforce-tenancy", "Enforce tenancy on Query APIs. Responses are returned only if the label value of the configured tenant-label-name and the value of the tenant header matches.").Default("false").Bool()
 	tenantLabel := cmd.Flag("query.tenant-label-name", "Label name to use when enforcing tenancy (if --query.enforce-tenancy is enabled).").Default(tenancy.DefaultTenantLabel).String()

+	rewriteAggregationLabelTo := cmd.Flag("query.rewrite-aggregation-label-to", "Rewrite the aggregation label to the provided value for metric with name ending in standard aggregation suffixes.").Default("").String()


nit: the cmd flag name is a bit confusing as well as the helper message, might try something like query.aggregation-rewrite-filter, also it is good to call out the default behavior in helper message. I am not very good at naming too, i'd suggest you can also seek review from @hczhu-db @yuchen-db who don't have sufficient content to be able to understand what this flag means by purely reading it.

pkg/query/querier.go

pkg/query/aggregation_label_rewriter.go

hczhu-db · 2025-02-15T01:29:48Z

@yulong-db Can you test its performance impact on a large dev shard? I suppose it should be minimum but it's better to confirm it with data points.

jnyi · 2025-02-18T21:36:08Z

go.mod

@@ -114,6 +114,7 @@ require (
 require (
 	capnproto.org/go/capnp/v3 v3.0.1-alpha.2.0.20240830165715-46ccd63a72af
 	github.com/cortexproject/promqlsmith v0.0.0-20240506042652-6cfdd9739a5e
+	github.com/gobwas/glob v0.2.3


a few comments:

glob is not efficient for just suffix matching, I'd prefer to simply split metric_name by ":" and match exact strings on the last part (suffix)

if using glob, you can use golang naive one: filepath.Match() instead of importing additional mod

done some benchtest,

using plain hasSuffix joined together by OR seems to be fastest, followed by glob, followed by split.

Will go with the hasSuffix route.

yu.long@HH9DQ02X5W suffix_bench % go test -bench=. goos: darwin goarch: arm64 pkg: suffix cpu: Apple M3 Max BenchmarkHasSuffixMethod-16 188687617 6.281 ns/op BenchmarkHasSuffixLoop-16 170946948 7.045 ns/op BenchmarkSplitAndCheck-16 35680746 33.71 ns/op BenchmarkRegexCheck-16 17390138 69.83 ns/op BenchmarkGlobCheck-16 59952289 20.18 ns/op PASS ok suffix 7.838s yu.long@HH9DQ02X5W suffix_bench % go test -bench=. goos: darwin goarch: arm64 pkg: suffix cpu: Apple M3 Max BenchmarkHasSuffixMethod/example:aggr-16 303715452 3.809 ns/op BenchmarkHasSuffixMethod/example:avg-16 195510680 6.093 ns/op BenchmarkHasSuffixMethod/example:count-16 152105118 7.909 ns/op BenchmarkHasSuffixMethod/example:sum-16 123796609 9.665 ns/op BenchmarkHasSuffixMethod/example:min-16 100000000 11.60 ns/op BenchmarkHasSuffixMethod/example:max-16 86561088 13.18 ns/op BenchmarkHasSuffixMethod/example:unknown-16 93237375 12.66 ns/op BenchmarkHasSuffixMethod/example-16 92503970 12.59 ns/op BenchmarkHasSuffixMethod/:aggr-16 398362672 3.020 ns/op BenchmarkHasSuffixMethod/aggr:-16 100000000 11.09 ns/op BenchmarkHasSuffixMethod/example:aggr:extra-16 93556914 12.68 ns/op BenchmarkHasSuffixLoop/example:aggr-16 271168821 4.413 ns/op BenchmarkHasSuffixLoop/example:avg-16 162088345 7.367 ns/op BenchmarkHasSuffixLoop/example:count-16 100000000 10.71 ns/op BenchmarkHasSuffixLoop/example:sum-16 91479680 13.33 ns/op BenchmarkHasSuffixLoop/example:min-16 70518752 16.37 ns/op BenchmarkHasSuffixLoop/example:max-16 66025968 18.37 ns/op BenchmarkHasSuffixLoop/example:unknown-16 60520857 19.95 ns/op BenchmarkHasSuffixLoop/example-16 59855971 19.85 ns/op BenchmarkHasSuffixLoop/:aggr-16 281107880 4.283 ns/op BenchmarkHasSuffixLoop/aggr:-16 67457257 17.64 ns/op BenchmarkHasSuffixLoop/example:aggr:extra-16 59150574 20.04 ns/op BenchmarkSplitAndCheck/example:aggr-16 38315959 30.22 ns/op BenchmarkSplitAndCheck/example:avg-16 39383161 30.45 ns/op BenchmarkSplitAndCheck/example:count-16 36222586 31.61 ns/op BenchmarkSplitAndCheck/example:sum-16 34211833 35.03 ns/op BenchmarkSplitAndCheck/example:min-16 30136020 37.26 ns/op BenchmarkSplitAndCheck/example:max-16 27334514 41.07 ns/op BenchmarkSplitAndCheck/example:unknown-16 36580300 32.55 ns/op BenchmarkSplitAndCheck/example-16 60206082 19.80 ns/op BenchmarkSplitAndCheck/:aggr-16 42874752 27.78 ns/op BenchmarkSplitAndCheck/aggr:-16 39029203 30.27 ns/op BenchmarkSplitAndCheck/example:aggr:extra-16 33315325 34.88 ns/op BenchmarkRegexCheck/example:aggr-16 20046482 60.38 ns/op BenchmarkRegexCheck/example:avg-16 17586540 63.64 ns/op BenchmarkRegexCheck/example:count-16 18019990 65.57 ns/op BenchmarkRegexCheck/example:sum-16 17630983 67.52 ns/op BenchmarkRegexCheck/example:min-16 15944292 75.79 ns/op BenchmarkRegexCheck/example:max-16 14956338 80.25 ns/op BenchmarkRegexCheck/example:unknown-16 20117041 59.68 ns/op BenchmarkRegexCheck/example-16 51713364 21.79 ns/op BenchmarkRegexCheck/:aggr-16 19919767 60.52 ns/op BenchmarkRegexCheck/aggr:-16 19887936 60.44 ns/op BenchmarkRegexCheck/example:aggr:extra-16 9326644 129.3 ns/op BenchmarkGlobCheck/example:aggr-16 298770154 4.030 ns/op BenchmarkGlobCheck/example:avg-16 61437639 19.55 ns/op BenchmarkGlobCheck/example:count-16 59729350 19.45 ns/op BenchmarkGlobCheck/example:sum-16 61394812 19.45 ns/op BenchmarkGlobCheck/example:min-16 61882252 19.43 ns/op BenchmarkGlobCheck/example:max-16 60936904 19.48 ns/op BenchmarkGlobCheck/example:unknown-16 60923112 19.39 ns/op BenchmarkGlobCheck/example-16 62624896 19.48 ns/op BenchmarkGlobCheck/:aggr-16 294966518 4.037 ns/op BenchmarkGlobCheck/aggr:-16 61968932 19.60 ns/op BenchmarkGlobCheck/example:aggr:extra-16 61155585 19.52 ns/op

source code:

package main import ( "regexp" "strings" "testing" "github.com/gobwas/glob" ) // 1. HasSuffix OR HasSuffix func hasSuffixMethod(s string) bool { return strings.HasSuffix(s, ":aggr") || strings.HasSuffix(s, ":avg") || strings.HasSuffix(s, ":count") || strings.HasSuffix(s, ":sum") || strings.HasSuffix(s, ":min") || strings.HasSuffix(s, ":max") } // 1.5 HasSuffix in a loop func hasSuffixLoop(s string) bool { for _, suffix := range []string{":aggr", ":avg", ":count", ":sum", ":min", ":max"} { if strings.HasSuffix(s, suffix) { return true } } return false } // 2. Split by ":" and check in a set var set = map[string]struct{}{ "aggr": {}, "avg": {}, "count": {}, "sum": {}, "min": {}, "max": {}, } func splitAndCheck(s string) bool { parts := strings.Split(s, ":") if len(parts) != 2 { return false } _, exists := set[parts[1]] return exists } // 3. Regex Match var regex = regexp.MustCompile(`:(aggr|avg|count|sum|min|max)$`) func regexCheck(s string) bool { return regex.MatchString(s) } // 4. Glob Pattern Matching var globPattern = glob.MustCompile("{*:aggr, *:avg, *:count, *:sum, *:min, *:max}") func globCheck(s string) bool { return globPattern.Match(s) } // benchmark func BenchmarkHasSuffixMethod(b *testing.B) { for i := 0; i < b.N; i++ { hasSuffixMethod("example:avg") } } func BenchmarkHasSuffixLoop(b *testing.B) { for i := 0; i < b.N; i++ { hasSuffixLoop("example:avg") } } func BenchmarkSplitAndCheck(b *testing.B) { for i := 0; i < b.N; i++ { splitAndCheck("example:avg") } } func BenchmarkRegexCheck(b *testing.B) { for i := 0; i < b.N; i++ { regexCheck("example:avg") } } func BenchmarkGlobCheck(b *testing.B) { for i := 0; i < b.N; i++ { globCheck("example:avg") } }

jnyi · 2025-02-18T21:40:36Z

pkg/query/aggregation_label_rewriter.go

+		Help:        "Total number of times aggregation-label-rewriter added the aggregation label",
+		ConstLabels: prometheus.Labels{"new_value": desiredLabelValue},
+	})
+	m.aggregationLabelRewriterRuntimeSeconds = promauto.With(reg).NewHistogramVec(prometheus.HistogramOpts{


do we define histogram buckets?

it will be the default prom buckets; linter complains when you explicitly set to prometheus.DefBucket.

actually, It makes sense to use smaller buckets instead

awesome job

pkg/query/aggregation_label_rewriter.go

jnyi

lgtm

yulong-db · 2025-02-19T00:45:17Z

ty! will add dashboard panels in a followup pr

yulong-db force-pushed the yulong-db/aggr_label_rewrite branch from 58aa7b3 to 23023b0 Compare February 14, 2025 00:27

jnyi requested changes Feb 14, 2025

View reviewed changes

jnyi requested review from hczhu-db, yuchen-db and karan-db February 14, 2025 18:28

yulong-db force-pushed the yulong-db/aggr_label_rewrite branch from 4852ba2 to cee296e Compare February 14, 2025 20:00

yulong-db commented Feb 14, 2025

View reviewed changes

pkg/query/aggregation_label_rewriter.go Outdated Show resolved Hide resolved

yulong-db force-pushed the yulong-db/aggr_label_rewrite branch 2 times, most recently from 9f7370a to a396609 Compare February 14, 2025 21:44

yulong-db added 7 commits February 14, 2025 14:44

impl aggr label rewrite capability

d10c948

update to include log

cb92a78

redo

6e67e34

fix tests

06f1420

add unit tests

5270be9

tidy mod

bf81fcf

better cli arg name

7666479

yulong-db force-pushed the yulong-db/aggr_label_rewrite branch from a396609 to 7666479 Compare February 14, 2025 22:44

yulong-db added 3 commits February 14, 2025 14:50

dont' use defaultbucket explicitly

7cf83c4

add copyright to fix linter

5645610

fix

cfd71f6

yulong-db requested a review from jnyi February 18, 2025 20:50

jnyi reviewed Feb 18, 2025

View reviewed changes

pkg/query/aggregation_label_rewriter.go Outdated Show resolved Hide resolved

jnyi reviewed Feb 18, 2025

View reviewed changes

pkg/query/aggregation_label_rewriter.go Outdated Show resolved Hide resolved

jnyi reviewed Feb 18, 2025

View reviewed changes

pkg/query/aggregation_label_rewriter.go Outdated Show resolved Hide resolved

jnyi reviewed Feb 18, 2025

View reviewed changes

pkg/query/aggregation_label_rewriter.go Outdated Show resolved Hide resolved

address comments

261d290

jnyi self-requested a review February 19, 2025 00:26

jnyi approved these changes Feb 19, 2025

View reviewed changes

yulong-db merged commit d83dae6 into db_main Feb 19, 2025
14 checks passed

yulong-db deleted the yulong-db/aggr_label_rewrite branch February 19, 2025 00:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[querier] Impl aggr label rewrite capability #127

[querier] Impl aggr label rewrite capability #127

yulong-db commented Feb 13, 2025 •

edited

Loading

jnyi left a comment •

edited

Loading

jnyi Feb 14, 2025 •

edited

Loading

hczhu-db commented Feb 15, 2025

jnyi Feb 18, 2025 •

edited

Loading

yulong-db Feb 18, 2025

jnyi Feb 18, 2025

yulong-db Feb 18, 2025 •

edited

Loading

yulong-db Feb 18, 2025

jnyi Feb 19, 2025

jnyi left a comment

yulong-db commented Feb 19, 2025

[querier] Impl aggr label rewrite capability #127

[querier] Impl aggr label rewrite capability #127

Conversation

yulong-db commented Feb 13, 2025 • edited Loading

Changes

Verification

jnyi left a comment • edited Loading

Choose a reason for hiding this comment

jnyi Feb 14, 2025 • edited Loading

Choose a reason for hiding this comment

hczhu-db commented Feb 15, 2025

jnyi Feb 18, 2025 • edited Loading

Choose a reason for hiding this comment

yulong-db Feb 18, 2025

Choose a reason for hiding this comment

jnyi Feb 18, 2025

Choose a reason for hiding this comment

yulong-db Feb 18, 2025 • edited Loading

Choose a reason for hiding this comment

yulong-db Feb 18, 2025

Choose a reason for hiding this comment

jnyi Feb 19, 2025

Choose a reason for hiding this comment

jnyi left a comment

Choose a reason for hiding this comment

yulong-db commented Feb 19, 2025

yulong-db commented Feb 13, 2025 •

edited

Loading

jnyi left a comment •

edited

Loading

jnyi Feb 14, 2025 •

edited

Loading

jnyi Feb 18, 2025 •

edited

Loading

yulong-db Feb 18, 2025 •

edited

Loading