Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[querier] Impl aggr label rewrite capability #127

Merged
merged 11 commits into from
Feb 19, 2025

Conversation

yulong-db
Copy link
Collaborator

@yulong-db yulong-db commented Feb 13, 2025

deployed to dev-aws-us-east-1-obs-1 with --query.aggregation-label-value-override=true and --query.aggregation-label-value-override=v1

no resource impact observed ( spikes are the two timestamps when I restarted deployment)

image image

Queries still work when --query.aggregation-label-value-override=true, which is expected
and queries will not return anything when --query.aggregation-label-value-override=v1, as expected, because the corresponding time series with label __rollup__=v1 is not created yet.

emitted metrics are visible from monitoring prom: https://m3-aws-us-east-1-obs1-monitoring-prom.dev.databricks.com/graph?g0.expr=%7B__name__%3D~%22thanos_query_aggregation_label_rewriter.%2B%22%7D&g0.tab=1&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=1h

p999 extra runtime this operations adds to a query is lower than 5ms, since that's the smallest bucket. Maybe 1-2ms ish. so minimal performance impact.

image
  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

@yulong-db yulong-db force-pushed the yulong-db/aggr_label_rewrite branch from 58aa7b3 to 23023b0 Compare February 14, 2025 00:27
Copy link
Collaborator

@jnyi jnyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add unit tests as well as some metrics

@@ -245,6 +245,8 @@ func registerQuery(app *extkingpin.App) {
enforceTenancy := cmd.Flag("query.enforce-tenancy", "Enforce tenancy on Query APIs. Responses are returned only if the label value of the configured tenant-label-name and the value of the tenant header matches.").Default("false").Bool()
tenantLabel := cmd.Flag("query.tenant-label-name", "Label name to use when enforcing tenancy (if --query.enforce-tenancy is enabled).").Default(tenancy.DefaultTenantLabel).String()

rewriteAggregationLabelTo := cmd.Flag("query.rewrite-aggregation-label-to", "Rewrite the aggregation label to the provided value for metric with name ending in standard aggregation suffixes.").Default("").String()
Copy link
Collaborator

@jnyi jnyi Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the cmd flag name is a bit confusing as well as the helper message, might try something like query.aggregation-rewrite-filter, also it is good to call out the default behavior in helper message. I am not very good at naming too, i'd suggest you can also seek review from @hczhu-db @yuchen-db who don't have sufficient content to be able to understand what this flag means by purely reading it.

pkg/query/querier.go Outdated Show resolved Hide resolved
pkg/query/querier.go Outdated Show resolved Hide resolved
pkg/query/querier.go Outdated Show resolved Hide resolved
pkg/query/querier.go Outdated Show resolved Hide resolved
pkg/query/querier.go Outdated Show resolved Hide resolved
pkg/query/querier.go Outdated Show resolved Hide resolved
pkg/query/querier.go Outdated Show resolved Hide resolved
pkg/query/querier.go Outdated Show resolved Hide resolved
@yulong-db yulong-db force-pushed the yulong-db/aggr_label_rewrite branch from 4852ba2 to cee296e Compare February 14, 2025 20:00
@yulong-db yulong-db force-pushed the yulong-db/aggr_label_rewrite branch 2 times, most recently from 9f7370a to a396609 Compare February 14, 2025 21:44
@yulong-db yulong-db force-pushed the yulong-db/aggr_label_rewrite branch from a396609 to 7666479 Compare February 14, 2025 22:44
@hczhu-db
Copy link
Collaborator

@yulong-db Can you test its performance impact on a large dev shard? I suppose it should be minimum but it's better to confirm it with data points.
image

@yulong-db yulong-db requested a review from jnyi February 18, 2025 20:50
go.mod Outdated
@@ -114,6 +114,7 @@ require (
require (
capnproto.org/go/capnp/v3 v3.0.1-alpha.2.0.20240830165715-46ccd63a72af
github.com/cortexproject/promqlsmith v0.0.0-20240506042652-6cfdd9739a5e
github.com/gobwas/glob v0.2.3
Copy link
Collaborator

@jnyi jnyi Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments:

  1. glob is not efficient for just suffix matching, I'd prefer to simply split metric_name by ":" and match exact strings on the last part (suffix)
  2. if using glob, you can use golang naive one: filepath.Match() instead of importing additional mod

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done some benchtest,

using plain hasSuffix joined together by OR seems to be fastest, followed by glob, followed by split.

Will go with the hasSuffix route.

yu.long@HH9DQ02X5W suffix_bench %    go test -bench=.
goos: darwin
goarch: arm64
pkg: suffix
cpu: Apple M3 Max
BenchmarkHasSuffixMethod-16     188687617                6.281 ns/op
BenchmarkHasSuffixLoop-16       170946948                7.045 ns/op
BenchmarkSplitAndCheck-16       35680746                33.71 ns/op
BenchmarkRegexCheck-16          17390138                69.83 ns/op
BenchmarkGlobCheck-16           59952289                20.18 ns/op
PASS
ok      suffix  7.838s
yu.long@HH9DQ02X5W suffix_bench %    go test -bench=.
goos: darwin
goarch: arm64
pkg: suffix
cpu: Apple M3 Max
BenchmarkHasSuffixMethod/example:aggr-16                303715452                3.809 ns/op
BenchmarkHasSuffixMethod/example:avg-16                 195510680                6.093 ns/op
BenchmarkHasSuffixMethod/example:count-16               152105118                7.909 ns/op
BenchmarkHasSuffixMethod/example:sum-16                 123796609                9.665 ns/op
BenchmarkHasSuffixMethod/example:min-16                 100000000               11.60 ns/op
BenchmarkHasSuffixMethod/example:max-16                 86561088                13.18 ns/op
BenchmarkHasSuffixMethod/example:unknown-16             93237375                12.66 ns/op
BenchmarkHasSuffixMethod/example-16                     92503970                12.59 ns/op
BenchmarkHasSuffixMethod/:aggr-16                       398362672                3.020 ns/op
BenchmarkHasSuffixMethod/aggr:-16                       100000000               11.09 ns/op
BenchmarkHasSuffixMethod/example:aggr:extra-16          93556914                12.68 ns/op
BenchmarkHasSuffixLoop/example:aggr-16                  271168821                4.413 ns/op
BenchmarkHasSuffixLoop/example:avg-16                   162088345                7.367 ns/op
BenchmarkHasSuffixLoop/example:count-16                 100000000               10.71 ns/op
BenchmarkHasSuffixLoop/example:sum-16                   91479680                13.33 ns/op
BenchmarkHasSuffixLoop/example:min-16                   70518752                16.37 ns/op
BenchmarkHasSuffixLoop/example:max-16                   66025968                18.37 ns/op
BenchmarkHasSuffixLoop/example:unknown-16               60520857                19.95 ns/op
BenchmarkHasSuffixLoop/example-16                       59855971                19.85 ns/op
BenchmarkHasSuffixLoop/:aggr-16                         281107880                4.283 ns/op
BenchmarkHasSuffixLoop/aggr:-16                         67457257                17.64 ns/op
BenchmarkHasSuffixLoop/example:aggr:extra-16            59150574                20.04 ns/op
BenchmarkSplitAndCheck/example:aggr-16                  38315959                30.22 ns/op
BenchmarkSplitAndCheck/example:avg-16                   39383161                30.45 ns/op
BenchmarkSplitAndCheck/example:count-16                 36222586                31.61 ns/op
BenchmarkSplitAndCheck/example:sum-16                   34211833                35.03 ns/op
BenchmarkSplitAndCheck/example:min-16                   30136020                37.26 ns/op
BenchmarkSplitAndCheck/example:max-16                   27334514                41.07 ns/op
BenchmarkSplitAndCheck/example:unknown-16               36580300                32.55 ns/op
BenchmarkSplitAndCheck/example-16                       60206082                19.80 ns/op
BenchmarkSplitAndCheck/:aggr-16                         42874752                27.78 ns/op
BenchmarkSplitAndCheck/aggr:-16                         39029203                30.27 ns/op
BenchmarkSplitAndCheck/example:aggr:extra-16            33315325                34.88 ns/op
BenchmarkRegexCheck/example:aggr-16                     20046482                60.38 ns/op
BenchmarkRegexCheck/example:avg-16                      17586540                63.64 ns/op
BenchmarkRegexCheck/example:count-16                    18019990                65.57 ns/op
BenchmarkRegexCheck/example:sum-16                      17630983                67.52 ns/op
BenchmarkRegexCheck/example:min-16                      15944292                75.79 ns/op
BenchmarkRegexCheck/example:max-16                      14956338                80.25 ns/op
BenchmarkRegexCheck/example:unknown-16                  20117041                59.68 ns/op
BenchmarkRegexCheck/example-16                          51713364                21.79 ns/op
BenchmarkRegexCheck/:aggr-16                            19919767                60.52 ns/op
BenchmarkRegexCheck/aggr:-16                            19887936                60.44 ns/op
BenchmarkRegexCheck/example:aggr:extra-16                9326644               129.3 ns/op
BenchmarkGlobCheck/example:aggr-16                      298770154                4.030 ns/op
BenchmarkGlobCheck/example:avg-16                       61437639                19.55 ns/op
BenchmarkGlobCheck/example:count-16                     59729350                19.45 ns/op
BenchmarkGlobCheck/example:sum-16                       61394812                19.45 ns/op
BenchmarkGlobCheck/example:min-16                       61882252                19.43 ns/op
BenchmarkGlobCheck/example:max-16                       60936904                19.48 ns/op
BenchmarkGlobCheck/example:unknown-16                   60923112                19.39 ns/op
BenchmarkGlobCheck/example-16                           62624896                19.48 ns/op
BenchmarkGlobCheck/:aggr-16                             294966518                4.037 ns/op
BenchmarkGlobCheck/aggr:-16                             61968932                19.60 ns/op
BenchmarkGlobCheck/example:aggr:extra-16                61155585                19.52 ns/op

source code:

package main

import (
	"regexp"
	"strings"
	"testing"

	"github.com/gobwas/glob"
)

// 1. HasSuffix OR HasSuffix
func hasSuffixMethod(s string) bool {
	return strings.HasSuffix(s, ":aggr") ||
		strings.HasSuffix(s, ":avg") ||
		strings.HasSuffix(s, ":count") ||
		strings.HasSuffix(s, ":sum") ||
		strings.HasSuffix(s, ":min") ||
		strings.HasSuffix(s, ":max")
}

// 1.5 HasSuffix in a loop
func hasSuffixLoop(s string) bool {
	for _, suffix := range []string{":aggr", ":avg", ":count", ":sum", ":min", ":max"} {
		if strings.HasSuffix(s, suffix) {
			return true
		}
	}
	return false
}

// 2. Split by ":" and check in a set
var set = map[string]struct{}{
	"aggr":  {},
	"avg":   {},
	"count": {},
	"sum":   {},
	"min":   {},
	"max":   {},
}

func splitAndCheck(s string) bool {
	parts := strings.Split(s, ":")
	if len(parts) != 2 {
		return false
	}
	_, exists := set[parts[1]]
	return exists
}

// 3. Regex Match
var regex = regexp.MustCompile(`:(aggr|avg|count|sum|min|max)$`)

func regexCheck(s string) bool {
	return regex.MatchString(s)
}

// 4. Glob Pattern Matching
var globPattern = glob.MustCompile("{*:aggr, *:avg, *:count, *:sum, *:min, *:max}")

func globCheck(s string) bool {
	return globPattern.Match(s)
}

// benchmark

func BenchmarkHasSuffixMethod(b *testing.B) {
	for i := 0; i < b.N; i++ {
		hasSuffixMethod("example:avg")
	}
}

func BenchmarkHasSuffixLoop(b *testing.B) {
	for i := 0; i < b.N; i++ {
		hasSuffixLoop("example:avg")
	}
}

func BenchmarkSplitAndCheck(b *testing.B) {
	for i := 0; i < b.N; i++ {
		splitAndCheck("example:avg")
	}
}

func BenchmarkRegexCheck(b *testing.B) {
	for i := 0; i < b.N; i++ {
		regexCheck("example:avg")
	}
}

func BenchmarkGlobCheck(b *testing.B) {
	for i := 0; i < b.N; i++ {
		globCheck("example:avg")
	}
}

Help: "Total number of times aggregation-label-rewriter added the aggregation label",
ConstLabels: prometheus.Labels{"new_value": desiredLabelValue},
})
m.aggregationLabelRewriterRuntimeSeconds = promauto.With(reg).NewHistogramVec(prometheus.HistogramOpts{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we define histogram buckets?

Copy link
Collaborator Author

@yulong-db yulong-db Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be the default prom buckets; linter complains when you explicitly set to prometheus.DefBucket.

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, It makes sense to use smaller buckets instead

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome job

@jnyi jnyi self-requested a review February 19, 2025 00:26
Copy link
Collaborator

@jnyi jnyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@yulong-db
Copy link
Collaborator Author

ty! will add dashboard panels in a followup pr

@yulong-db yulong-db merged commit d83dae6 into db_main Feb 19, 2025
14 checks passed
@yulong-db yulong-db deleted the yulong-db/aggr_label_rewrite branch February 19, 2025 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants