compactor: add downsample duration histogram #4551
Closed · vanugrah wants to merge 2 commits into thanos-io:main from vanugrah:metrics/compactor_downsample_duration
Changes from 1 commit
cmd/thanos/downsample.go

@@ -38,6 +38,7 @@ import (
 type DownsampleMetrics struct {
 	downsamples        *prometheus.CounterVec
 	downsampleFailures *prometheus.CounterVec
+	downsampleDuration prometheus.Histogram
 }

 func newDownsampleMetrics(reg *prometheus.Registry) *DownsampleMetrics {

@@ -51,6 +52,11 @@ func newDownsampleMetrics(reg *prometheus.Registry) *DownsampleMetrics {
 		Name: "thanos_compact_downsample_failures_total",
 		Help: "Total number of failed downsampling attempts.",
 	}, []string{"group"})
+	m.downsampleDuration = promauto.With(reg).NewHistogram(prometheus.HistogramOpts{
+		Name:    "thanos_compact_downsample_duration_seconds",
+		Help:    "Duration of downsample runs",
+		Buckets: []float64{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192},
+	})

 	return m
 }

@@ -237,7 +243,7 @@ func downsampleBucket(
 			resolution = downsample.ResLevel2
 			errMsg = "downsampling to 60 min"
 		}
-		if err := processDownsampling(ctx, logger, bkt, m, dir, resolution, hashFunc); err != nil {
+		if err := processDownsampling(ctx, logger, bkt, m, dir, resolution, hashFunc, metrics); err != nil {
 			metrics.downsampleFailures.WithLabelValues(compact.DefaultGroupKey(m.Thanos)).Inc()
 			return errors.Wrap(err, errMsg)
 		}

@@ -309,7 +315,16 @@ func downsampleBucket(
 	return nil
 }

-func processDownsampling(ctx context.Context, logger log.Logger, bkt objstore.Bucket, m *metadata.Meta, dir string, resolution int64, hashFunc metadata.HashFunc) error {
+func processDownsampling(
+	ctx context.Context,
+	logger log.Logger,
+	bkt objstore.Bucket,
+	m *metadata.Meta,
+	dir string,
+	resolution int64,
+	hashFunc metadata.HashFunc,
+	metrics *DownsampleMetrics,
+) error {
 	begin := time.Now()
 	bdir := filepath.Join(dir, m.ULID.String())

@@ -346,6 +361,7 @@ func processDownsampling(ctx context.Context, logger log.Logger, bkt objstore.Bu
 	level.Info(logger).Log("msg", "downsampled block",
 		"from", m.ULID, "to", id, "duration", time.Since(begin), "duration_ms", time.Since(begin).Milliseconds())
+	metrics.downsampleDuration.Observe(time.Since(begin).Seconds())

 	if err := block.VerifyIndex(logger, filepath.Join(resdir, block.IndexFilename), m.MinTime, m.MaxTime); err != nil {
 		return errors.Wrap(err, "output block index not valid")

Inline review comment on the last hunk above: Might be good to initialize a variable for the passed time above this line so that we don't have to call time.Since(begin) multiple times.

Reply: Agreed.
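A minimal sketch of that suggestion, computing the elapsed time once and reusing it (the variable name elapsed is illustrative, not part of the PR):

	// Sketch: call time.Since(begin) once and reuse the result.
	elapsed := time.Since(begin)
	level.Info(logger).Log("msg", "downsampled block",
		"from", m.ULID, "to", id, "duration", elapsed, "duration_ms", elapsed.Milliseconds())
	metrics.downsampleDuration.Observe(elapsed.Seconds())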
Reviewer: Maybe we can start from a larger number. Please note that the time includes the block download and index verification steps, so small buckets don't make sense here.
vanugrah: Hello - that turnaround was blazing fast!

I have no reservations about making the starting intervals larger - I was merely trying to replicate the same intervals used in the prometheus_tsdb_compaction_duration_seconds_bucket metric currently being exposed by the compactor.

If I'm not mistaken, the begin variable is updated to time.Now() once the block has been downloaded and the index has been verified (see https://github.com/thanos-io/thanos/blob/main/cmd/thanos/downsample.go#L326). So what this should measure is the time taken to open the block and run the downsample.Downsample method.
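For reference, a simplified sketch of the flow being described, paraphrased rather than copied from the linked code:

	// Paraphrased flow of processDownsampling, assuming the structure described above.
	begin := time.Now()
	// ... download the block from object storage and verify the downloaded index ...

	// begin is reset once download and verification are done:
	begin = time.Now()
	// ... open the local TSDB block and run downsample.Downsample ...

	// So the histogram observes only the open + downsample time, not the download.
	metrics.downsampleDuration.Observe(time.Since(begin).Seconds())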
Reviewer: Yes, you are right! Sorry, the GitHub code review just showed the begin := time.Now() at the beginning, so I misunderstood it.

But it would be better to double-check the buckets. Also, it might be good to have a label for the different resolutions: 5m blocks and 1h blocks are different.
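One possible shape for that suggestion, sketched with a HistogramVec; the label name and the resolution formatting are illustrative, not part of this PR:

	// Illustrative only: the struct field would become a *prometheus.HistogramVec.
	m.downsampleDuration = promauto.With(reg).NewHistogramVec(prometheus.HistogramOpts{
		Name:    "thanos_compact_downsample_duration_seconds",
		Help:    "Duration of downsample runs",
		Buckets: []float64{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192},
	}, []string{"resolution"})

	// At observation time, the target resolution becomes the label value:
	metrics.downsampleDuration.
		WithLabelValues(strconv.FormatInt(resolution, 10)).
		Observe(time.Since(begin).Seconds())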
vanugrah: Love it - I was thinking the same. I was originally conflicted because the compaction duration histogram has no labels (like compaction level), so I thought it might be better to conform to that standard. But a resolution label would result in something more meaningful.

I'm also thinking of adding histograms for overall block upload and download times as well (in another PR). Having a resolution/compaction level label on those may be beneficial as well.
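Purely to illustrate that follow-up idea: the metric names and struct fields below are hypothetical and not part of this PR or of Thanos today.

	// Hypothetical sketch for a future PR: upload/download duration histograms with a resolution label.
	m.blockUploadDuration = promauto.With(reg).NewHistogramVec(prometheus.HistogramOpts{
		Name: "thanos_compact_block_upload_duration_seconds",
		Help: "Duration of block uploads to object storage.",
	}, []string{"resolution"})
	m.blockDownloadDuration = promauto.With(reg).NewHistogramVec(prometheus.HistogramOpts{
		Name: "thanos_compact_block_download_duration_seconds",
		Help: "Duration of block downloads from object storage.",
	}, []string{"resolution"})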