Add storage 'middleware' to do scrape time aggregation. #393
Conversation
Signed-off-by: Tom Wilkie <[email protected]>
```go
func (b *batch) Add(l labels.Labels, t int64, v float64) (uint64, error) {
	// Buffer the sample so the aggregation rules can see it at Commit
	// time, then forward it to the underlying appender as usual.
	b.samples = append(b.samples, sample{l, t, v})
	return b.appender.Add(l, t, v)
}

func (b *batch) AddFast(ref uint64, t int64, v float64) error {
	// Not buffered: we only have a ref here, not the labels (see the
	// notes to the reviewer below).
	return b.appender.AddFast(ref, t, v)
}

func (b *batch) Commit() error {
	// Run every aggregation rule over the buffered samples before
	// committing the batch.
	for _, r := range b.aggregator.rules {
		if err := b.execute(r); err != nil {
			return err
		}
	}

	return b.appender.Commit()
}
```
Very cool. If I understand correctly, this is going to commit both aggregated and non-aggregated samples to the WAL, right? I think this will end up with a usage pattern of the aggregation rules always being combined with metric_relabeling_rules to drop the original non-aggregated samples.
It would be cool if we could defer writes to the underlying appender and only write aggregated samples and samples that weren't part of an aggregation rule, but I fear that might be more complicated than it's worth.
I was originally thinking that we’d use relabelling rules to drop them, yeah. We could “taint” samples that were touched by a rule; that doesn’t sound too hard. I’d make it an option though, just in case users want both...
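To make that option concrete, here is a minimal sketch of the “taint and remove” idea, reusing the batch type from this PR. The sample field names, the `ruleMatches` helper, and the `keepOriginals` flag are assumptions for illustration, not code from this branch; in this variant, `Add` and `AddFast` would only buffer samples instead of forwarding them immediately.

```go
// Sketch only: defer raw writes until Commit, mark ("taint") samples that
// an aggregation rule consumed, and drop them unless keepOriginals is set.
// ruleMatches and keepOriginals are assumed names, not part of this PR.
func (b *batch) Commit() error {
	tainted := make([]bool, len(b.samples))
	for _, r := range b.aggregator.rules {
		if err := b.execute(r); err != nil { // writes the aggregated series
			return err
		}
		for i, s := range b.samples {
			if ruleMatches(r, s.labels) { // hypothetical matcher
				tainted[i] = true
			}
		}
	}
	// Flush the raw samples last, skipping tainted ones unless the user
	// asked to keep both the originals and the aggregates.
	for i, s := range b.samples {
		if tainted[i] && !b.keepOriginals {
			continue
		}
		if _, err := b.appender.Add(s.labels, s.t, s.v); err != nil {
			return err
		}
	}
	return b.appender.Commit()
}
```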
I keep revisiting this thought and the underlying approach again and again, so just to persist it: I would strongly argue in favor of allowing both "keep" and "taint and remove".
As part of our bug scrub, we've decided to close this for now. We're planning on adding support for this eventually, where we'll use this PR as a base. I've opened grafana/alloy#554 to track this going forward.
Signed-off-by: Tom Wilkie <[email protected]>
PR Description
This PR adds a storage "middleware" that wraps the existing storage and allows you to run quasi-recording rules against commit batches.
The idea here is that you can use these rules to aggregate away certain labels that lead to high cardinality.
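For example (illustrative, not a rule shipped with this PR), a rule like `sum without (pod) (http_requests_total)` would produce a single aggregate series at scrape time alongside the per-pod ones, so the high-cardinality `pod` label never needs to be queried downstream.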
Which issue(s) this PR fixes
Notes to the Reviewer
NB this isn't wired in anywhere, and is very much just a proof of concept of an idea - that you can reuse the PromQL engine to do these scrape-time aggregations, and don't need to invent another way of doing it.
Right now this won't work as `AddFast` isn't implemented: we need the labels to be able to execute the PromQL, so we need some cache that maps refs back to labels.
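One possible shape for that cache, as a sketch: everything below mirrors the code in this PR except the `refCache` field, which is an assumed `map[uint64]labels.Labels` on `batch`.

```go
// Sketch: remember the labels for each ref handed out by the underlying
// appender, so AddFast can buffer samples for the aggregation rules too.
// refCache is an assumed field, not part of this PR as written.
func (b *batch) Add(l labels.Labels, t int64, v float64) (uint64, error) {
	b.samples = append(b.samples, sample{l, t, v})
	ref, err := b.appender.Add(l, t, v)
	if err == nil {
		b.refCache[ref] = l
	}
	return ref, err
}

func (b *batch) AddFast(ref uint64, t int64, v float64) error {
	// If we've seen this ref before, we know its labels and can buffer it;
	// either way, forward it to the underlying appender.
	if l, ok := b.refCache[ref]; ok {
		b.samples = append(b.samples, sample{l, t, v})
	}
	return b.appender.AddFast(ref, t, v)
}
```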
Also, the batch doesn't use any form of index, making select queries more expensive than they should be.
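Even a naive inverted index over the buffered samples would let a select intersect postings lists instead of scanning everything. A sketch, with the type and field names made up for illustration:

```go
// Sketch: map each label name/value pair to the positions of buffered
// samples carrying it. batchIndex and its fields are illustrative only.
type batchIndex struct {
	postings map[string][]int // "name\xffvalue" -> indices into samples
}

func (ix *batchIndex) add(i int, ls labels.Labels) {
	for _, l := range ls {
		key := l.Name + "\xff" + l.Value
		ix.postings[key] = append(ix.postings[key], i)
	}
}

func (ix *batchIndex) lookup(name, value string) []int {
	return ix.postings[name+"\xff"+value]
}
```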
PR Checklist