Adding a scrape-time recording rules component to Alloy #414

Open

mikemykhaylov opened this issue Jul 6, 2023 · 16 comments

Labels
proposal (A proposal for new functionality.)

Comments

@mikemykhaylov

mikemykhaylov commented Jul 6, 2023

Proposal: Scrape-time recording rules component

Abstract

We propose adding a new component to Grafana Alloy, along the lines of prometheus.recordrules, that lets us specify recording rules to run at scrape time, similar to how Prometheus evaluates recording rules against already-stored data.

Problem

In our K8s environments, one of the types of metrics we collect with Grafana Alloy is network flows between pods and outside IPs. Due to the clusters' size, and because the metrics carry explicit "source" and "destination" labels, cardinality explodes, easily exceeding 10k series reported per node. We would like to conditionally reduce the cardinality of a couple of metrics, preserving specific labels and dropping others, before remote-writing to Mimir.

Using prometheus.relabel to drop labels is not an option: in our use case it would generate duplicate metrics (i.e. metrics that are identical in all labels except the ones we want to drop), so Prometheus would drop them (link to post explaining this). What we need instead is, paraphrasing the article:

my_metric_total{env="dev", ip="1.1.1.1"} 1
my_metric_total{env="dev", ip="3.3.3.3"} 3
my_metric_total{env="dev", ip="5.5.5.5"} 7

# Dropped label ip
my_metric_total{env="dev"} 11

Given the multitude of labels, which would be tedious to enumerate, an obvious solution is recording rules. Namely, a rule like the following would work (followed by a relabel step that drops the original metrics):

rules:
  - record: my_metric_total_noip
    expr: sum without (ip) (my_metric_total)

However, setting up this rule in Mimir doesn't really help, as the metrics are already stored by then. If, instead, the rules ran in Alloy, only Alloy would have to deal with the high-cardinality metrics, and Mimir would only see the reduced-cardinality series.

Proposal

A new component is introduced, called prometheus.recordrules. It allows the user to configure a set of recording rules that run at scrape time. The rules' syntax will be a subset of PromQL, disallowing range queries. All queries are evaluated against only the metrics collected in the current scrape iteration; previously collected metrics are not stored. A sketch of these semantics follows.
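
To make the per-scrape semantics concrete, here is a rough Go sketch (hypothetical; the Sample and Rule types are simplified stand-ins invented for this illustration, not Alloy or Prometheus types). It evaluates a single sum-without-style rule against the samples of one scrape, reproducing the my_metric_total example from the Problem section:

package main

import (
    "fmt"
    "sort"
    "strings"
)

// Sample is a simplified stand-in for one scraped sample.
type Sample struct {
    Name   string
    Labels map[string]string
    Value  float64
}

// Rule is a hypothetical configuration for a "sum without (<labels>) (<metric>)" rule.
type Rule struct {
    Record  string   // name of the series to emit
    Metric  string   // source metric name
    Without []string // labels to drop before summing
}

// Evaluate runs the rule against the samples of a single scrape only;
// nothing from previous scrapes is consulted or stored.
func (r Rule) Evaluate(scrape []Sample) []Sample {
    drop := map[string]bool{}
    for _, l := range r.Without {
        drop[l] = true
    }
    sums := map[string]*Sample{}
    for _, s := range scrape {
        if s.Name != r.Metric {
            continue
        }
        // Keep only the labels that are not dropped.
        kept := map[string]string{}
        for k, v := range s.Labels {
            if !drop[k] {
                kept[k] = v
            }
        }
        key := labelKey(kept)
        if _, ok := sums[key]; !ok {
            sums[key] = &Sample{Name: r.Record, Labels: kept}
        }
        sums[key].Value += s.Value
    }
    out := make([]Sample, 0, len(sums))
    for _, s := range sums {
        out = append(out, *s)
    }
    return out
}

// labelKey builds a deterministic grouping key from a label set.
func labelKey(ls map[string]string) string {
    keys := make([]string, 0, len(ls))
    for k := range ls {
        keys = append(keys, k)
    }
    sort.Strings(keys)
    parts := make([]string, 0, len(keys))
    for _, k := range keys {
        parts = append(parts, fmt.Sprintf("%s=%q", k, ls[k]))
    }
    return strings.Join(parts, ",")
}

func main() {
    scrape := []Sample{
        {Name: "my_metric_total", Labels: map[string]string{"env": "dev", "ip": "1.1.1.1"}, Value: 1},
        {Name: "my_metric_total", Labels: map[string]string{"env": "dev", "ip": "3.3.3.3"}, Value: 3},
        {Name: "my_metric_total", Labels: map[string]string{"env": "dev", "ip": "5.5.5.5"}, Value: 7},
    }
    rule := Rule{Record: "my_metric_total_noip", Metric: "my_metric_total", Without: []string{"ip"}}
    for _, s := range rule.Evaluate(scrape) {
        fmt.Println(s.Name, s.Labels, s.Value) // my_metric_total_noip map[env:dev] 11
    }
}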

Pros

  • Solves the issue of storing high-cardinality metrics
  • Allows for pre-processing of metrics before doing a remote write

Cons

  • Given arbitrarily complex recording rules and a large number of metrics to process, heavy rule usage might hurt performance

Compatibility

Adding a new component is backwards-compatible

Implementation

We could have a thin storage.Appendable and storage.Queryable implementation which is exported as a receiver and appended to; we would then convert our rule configuration into a Prometheus-native RuleGroup, evaluate it, and register the new metrics with the next component's receiver. A related PR implementing scrape-time recording rules has been opened in the Prometheus repo (prometheus/prometheus#10529); if that is merged, we could leverage this functionality in Alloy.
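
As a rough illustration of the thin-Appendable idea (hypothetical, simplified types; a real implementation would satisfy the actual storage.Appendable and storage.Appender interfaces and use Prometheus rule evaluation), the component could buffer one scrape's samples, evaluate the configured rules on Commit, and forward the results to the next component's receiver:

package main

import "fmt"

// Sample is a simplified stand-in for one scraped sample.
type Sample struct {
    Labels map[string]string
    Value  float64
}

// RuleFunc evaluates recording rules against the samples of one scrape and
// returns the derived samples (for example, the sum-without aggregation above).
type RuleFunc func(scrape []Sample) []Sample

// scrapeAppender buffers one scrape's samples; Commit evaluates the rules and
// forwards originals plus derived series downstream (a later prometheus.relabel
// stage could then drop the originals).
type scrapeAppender struct {
    rules   []RuleFunc
    forward func([]Sample) // the next component's receiver
    buf     []Sample
}

func (a *scrapeAppender) Append(s Sample) {
    a.buf = append(a.buf, s)
}

func (a *scrapeAppender) Commit() {
    out := append([]Sample{}, a.buf...)
    for _, rule := range a.rules {
        out = append(out, rule(a.buf)...)
    }
    a.forward(out)
    a.buf = a.buf[:0]
}

func main() {
    app := &scrapeAppender{
        rules: []RuleFunc{func(scrape []Sample) []Sample {
            // Trivial example rule: emit the number of samples in this scrape.
            return []Sample{{Labels: map[string]string{"__name__": "example_samples_in_scrape"}, Value: float64(len(scrape))}}
        }},
        forward: func(out []Sample) { fmt.Println(len(out), "samples forwarded") },
    }
    app.Append(Sample{Labels: map[string]string{"__name__": "my_metric_total", "ip": "1.1.1.1"}, Value: 1})
    app.Append(Sample{Labels: map[string]string{"__name__": "my_metric_total", "ip": "3.3.3.3"}, Value: 3})
    app.Commit() // prints: 3 samples forwarded
}

A real implementation would of course be built on the Prometheus storage interfaces and rule evaluation (or on the upstream scrape-time rules work, if it lands), but the flow would be the same.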

@mikemykhaylov added the "proposal" label Jul 6, 2023
@rfratto
Member

rfratto commented Jul 6, 2023

grafana/agent#393 prototyped the basis of something very similar to this, where you could apply recording rules against individual scrapes, but we never found the time to finish the prototype and add it officially.

What kind of aggregation are we talking about here: within the scope of a single scrape, or across all samples sent to the proposed component?

@mikemykhaylov
Author

I believe that our use case would be satisfied with aggregations across a single scrape. The output should be independent of what went through the component previously, as our goal is essentially to construct a reduced set of metrics from the currently reported samples. Besides, I don't think persistent storage for aggregating across all samples that pass through the component is something we would want, as it would add complexity to what is meant to be a thin storage layer.

@rfratto
Member

rfratto commented Jul 6, 2023

I don't speak for all the project maintainers, but I'd personally be in favor of reviving grafana/agent#393 and having a new Flow component which performs recording rules / aggregations against a single scrape 👍

It wouldn't permit range queries (i.e., no rates or _over_times), but I think it's a great starting point for doing client-side aggregations and cutting costs.

@GrgDev

GrgDev commented Apr 8, 2024

Did this get any traction past the proposal state? My use case could really use this.

Some of our tenants have very large metric sets, high in both cardinality and sample count. They use recording rules in my team's Mimir instance to reduce that cardinality and sample size, but since the raw data set is so large, the recording rules will sometimes fail to finish processing before the next evaluation, resulting in gaps in the series that the recording rules produce. They already run in their own rule groups, so the Mimir rulers can't spread them out any further. It would be really nice if we could offload some of that aggregation upstream, inside the Grafana Agent sender.

@rfratto transferred this issue from grafana/agent on Apr 11, 2024
@rfratto moved this to Incoming in Alloy proposals on Jun 14, 2024
@wildum
Contributor

wildum commented Jul 15, 2024

Hey, thanks for opening this proposal, I think it makes a lot of sense :) We have a new proposal review process. I'd be happy to move your proposal to the active stage if you could add a little more detail (could you rework it to use our template?). What do you think @mikemykhaylov?

@wildum
Contributor

wildum commented Jul 17, 2024

A very similar idea is being implemented in Prometheus. If it's accepted, we could integrate it into Alloy's scrape component.

@mikemykhaylov
Author

Sure, I'd be glad to rework it to the template! The Prometheus PR does look like something that could be used here; if that work wraps up I'll definitely take a crack at adding it to Alloy, in part as an excuse to get a refresher on Go 😅

@mikemykhaylov changed the title from "Add a component that allows to run recording rules on Grafana Agent" to "Adding a scrape-time recording rules component to Alloy" on Jul 17, 2024
@wildum
Contributor

wildum commented Jul 19, 2024

I'm moving this proposal to "Active" state to discuss it more in-depth with the team and reach a consensus.

@wildum moved this from Incoming to Active in Alloy proposals on Jul 19, 2024
@mattdurham
Collaborator

If Prometheus adds it and we can use it, then that seems like a big win. Wait and see, then.

@bingkunyangvungle

bingkunyangvungle commented Aug 31, 2024

Just to follow up, do we have any updates on this? Adding a component that supports this would be super helpful for reducing metrics cardinality.

@wildum
Contributor

wildum commented Sep 11, 2024

Hey,

One concern about it was expressed in the Prometheus PR: How should counter resets be handled?
For example, if I aggregate two counter metrics A and B, and B resets but A doesn't, then the aggregated metric will be wrong.
If I understand correctly, this is only an edge case, because counters should usually reset at the same time.
One solution for proper counter-reset handling would be to do something similar to what VictoriaMetrics does: https://docs.victoriametrics.com/stream-aggregation/#total_prometheus.
For the first version, I would suggest documenting this issue and seeing whether proper handling like this is worth it.
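
For illustration, a minimal reset-aware summation in the spirit of the VictoriaMetrics total_prometheus output could look like the Go sketch below (types and names are invented for this example): the aggregate only accumulates positive per-series deltas, so one input resetting does not drag the output down.

package main

import "fmt"

// resetAwareSum aggregates several input counters into one monotonic output
// counter by accumulating per-series increases (hypothetical sketch).
type resetAwareSum struct {
    last  map[string]float64 // last observed value per input series
    total float64            // the aggregated output counter
}

func newResetAwareSum() *resetAwareSum {
    return &resetAwareSum{last: map[string]float64{}}
}

// Observe ingests one sample of one input series.
func (a *resetAwareSum) Observe(series string, value float64) {
    if prev, seen := a.last[series]; seen {
        if value >= prev {
            a.total += value - prev // normal monotonic increase
        } else {
            a.total += value // counter reset: count the post-reset value as the increase
        }
    }
    // The first sample of a series only establishes its baseline.
    a.last[series] = value
}

func main() {
    agg := newResetAwareSum()

    // Scrape 1: A=10, B=5 (baselines only, total stays 0).
    agg.Observe("A", 10)
    agg.Observe("B", 5)

    // Scrape 2: A=12 (+2), B resets and reports 1 (+1 counted, not -4).
    agg.Observe("A", 12)
    agg.Observe("B", 1)

    // A naive sum would drop from 15 to 13 and look like a reset of the
    // aggregate; the reset-aware total stays monotonic instead.
    fmt.Println(agg.total) // 3
}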

Unless there are blocking concerns about the proposal, I will move it to "Likely accepted" by the end of the week. Then we will wait one more week in case a major concern arises. If not, we will move the proposal to "Accepted" and anyone will be free to implement it (please mention it in this issue before starting the work, to avoid having several people doing it in parallel).

@wildum
Contributor

wildum commented Sep 20, 2024

The proposal is accepted :) As stated above, anyone is free to pick it up, just mention it before starting the implementation to avoid duplication

@thampiotr
Contributor

How should this work with clustering? For example:

my_metric_total{env="dev", ip="1.1.1.1"} 1   # ends up on instance A
my_metric_total{env="dev", ip="3.3.3.3"} 3   # ends up on instance B
my_metric_total{env="dev", ip="5.5.5.5"} 7   # ends up on instance A

The metrics required for the aggregation would be on different Alloy instances, since we shard the targets. How would we want to handle this?
Or is this component meant to run only on single-instance deployments where scraping is not distributed?

@sjentzsch

Same as Prometheus handles it when sharding is activated.
From my perspective, all recording rules should be evaluated on all shards. You would not have any overlap, as the metric base does not overlap. Only the scrape jobs are sharded, not the recording rules.

@mattdurham
Collaborator

Thinking on this more, this would not be a direct port/conversion of the Prometheus feature but something a bit different, conceptually more akin to adaptive metrics. I think what @sjentzsch describes is that it only applies to a single target scrape, while the actual proposal might be closer to something like adaptive metrics.

@thampiotr
Contributor

thampiotr commented Oct 4, 2024

> Same as Prometheus handles it when sharding is activated.

@sjentzsch What are you referring to when you say Prometheus sharding? I'm not sure which Prometheus feature you're referring to.

> From my perspective, all recording rules should be evaluated on all shards. You would not have any overlap, as the metric base does not overlap. Only the scrape jobs are sharded, not the recording rules.

If we did the above on my example, let's say, to calculate the avg without(ip) (...):

my_metric_total{env="dev", ip="1.1.1.1"} 1   # ends up on instance A
my_metric_total{env="dev", ip="3.3.3.3"} 3   # ends up on instance B
my_metric_total{env="dev", ip="5.5.5.5"} 7   # ends up on instance A

We would get something like this:

my_metric_total{env="dev"} 4   # calculated on instance A
my_metric_total{env="dev"} 3   # calculated on instance B

Without any further handling, one of these writes would fail with a duplicate-sample / out-of-order error.
And I'm not sure how we would get to the correct average, which is about 3.67 for this example.

Projects: Alloy proposals (Status: Accepted)