feat: add metricscardinality to heartbeat #1235

Merged
merged 1 commit into from
Jan 23, 2025

Conversation

@alexcastilio alexcastilio commented Jan 16, 2025

Description

Add metricscardinality to heartbeat. This gives visibility into the number of time series exposed by Retina.

Related Issue

#1040

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

Metrics exported with heartbeat:
image

Additional Notes

Metrics of type histogram and summary expose multiple time series during a scrape. The code counts the time series as they are exposed at the /metrics endpoint.

Ref:
https://prometheus.io/docs/concepts/metric_types/#histogram
https://prometheus.io/docs/concepts/metric_types/#summary
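
To illustrate the note above, here is a minimal sketch of the series a single histogram or summary contributes to a scrape. It uses only the standard library; `histogramSeries`, `summarySeries`, and the metric names are hypothetical helpers for illustration, not code from this PR, and the label formatting is approximate.

```go
package main

import (
	"fmt"
	"strconv"
)

// histogramSeries lists the series one histogram exposes: one _bucket
// line per configured upper bound, the implicit +Inf bucket, then _sum
// and _count. (Hypothetical helper, not part of Retina.)
func histogramSeries(name string, upperBounds []float64) []string {
	series := make([]string, 0, len(upperBounds)+3)
	for _, ub := range upperBounds {
		le := strconv.FormatFloat(ub, 'g', -1, 64)
		series = append(series, fmt.Sprintf("%s_bucket{le=%q}", name, le))
	}
	return append(series,
		name+`_bucket{le="+Inf"}`,
		name+"_sum",
		name+"_count",
	)
}

// summarySeries lists the series one summary exposes: one line per
// quantile, then _sum and _count. (Hypothetical helper.)
func summarySeries(name string, quantiles []float64) []string {
	series := make([]string, 0, len(quantiles)+2)
	for _, q := range quantiles {
		qs := strconv.FormatFloat(q, 'g', -1, 64)
		series = append(series, fmt.Sprintf("%s{quantile=%q}", name, qs))
	}
	return append(series, name+"_sum", name+"_count")
}

func main() {
	// Assumed example metrics: three explicit buckets, two quantiles.
	for _, s := range histogramSeries("demo_latency_seconds", []float64{0.1, 0.5, 1}) {
		fmt.Println(s)
	}
	for _, s := range summarySeries("demo_size_bytes", []float64{0.5, 0.9}) {
		fmt.Println(s)
	}
}
```

So a single registered histogram or summary is one metric in code but several lines at the endpoint, which is why the count has to be type-aware.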


Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

@alexcastilio alexcastilio force-pushed the alexcastilio/metrics-cardinality-prometheus branch 2 times, most recently from 4efa73e to b0beff5 Compare January 17, 2025 13:55
@alexcastilio alexcastilio marked this pull request as ready for review January 17, 2025 14:04
@alexcastilio alexcastilio requested a review from a team as a code owner January 17, 2025 14:04
@alexcastilio alexcastilio force-pushed the alexcastilio/metrics-cardinality-prometheus branch from b0beff5 to 7af0632 Compare January 17, 2025 18:48
@matmerr
Member

matmerr commented Jan 17, 2025

I think a simpler solution may just be to enumerate over the metrics and take the length of the metrics slice

https://gist.github.com/matmerr/f2e31d9be0b5af0eec750fa944edc617#file-counter_test-go-L58


// Gather every metric family currently registered, i.e. everything the
// Prometheus endpoint would expose.
metrics, err := registry.Gather()
if err != nil {
	fmt.Println("Error gathering metrics: ", err)
}

// Each family holds one entry per label combination; sum them all.
var metriccounter int
for _, metric := range metrics {
	metriccounter += len(metric.Metric)
}

where the metrics endpoint looks like this:

image

and stdout shows the same total: 4 metrics
image

@alexcastilio
Contributor Author

@matmerr if I only loop through the metrics and sum the length of each, the count will differ from what a scrape shows. Histogram and summary metrics expose extra time series for each quantile/bucket, plus _count and _sum. Is it acceptable for the count in code to differ from what is seen at the /metrics endpoint?

@anubhabMajumdar
Contributor

@alexcastilio How much is the difference? Is it 2x or a constant number?

@alexcastilio
Contributor Author

alexcastilio commented Jan 21, 2025

@anubhabMajumdar the difference depends on the metric type. Each one is exposed as:

  • histogram: number of buckets + 3 series (le="+Inf", _sum, _count)
  • summary: number of quantiles + 2 series (_sum, _count)
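
A rough sketch of how the two counting approaches diverge, using simplified stand-in types rather than the real Prometheus client model. The `family` and `child` types and both function names are assumptions made for illustration.

```go
package main

import "fmt"

type metricKind int

const (
	kindCounter metricKind = iota
	kindHistogram
	kindSummary
)

// child is one label combination inside a metric family (stand-in type).
type child struct {
	buckets   int // explicit histogram buckets, excluding le="+Inf"
	quantiles int // summary quantiles
}

// family is a stand-in for a gathered metric family.
type family struct {
	kind     metricKind
	children []child
}

// naiveCount sums the number of label combinations per family.
func naiveCount(families []family) int {
	total := 0
	for _, f := range families {
		total += len(f.children)
	}
	return total
}

// seriesCount counts what a scrape would show: histograms and summaries
// contribute several series per label combination.
func seriesCount(families []family) int {
	total := 0
	for _, f := range families {
		for _, c := range f.children {
			switch f.kind {
			case kindHistogram:
				total += c.buckets + 3 // le="+Inf", _sum, _count
			case kindSummary:
				total += c.quantiles + 2 // _sum, _count
			default:
				total++
			}
		}
	}
	return total
}

func main() {
	families := []family{
		{kind: kindCounter, children: []child{{}, {}}},
		{kind: kindHistogram, children: []child{{buckets: 5}}},
		{kind: kindSummary, children: []child{{quantiles: 3}}},
	}
	fmt.Println("naive:", naiveCount(families))   // 2 + 1 + 1 = 4
	fmt.Println("series:", seriesCount(families)) // 2 + 8 + 5 = 15
}
```

The gap grows with the number of buckets and quantiles, so it is neither a fixed multiplier nor a constant offset.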

@matmerr
Member

matmerr commented Jan 21, 2025

The issue, as @alexcastilio showed me, is that histograms create more time series than summing the slice lengths accounts for, and our apiserver latency metrics use the histogram type.

func CreatePrometheusHistogramWithLinearBucketsForMetric(r prometheus.Registerer, name, desc string, start, width float64, count int) prometheus.Histogram {

Confirmed with him offline that this diff outputs the correct results. Sample endpoint, including a metric of histogram type:
image

Results of his diff:
image

matmerr
matmerr previously approved these changes Jan 22, 2025
@alexcastilio alexcastilio force-pushed the alexcastilio/metrics-cardinality-prometheus branch 2 times, most recently from 9ce3044 to 5372dac Compare January 23, 2025 14:40
Signed-off-by: Alex Castilio dos Santos <[email protected]>
@alexcastilio alexcastilio force-pushed the alexcastilio/metrics-cardinality-prometheus branch from 5372dac to 68b258d Compare January 23, 2025 14:41
@alexcastilio alexcastilio requested a review from matmerr January 23, 2025 14:51
@alexcastilio alexcastilio added this pull request to the merge queue Jan 23, 2025
Merged via the queue into main with commit c5ea179 Jan 23, 2025
27 checks passed
@alexcastilio alexcastilio deleted the alexcastilio/metrics-cardinality-prometheus branch January 23, 2025 18:14
kamilprz pushed a commit to kamilprz/retina that referenced this pull request Jan 24, 2025
	maps.Copy(props, cpuProps)
	maps.Copy(props, t.profile.GetMemoryUsage())
	t.TrackEvent("heartbeat", props)
}

func metricsCardinality() (int, error) {
	metricFamilies, err := exporter.CombinedGatherer.Gather()
Contributor

CombinedGatherer can be nil.

Contributor Author

@anubhabMajumdar addressed in #1293

case io_prometheus_client.MetricType_HISTOGRAM:
	metrics := mf.GetMetric()
	for _, m := range metrics {
		metricscardinality += len(m.GetHistogram().GetBucket()) + 3 // +3 for le="+Inf", _sum and _count
Contributor

GetHistogram can return nil.

Contributor Author

@anubhabMajumdar addressed in #1293
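
One way to handle a missing histogram payload is to skip the metric rather than add the fixed +3. The sketch below uses stand-in types with a nil-safe getter in the style of generated protobuf code; the type names, `countHistogramSeries`, and the choice to count a missing histogram as zero are all assumptions, not necessarily what #1293 does.

```go
package main

import "fmt"

// bucket and histogram are simplified stand-ins for the client model types.
type bucket struct{}

type histogram struct{ buckets []bucket }

// getBuckets is nil-safe, so calling it on a nil *histogram returns nil
// instead of panicking.
func (h *histogram) getBuckets() []bucket {
	if h == nil {
		return nil
	}
	return h.buckets
}

// metric is a stand-in whose histogram payload may be absent.
type metric struct{ hist *histogram }

// countHistogramSeries skips metrics without a histogram payload, so
// they do not contribute a spurious +3.
func countHistogramSeries(metrics []metric) int {
	total := 0
	for _, m := range metrics {
		if m.hist == nil {
			continue // assumption: nothing is exposed for this metric
		}
		total += len(m.hist.getBuckets()) + 3 // le="+Inf", _sum, _count
	}
	return total
}

func main() {
	metrics := []metric{
		{hist: nil},
		{hist: &histogram{buckets: make([]bucket, 2)}},
	}
	fmt.Println(countHistogramSeries(metrics)) // 2 buckets + 3 = 5
}
```

Without the nil check, the nil-safe getter would still avoid a panic, but each metric lacking a histogram would be counted as three series.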
