Metrics framework #926

Merged
merged 20 commits into from
Nov 26, 2024
Changes from 8 commits
22 changes: 22 additions & 0 deletions common/metrics/config.go
@@ -0,0 +1,22 @@
package metrics

// Config provides configuration for a Metrics instance.
type Config struct {
// Namespace is the namespace for the metrics.
Namespace string

// HTTPPort is the port to serve metrics on.
HTTPPort int

// MetricsBlacklist is a list of metrics to blacklist. To determine the fully qualified metric name
// for this list, use the format "metricName:metricLabel" if the metric has a label, or just "metricName"
// if the metric does not have a label. Any fully qualified metric name that exactly matches an entry
// in this list will be blacklisted (i.e. it will not be reported).
MetricsBlacklist []string
Contributor

Should metrics filtering be something that is taken care of by the application or the component that is collecting the metrics?

Contributor Author

When we were preparing for the (since abandoned) traffic generator, this seemed like a useful concept. I had added a large number of metrics, but there was concern that some of them were low bang for the buck (since it costs us $$$ to store them). We were planning on disabling a bunch of metrics with configuration changes, and turning them on in the future if we ever had an issue where they would be useful for debugging.

That being said, if people don't think this is a useful feature, it would be fairly straightforward to remove. What do others think?

Contributor

Yeah, all I'm saying is that it's probably possible to filter metrics and logs in a similar manner with the Grafana agent.

This is pretty useful now though, because we haven't figured out how to do that in the Grafana agent. Regardless, if we want to save more money we would need to learn how to do it in the Grafana agent, because we're not always running applications that allow metrics filtering in this manner.

Contributor Author

Makes sense. I have removed the metrics blacklisting feature from this framework.


// MetricsFuzzyBlacklist is a list of metrics to blacklist. To determine the fully qualified metric name
// for this list, use the format "metricName:metricLabel" if the metric has a label, or just "metricName"
// if the metric does not have a label. Any fully qualified metric name that contains one of these strings
// in any position will be blacklisted (i.e. it will not be reported).
MetricsFuzzyBlacklist []string
}
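
The two blacklist fields describe exact and substring matching against a fully qualified metric name. The following is a minimal sketch of how that matching could work, not the code in this PR: `qualifiedName` and `isBlacklisted` are hypothetical helpers, and the sketch assumes an unlabeled metric is identified by its name alone.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifiedName builds "metricName:metricLabel" when a label is set,
// and just "metricName" otherwise (an assumption of this sketch).
func qualifiedName(name, label string) string {
	if label == "" {
		return name
	}
	return name + ":" + label
}

// isBlacklisted is a hypothetical helper. A metric is filtered out if its
// qualified name equals an entry in exact, or contains an entry in fuzzy.
func isBlacklisted(name, label string, exact, fuzzy []string) bool {
	qualified := qualifiedName(name, label)
	for _, entry := range exact {
		if qualified == entry {
			return true
		}
	}
	for _, fragment := range fuzzy {
		if strings.Contains(qualified, fragment) {
			return true
		}
	}
	return false
}

func main() {
	exact := []string{"request_latency:get"}
	fuzzy := []string{"debug"}

	fmt.Println(isBlacklisted("request_latency", "get", exact, fuzzy)) // exact entry
	fmt.Println(isBlacklisted("request_latency", "put", exact, fuzzy)) // no entry applies
	fmt.Println(isBlacklisted("debug_queue_size", "", exact, fuzzy))   // substring entry
}
```

One thing to watch with substring matching: `strings.Contains` reports true for an empty fragment, so an empty string in the fuzzy list would disable every metric.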
77 changes: 77 additions & 0 deletions common/metrics/count_metric.go
@@ -0,0 +1,77 @@
package metrics

import (
"github.com/prometheus/client_golang/prometheus"
)

var _ CountMetric = &countMetric{}

// countMetric is a standard implementation of the CountMetric interface via prometheus.
type countMetric struct {
Metric

// name is the name of the metric.
name string

// label is the label of the metric.
label string

// description is the description of the metric.
description string

// counter is the prometheus counter used to report this metric.
counter prometheus.Counter
}

// newCountMetric creates a new CountMetric instance.
func newCountMetric(name string, label string, description string, vec *prometheus.CounterVec) CountMetric {
var counter prometheus.Counter
if vec != nil {
counter = vec.WithLabelValues(label, "count")
}

return &countMetric{
name: name,
label: label,
description: description,
counter: counter,
}
}

func (m *countMetric) Name() string {
return m.name
}

func (m *countMetric) Label() string {
return m.label
}

func (m *countMetric) Unit() string {
return "count"
}

func (m *countMetric) Description() string {
return m.description
}

func (m *countMetric) Type() string {
return "counter"
}

func (m *countMetric) Enabled() bool {
return m.counter != nil
}

func (m *countMetric) Increment() {
if m.counter == nil {
return
}
m.counter.Inc()
}

func (m *countMetric) Add(value float64) {
if m.counter == nil {
return
}
m.counter.Add(value)
}
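
The counter, gauge, and latency implementations all treat a nil Prometheus handle as "this metric is disabled" and turn every reporting call into a no-op. A minimal sketch of that pattern follows. To stay free of third-party imports it uses a hypothetical `counter` interface and `memoryCounter` type in place of `prometheus.Counter`; neither is part of this PR.

```go
package main

import "fmt"

// counter stands in for prometheus.Counter; only the two methods the
// diff calls are modelled here.
type counter interface {
	Inc()
	Add(float64)
}

// memoryCounter is a hypothetical in-memory counter for illustration.
type memoryCounter struct{ total float64 }

func (c *memoryCounter) Inc()              { c.total++ }
func (c *memoryCounter) Add(value float64) { c.total += value }

// countMetric mirrors the nil-guard pattern: a nil counter means disabled.
type countMetric struct {
	counter counter
}

func (m *countMetric) Enabled() bool { return m.counter != nil }

func (m *countMetric) Increment() {
	if m.counter == nil {
		return // disabled: silently drop the update
	}
	m.counter.Inc()
}

func (m *countMetric) Add(value float64) {
	if m.counter == nil {
		return // disabled: silently drop the update
	}
	m.counter.Add(value)
}

func main() {
	backing := &memoryCounter{}
	enabled := &countMetric{counter: backing}
	disabled := &countMetric{} // counter left unset, so calls do nothing

	enabled.Increment()
	enabled.Add(2)
	disabled.Increment()
	disabled.Add(2)

	fmt.Println(enabled.Enabled(), backing.total)
	fmt.Println(disabled.Enabled())
}
```

The guard relies on the interface value itself being nil. Storing a nil pointer of a concrete type in the interface field would make `m.counter != nil` true, so the field should be left unassigned when the metric is disabled, as the constructors in this diff do.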
80 changes: 80 additions & 0 deletions common/metrics/gauge_metric.go
@@ -0,0 +1,80 @@
package metrics

import (
"github.com/prometheus/client_golang/prometheus"
)

var _ GaugeMetric = &gaugeMetric{}

// gaugeMetric is a standard implementation of the GaugeMetric interface via prometheus.
type gaugeMetric struct {
Metric

// name is the name of the metric.
name string

// label is the label of the metric.
label string

// unit is the unit of the metric.
unit string

// description is the description of the metric.
description string

// gauge is the prometheus gauge used to report this metric.
gauge prometheus.Gauge
}

// newGaugeMetric creates a new GaugeMetric instance.
func newGaugeMetric(
name string,
label string,
unit string,
description string,
vec *prometheus.GaugeVec) GaugeMetric {

var gauge prometheus.Gauge
if vec != nil {
gauge = vec.WithLabelValues(label, unit)
}

return &gaugeMetric{
name: name,
label: label,
unit: unit,
description: description,
gauge: gauge,
}
}

func (m *gaugeMetric) Name() string {
return m.name
}

func (m *gaugeMetric) Label() string {
return m.label
}

func (m *gaugeMetric) Unit() string {
return m.unit
}

func (m *gaugeMetric) Description() string {
return m.description
}

func (m *gaugeMetric) Type() string {
return "gauge"
}

func (m *gaugeMetric) Enabled() bool {
return m.gauge != nil
}

func (m *gaugeMetric) Set(value float64) {
if m.gauge == nil {
return
}
m.gauge.Set(value)
}
72 changes: 72 additions & 0 deletions common/metrics/latency_metric.go
@@ -0,0 +1,72 @@
package metrics

import (
"time"

"github.com/prometheus/client_golang/prometheus"
)

var _ LatencyMetric = &latencyMetric{}

// latencyMetric is a standard implementation of the LatencyMetric interface via prometheus.
type latencyMetric struct {
Metric

// name is the name of the metric.
name string

// label is the label of the metric.
label string

// description is the description of the metric.
description string

// observer is the prometheus observer used to report this metric.
observer prometheus.Observer
}

// newLatencyMetric creates a new LatencyMetric instance.
func newLatencyMetric(name string, label string, description string, vec *prometheus.SummaryVec) LatencyMetric {
var observer prometheus.Observer
if vec != nil {
observer = vec.WithLabelValues(label, "seconds")
}

return &latencyMetric{
name: name,
label: label,
description: description,
observer: observer,
}
}

func (m *latencyMetric) Name() string {
return m.name
}

func (m *latencyMetric) Label() string {
return m.label
}

func (m *latencyMetric) Unit() string {
return "seconds"
}

func (m *latencyMetric) Description() string {
return m.description
}

func (m *latencyMetric) Type() string {
return "latency"
}

func (m *latencyMetric) Enabled() bool {
return m.observer != nil
}

func (m *latencyMetric) ReportLatency(latency time.Duration) {
if m.observer == nil {
// this metric has been disabled
return
}
m.observer.Observe(latency.Seconds())
}
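
Since `ReportLatency` takes a `time.Duration`, callers will typically measure an operation with `time.Now` and `time.Since`. The sketch below shows one way to wrap that. `timeOperation`, `latencyReporter`, and `recordingMetric` are hypothetical names introduced for illustration and do not appear in this PR.

```go
package main

import (
	"fmt"
	"time"
)

// latencyReporter is the single LatencyMetric method this sketch needs.
type latencyReporter interface {
	ReportLatency(latency time.Duration)
}

// recordingMetric is a hypothetical stand-in that stores what it is given.
type recordingMetric struct {
	reported []time.Duration
}

func (r *recordingMetric) ReportLatency(latency time.Duration) {
	r.reported = append(r.reported, latency)
}

// timeOperation runs op and reports how long it took.
func timeOperation(m latencyReporter, op func()) {
	start := time.Now()
	// The closure postpones time.Since until op has returned. Writing
	// `defer m.ReportLatency(time.Since(start))` instead would evaluate
	// the argument right away and report a near-zero latency.
	defer func() { m.ReportLatency(time.Since(start)) }()
	op()
}

func main() {
	m := &recordingMetric{}
	timeOperation(m, func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println(len(m.reported), m.reported[0] >= 10*time.Millisecond)
}
```

Using `defer` also means the latency is still reported if `op` panics, which may or may not be desirable depending on how failures should be counted.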
119 changes: 119 additions & 0 deletions common/metrics/metrics.go
@@ -0,0 +1,119 @@
package metrics

import "time"

// Metrics provides a convenient interface for reporting metrics.
type Metrics interface {
// Start starts the metrics server.
Start() error

// Stop stops the metrics server.
Stop() error

// GenerateMetricsDocumentation generates documentation for all currently registered metrics.
// Documentation is returned as a string in markdown format.
GenerateMetricsDocumentation() string

// WriteMetricsDocumentation writes documentation for all currently registered metrics to a file.
// Documentation is written in markdown format.
WriteMetricsDocumentation(fileName string) error

// NewLatencyMetric creates a new LatencyMetric instance. Useful for reporting the latency of an operation.
// Metric name and label may only contain alphanumeric characters and underscores.
NewLatencyMetric(
name string,
label string,
description string,
quantiles ...*Quantile) (LatencyMetric, error)

// NewCountMetric creates a new CountMetric instance. Useful for tracking the count of a type of event.
// Metric name and label may only contain alphanumeric characters and underscores.
NewCountMetric(
name string,
label string,
description string) (CountMetric, error)

// NewGaugeMetric creates a new GaugeMetric instance. Useful for reporting specific values.
// Metric name and label may only contain alphanumeric characters and underscores.
NewGaugeMetric(
name string,
label string,
unit string,
description string) (GaugeMetric, error)

// NewAutoGauge creates a new GaugeMetric instance that is automatically updated by the given source function.
// The function is polled at the given period. This produces a gauge type metric internally.
// Metric name and label may only contain alphanumeric characters and underscores.
NewAutoGauge(
name string,
label string,
unit string,
description string,
pollPeriod time.Duration,
source func() float64) error
}
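
`NewAutoGauge` is documented as polling a source function at a fixed period. The implementation is not part of the files shown here, so the following is only a sketch of what such a loop could look like. `startAutoGauge`, its `set` and `stop` parameters, and `demoPollPeriod` are assumptions made for this example.

```go
package main

import (
	"fmt"
	"time"
)

// demoPollPeriod is an arbitrary period chosen for this example.
const demoPollPeriod = time.Millisecond

// startAutoGauge is a hypothetical polling loop. Every pollPeriod it calls
// source and hands the result to set, until stop is closed. The returned
// channel is closed once the goroutine has exited.
// Note: time.NewTicker panics if pollPeriod is not positive.
func startAutoGauge(
	pollPeriod time.Duration,
	source func() float64,
	set func(value float64),
	stop <-chan struct{},
) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		ticker := time.NewTicker(pollPeriod)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				set(source())
			case <-stop:
				return
			}
		}
	}()
	return done
}

func main() {
	updates := make(chan float64, 1)
	stop := make(chan struct{})

	done := startAutoGauge(
		demoPollPeriod,
		func() float64 { return 42 },
		func(value float64) {
			select {
			case updates <- value:
			default: // drop the sample if nobody has read the last one
			}
		},
		stop,
	)

	fmt.Println("first polled value:", <-updates)
	close(stop)
	<-done
}
```

In a real implementation `set` would most likely be the `Set` method of a `GaugeMetric`, and the stop signal would be tied to `Metrics.Stop` so that polling goroutines do not outlive the metrics server.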

// Metric represents a metric that can be reported.
type Metric interface {

// Name returns the name of the metric.
Name() string

// Label returns the label of the metric. Metrics without a label will return an empty string.
Label() string

// Unit returns the unit of the metric.
Unit() string

// Description returns the description of the metric. Should be a one- or two-sentence human-readable description.
Description() string

// Type returns the type of the metric.
Type() string

// Enabled returns true if the metric is enabled.
Enabled() bool
}

// GaugeMetric allows specific values to be reported.
type GaugeMetric interface {
Metric

// Set sets the value of a gauge metric.
Set(value float64)
}

// CountMetric allows the count of a type of event to be tracked.
type CountMetric interface {
Metric

// Increment increments the count by 1.
Increment()

// Add increments the count by the given value.
Add(value float64)
}

// Quantile describes a quantile of a latency metric that should be reported. For a description of how
// to interpret a quantile, see the prometheus documentation
// https://github.com/prometheus/client_golang/blob/v1.20.5/prometheus/summary.go#L126
type Quantile struct {
Quantile float64
Error float64
}

// NewQuantile creates a new Quantile instance. Error is set to 1% of the quantile.
func NewQuantile(quantile float64) *Quantile {
return &Quantile{
Quantile: quantile,
Error: quantile / 100.0,
}
}
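
To make the "1% of the quantile" rule concrete: `NewQuantile(0.5)` yields an error of 0.005 and `NewQuantile(0.99)` an error of roughly 0.0099. The Prometheus summary type takes its objectives as a `map[float64]float64` from quantile rank to allowed absolute error, so a conversion along the following lines seems likely. `toObjectives` is a hypothetical helper, not code from this PR; `Quantile` and `NewQuantile` are copied from the diff so the example is self-contained.

```go
package main

import "fmt"

// Quantile and NewQuantile are copied from the diff above.
type Quantile struct {
	Quantile float64
	Error    float64
}

func NewQuantile(quantile float64) *Quantile {
	return &Quantile{
		Quantile: quantile,
		Error:    quantile / 100.0,
	}
}

// toObjectives is a hypothetical helper that builds a map from quantile
// rank to allowed absolute error, the shape used by the Objectives field
// of prometheus.SummaryOpts.
func toObjectives(quantiles ...*Quantile) map[float64]float64 {
	objectives := make(map[float64]float64, len(quantiles))
	for _, q := range quantiles {
		objectives[q.Quantile] = q.Error
	}
	return objectives
}

func main() {
	objectives := toObjectives(NewQuantile(0.5), NewQuantile(0.9), NewQuantile(0.99))
	for _, rank := range []float64{0.5, 0.9, 0.99} {
		fmt.Printf("quantile %g: allowed error %g\n", rank, objectives[rank])
	}
}
```

Because the error is an absolute tolerance on the rank, an objective of 0.5 with error 0.005 means the reported value lies somewhere between the 0.495 and 0.505 quantiles of the observed latencies.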

// LatencyMetric allows the latency of an operation to be tracked. Similar to a gauge metric, but specialized for time.
type LatencyMetric interface {
Metric

// ReportLatency reports a latency value.
ReportLatency(latency time.Duration)
}