feat(outputs.datadog): Add support for submitting alongside dd-agent #15702

16 changes: 11 additions & 5 deletions plugins/outputs/datadog/README.md
@@ -36,6 +36,14 @@ See the [CONFIGURATION.md][CONFIGURATION.md] for more details.
   ## Override the default (none) compression used to send data.
   ## Supports: "zlib", "none"
   # compression = "none"
+
+  ## Convert counts to rates
+  ## Use this to be able to submit metrics from telegraf alongside Datadog agent
+  # should_rate_counts = true
+
+  ## When should_rate_counts is enabled, this overrides the
+  ## default (10s) rate interval used to divide count metrics by
+  # rate_interval = 20
Contributor

The example config and README should show the default values, in this case false and 10s. That way you don't have to document the default in the help text.

I do think we should get rid of should_rate_counts and use a non-zero rate_interval instead. Alternatively, we could follow the stackdriver output: there we have a metric_counter = [] which means any metric matching that name will be marked as a counter. Should we do the same here and mark any metric matching that name as a rate?

Contributor

If we keep it, I would rename it to convert_count_to_rate or something similar. Using "should" in a config setting name looks odd to me.

Contributor Author

Appreciate the feedback.

> we have a metric_counter = [] which means any metric matching that name will be marked as a counter. Should we do the same here and mark any metric matching that name as a rate?

It's a good callout, but that would not scale to larger deployments, such as the one I'm working with, which has thousands of metrics.

I'm happy to explore having rate_interval default to 0 and enabling the conversion of counts to rates when it is > 0. My concern is whether users would know to set it to 10, the default rate interval used by the Datadog agent, though I can state this explicitly in the README.
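A minimal sketch of the single-option behavior proposed here, assuming `rate_interval = 0` keeps counts untouched and any positive value turns on count-to-rate conversion. The `convertCount` helper and its return values are hypothetical and only illustrate the idea; they are not the plugin's code.

```go
package main

import "fmt"

// convertCount is a hypothetical helper illustrating the proposed toggle:
// a zero rateInterval preserves the existing behavior (type "count",
// interval 1), while a positive rateInterval divides the value by the
// interval and reports it as a Datadog "rate".
func convertCount(value float64, rateInterval int64) (float64, string, int64) {
	if rateInterval <= 0 {
		// Conversion disabled: send the raw count as before.
		return value, "count", 1
	}
	// Conversion enabled: per-second value over the configured interval.
	return value / float64(rateInterval), "rate", rateInterval
}

func main() {
	// Default of 0: status quo, no conversion.
	v, t, i := convertCount(50, 0)
	fmt.Println(v, t, i)

	// 10 matches the Datadog agent's default rate interval.
	v, t, i = convertCount(50, 10)
	fmt.Println(v, t, i)
}
```

Treating negative values like zero is a choice made in this sketch; the thread only specifies that a value greater than 0 enables conversion.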

If it stays, I have no objection to renaming should_rate_counts to convert_count_to_rate.

Let me know your thoughts 👍🏻

Contributor

I'd still like to see the single option; you can document that the suggested interval is 10 seconds, as that is the default used by Datadog. Sound good?

Contributor Author

@jdheyburn, Aug 7, 2024

SGTM. I've switched to the single option in c808463 and also added a test for the existing behavior when rate_interval=0 👍🏻

 ```
 
 ## Metrics
@@ -46,11 +54,9 @@ field key with a `.` character.
 Field values are converted to floating point numbers. Strings and floats that
 cannot be sent over JSON, namely NaN and Inf, are ignored.
 
-We do not send `Rate` types. Counts are sent as `count`, with an
-interval hard-coded to 1. Note that this behavior does *not* play
-super-well if running simultaneously with current Datadog agents; they
-will attempt to change to `Rate` with `interval=10`. We prefer this
-method, however, as it reflects the raw data more accurately.
+Enabling the `should_rate_counts` will convert `count` metrics to `rate`
+and divide it by the `rate_interval`. This will allow telegraf to run
+alongside current Datadog agents.
 
 [metrics]: https://docs.datadoghq.com/api/v1/metrics/#submit-metrics
 [apikey]: https://app.datadoghq.com/account/settings#api
57 changes: 45 additions & 12 deletions plugins/outputs/datadog/datadog.go
@@ -25,11 +25,13 @@ import (
 var sampleConfig string

 type Datadog struct {
-	Apikey      string          `toml:"apikey"`
-	Timeout     config.Duration `toml:"timeout"`
-	URL         string          `toml:"url"`
-	Compression string          `toml:"compression"`
-	Log         telegraf.Logger `toml:"-"`
+	Apikey           string          `toml:"apikey"`
+	Timeout          config.Duration `toml:"timeout"`
+	URL              string          `toml:"url"`
+	Compression      string          `toml:"compression"`
+	ShouldRateCounts bool            `toml:"should_rate_counts"`
+	RateInterval     int64           `toml:"rate_interval"`
+	Log              telegraf.Logger `toml:"-"`

 	client *http.Client
 	proxy.HTTPProxy
@@ -75,15 +77,15 @@ func (d *Datadog) Connect() error {
 	return nil
 }

-func (d *Datadog) Write(metrics []telegraf.Metric) error {
-	ts := TimeSeries{}
+func (d *Datadog) convertToDatadogMetric(metrics []telegraf.Metric) ([]*Metric, int) {
 	tempSeries := []*Metric{}
 	metricCounter := 0

 	for _, m := range metrics {
 		if dogMs, err := buildMetrics(m); err == nil {
 			metricTags := buildTags(m.TagList())
 			host, _ := m.GetTag("host")
+			metricType, _ := m.GetTag("metric_type")

 			if len(dogMs) == 0 {
 				continue
@@ -99,9 +101,19 @@ func (d *Datadog) Write(metrics []telegraf.Metric) error {
 					dname = m.Name() + "." + fieldName
 				}
 				var tname string
+				var interval int64
+				interval = 1
 				switch m.Type() {
-				case telegraf.Counter:
-					tname = "count"
+				case telegraf.Counter, telegraf.Untyped:
+					if d.ShouldRateCounts && isRateable(metricType, fieldName) {
+						interval = d.RateInterval
+						dogM[1] = dogM[1] / float64(interval)
+						tname = "rate"
+					} else if m.Type() == telegraf.Counter {
+						tname = "count"
+					} else {
+						tname = ""
+					}
 				case telegraf.Gauge:
 					tname = "gauge"
 				default:
@@ -112,7 +124,7 @@ func (d *Datadog) Write(metrics []telegraf.Metric) error {
 					Tags:     metricTags,
 					Host:     host,
 					Type:     tname,
-					Interval: 1,
+					Interval: interval,
 				}
 				metric.Points[0] = dogM
 				tempSeries = append(tempSeries, metric)
@@ -122,6 +134,12 @@ func (d *Datadog) Write(metrics []telegraf.Metric) error {
 			d.Log.Infof("Unable to build Metric for %s due to error '%v', skipping", m.Name(), err)
 		}
 	}
+	return tempSeries, metricCounter
+}
+
+func (d *Datadog) Write(metrics []telegraf.Metric) error {
+	ts := TimeSeries{}
+	tempSeries, metricCounter := d.convertToDatadogMetric(metrics)

 	if len(tempSeries) == 0 {
 		return nil
@@ -220,6 +238,20 @@ func verifyValue(v interface{}) bool {
 	return true
 }

+func isRateable(metricType string, fieldName string) bool {
Contributor

We use "metricType" in a couple of places. Can you add a comment to this function explaining that we are looking for the metric type from the statsd tag for specific cases?

Contributor Author

@jdheyburn, Aug 6, 2024

I think explicitly naming the variable statsDMetricType would help too.

// Retrieve the metric_type tag created by inputs.statsd
statsDMetricType, _ := m.GetTag("metric_type")
// ...
if d.ShouldRateCounts && isRateable(statsDMetricType, fieldName) {
// ...

func isRateable(statsDMetricType string, fieldName string) bool {
	switch statsDMetricType {
	case
		"counter":
		return true
	case
		"timing",
		"histogram":
		return fieldName == "count"
	default:
		return false
	}
}

I can then add better documentation to the README stating explicitly that only inputs.statsd is supported. How does that sound?
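To see which statsd-derived fields the proposed `isRateable` would convert, the function from the comment above can be exercised directly. The doc comment and the sample field names (`value`, `count`, `mean`) are illustrative assumptions about what `inputs.statsd` emits, not taken from this PR.

```go
package main

import "fmt"

// isRateable reports whether a field should be converted from a count to a
// rate. statsDMetricType is the value of the "metric_type" tag created by
// inputs.statsd; metrics without that tag are never converted.
func isRateable(statsDMetricType string, fieldName string) bool {
	switch statsDMetricType {
	case "counter":
		return true
	case "timing", "histogram":
		// Only the sample count of a timing/histogram is a count;
		// fields such as the mean are not.
		return fieldName == "count"
	default:
		return false
	}
}

func main() {
	cases := []struct{ metricType, field string }{
		{"counter", "value"},
		{"timing", "count"},
		{"timing", "mean"},
		{"gauge", "value"},
		{"", "count"}, // no metric_type tag, e.g. a non-statsd input
	}
	for _, c := range cases {
		fmt.Printf("%q/%q -> %v\n", c.metricType, c.field, isRateable(c.metricType, c.field))
	}
}
```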

Contributor

That looks great and makes it much clearer, thank you!

Contributor Author

Done in c808463

+	switch metricType {
+	case
+		"counter":
+		return true
+	case
+		"timing",
+		"histogram":
+		return fieldName == "count"
+	default:
+		return false
+	}
+}
+
 func (p *Point) setValue(v interface{}) error {
 	switch d := v.(type) {
 	case int64:
@@ -246,8 +278,9 @@ func (d *Datadog) Close() error {
 func init() {
 	outputs.Add("datadog", func() telegraf.Output {
 		return &Datadog{
-			URL:         datadogAPI,
-			Compression: "none",
+			URL:          datadogAPI,
+			Compression:  "none",
+			RateInterval: 10,
 		}
 	})
 }