Datadog Metrics get multiplied by 10 for some unknown reason #10944
Comments
☝️ I have no idea why the bot says "Platform/windows" when this was Linux.
Slight update. We've noticed that in Datadog, the metrics in question are "rates" with an interval set to 10. If we set the interval to 1, the numbers look correct; however, this also divides the metrics collected by Datadog's agent by 10, making them wrong. Ideally we should see the same values for both Telegraf and Datadog's agent.
OK, more information. https://docs.datadoghq.com/metrics/dogstatsd_metrics_submission/#count
So what appears to be happening is that Datadog's agent is actually sending the data using the
Sure enough: https://docs.datadoghq.com/api/latest/metrics/#submit-metrics There's a type field, which takes I suspect that if we send counts properly typed, OR if we send the value for interval (hard-coded to 1), or both, then the issue will go away. Working on testing this and will PR if it works.
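For illustration, here is a minimal sketch of what a typed submission could look like. It assumes the v1 series payload shape with `type` and `interval` fields; the helper name `build_series_payload`, the metric name, and the tag are hypothetical, and nothing is sent over the network.

```python
import json
import time


def build_series_payload(metric, value, metric_type="count", interval=1,
                         tags=None, timestamp=None):
    """Build a Datadog-style series payload with explicit type and interval.

    Assumption: field names follow the v1 "submit metrics" payload shape.
    Check the API reference linked above for the version you target.
    """
    ts = int(timestamp if timestamp is not None else time.time())
    series = {
        "metric": metric,
        "points": [[ts, value]],   # one (timestamp, value) pair
        "type": metric_type,       # explicit typing instead of leaving it unset
        "interval": interval,      # seconds the value was aggregated over
        "tags": list(tags or []),
    }
    return {"series": [series]}


# Hypothetical metric name and tag, fixed timestamp for reproducibility.
payload = build_series_payload("example.requests", 3,
                               tags=["env:test"], timestamp=1650000000)
print(json.dumps(payload, indent=2))
```

The point of the sketch is only that the type and interval travel with each series, so the receiving side does not have to fall back on whatever type the metric was first created with.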
OK, we've figured out what we think is going on. The datadog agent/dogstatsd takes any If you've created a metric/custom metric as a Since Telegraf's Datadog output plugin does not type the data, Datadog then interprets that data as being of the previously created type and multiplies everything by 10, assuming that the agent had previously divided everything by 10 on that end. If Telegraf sends the typing and interval information, Datadog interprets the data correctly, as long as there are no other datadog/dogstatsd agents communicating with Datadog at the same time.
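The arithmetic behind that explanation can be sketched as follows. This is a simplified model of the behavior described in this thread, not Datadog's actual implementation; the function names are hypothetical and the interval of 10 is the value observed on the affected metrics.

```python
AGENT_FLUSH_INTERVAL = 10  # seconds; the interval seen on the affected metrics


def agent_normalize(count, interval=AGENT_FLUSH_INTERVAL):
    """Model of the agent side: turn a raw count into a per-second rate."""
    return count / interval


def backend_display(stored_value, interval=AGENT_FLUSH_INTERVAL):
    """Model of the backend side: scale a stored rate back up to a count."""
    return stored_value * interval


raw_count = 3  # counter incremented 3 times

# Path 1: the value goes through the agent, which divides before submitting.
via_agent = backend_display(agent_normalize(raw_count))

# Path 2: an untyped raw value is submitted directly, so nothing was divided
# first, but the backend still scales it up by the interval.
via_untyped = backend_display(raw_count)

print("via agent:  ", via_agent)
print("via untyped:", via_untyped)
```

In this model the second path is off by exactly a factor of the interval, which matches the "3 shows up as 30" symptom reported in the issue.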
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.21.4, Docker image 1.21-alpine, AWS ECS
Docker
Dockerfile:
FROM telegraf:1.21-alpine
COPY dist/telegraf.conf /etc/telegraf/telegraf.conf
RUN apk add --no-cache \
      curl \
      python3 \
      py3-pip \
    && pip3 install --upgrade pip \
    && pip3 install --no-cache-dir \
      awscli \
    && rm -rf /var/cache/apk/*
RUN aws --version
COPY custom_entrypoint.sh /custom_entrypoint.sh
COPY parse_tags.py /parse_tags.py
ENTRYPOINT ["/custom_entrypoint.sh"]
CMD ["telegraf"]
parse_tags.py:
#!/usr/bin/env python
import sys
import json

file = sys.argv[1]
f = open(file)
data = json.load(f)
# Emit one "export EC2_TAG_<KEY>=<VALUE>" line per tag with a non-empty key and value
for i in data['Tags']:
    if i['Key'] and i['Value']:
        output = "export EC2_TAG_" + i['Key'].upper() + '=' + i['Value']
        print(output.replace(':', '_'))
f.close()
custom_entrypoint.sh:
#!/bin/sh
set -e
output_file="/tmp/ec2_tags.json"
instance_id=$(curl http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 describe-tags --filters "Name=resource-id,Values=$instance_id" --region=us-west-2 > $output_file
eval "$(python3 /parse_tags.py $output_file)"
exec /entrypoint.sh "$@"
Steps to reproduce
Expected behavior
If a counter is incremented 3 times and transmitted, Datadog should show a value of "3"
Actual behavior
If a counter is incremented 3 times and transmitted, Datadog instead shows a value of "30"
Additional info
No response