
Ensure prometheus metrics have same set of labels #2857

Merged
danielnelson merged 3 commits into master from fix-prometheus-output-labels on Jun 14, 2017

Conversation

danielnelson
Contributor

@danielnelson commented May 26, 2017

It is not allowed for labels to differ within a metric, this patch
ensures that they are consistent within a single Write operation.
However, it does not address cases where the label varies across calls
to Write.

#2822
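
A minimal sketch of the idea (illustrative Go, not the code in this patch): within one Write batch, take the union of the label names seen for a metric family and pad each sample with empty values for any label it lacks, so every sample ends up with the same label dimensions.

package main

import (
	"fmt"
	"sort"
)

// sample stands in for a single metric within one Write batch.
type sample struct {
	labels map[string]string
	value  float64
}

// normalizeLabels computes the union of label names across the batch and
// pads each sample with "" for any label it does not carry, so all
// samples share the same label dimensions.
func normalizeLabels(samples []sample) []string {
	set := map[string]bool{}
	for _, s := range samples {
		for name := range s.labels {
			set[name] = true
		}
	}
	names := make([]string, 0, len(set))
	for name := range set {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, s := range samples {
		for _, name := range names {
			if _, ok := s.labels[name]; !ok {
				s.labels[name] = ""
			}
		}
	}
	return names
}

func main() {
	batch := []sample{
		{labels: map[string]string{"host": "loaner"}, value: 45},
		{labels: map[string]string{"host": "loaner", "foo": "bar"}, value: 42},
	}
	fmt.Println(normalizeLabels(batch)) // [foo host]
	fmt.Println(batch[0].labels)        // map[foo: host:loaner]
}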

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@danielnelson
Contributor Author

Not sure if this is a reasonable fix, as it could still be problematic in some cases. It would be great if some of the prometheus_client users could weigh in. @vlamug any advice?

If mixed tags occur within a single call to Write, it works:

// Write()
example value=45 1495760610353595441
example,foo=bar value=42 1495760617538126005
// example{host="loaner"} 42

If they span separate calls, you will still get the original error:

// Write()
example value=42 1495760610353595441
// Write()
example,foo=bar value=42 1495760617538126005
// collected metric example label:<name:"host" value:"loaner" > untyped:<value:43 >  has label dimensions inconsistent with previously collected metrics in the same metric family

@danielnelson
Contributor Author

@mmariani Can you test this as well and let me know if it fixes #2868? Here is an amd64 Linux build.

@mmariani

mmariani commented Jun 1, 2017

Thanks. I tried it but hit the same issue, even after cleaning up the Prometheus data between runs.

I am not sure what you mean by "not allowed for labels to differ": must all the containers have the same set of labels? That might be the issue in my environment, then.

$ docker exec -ti 62f360fd1abf telegraf --version
Telegraf v26055d5 (git: fix-prometheus-output-labels 26055d5)
$ docker exec -ti 62f360fd1abf curl http://localhost:9126/metrics | head -5
An error has occurred during metrics collection:

774 error(s) occurred:
* collected metric docker_container_mem_cache label:<name:"build_date" value:"" > label:<name:"com_docker_stack_namespace" value:"wolphin" > label:<name:"com_docker_swarm_node_id" value:"qo1fnic35zxwgi3z2jucwvkag" > label:<name:"com_docker_swarm_service_id" value:"ko9vyw9vkklrr9rhu4cb7yseo" > label:<name:"com_docker_swarm_service_name" value:"wolphin_telegraf" > label:<name:"com_docker_swarm_task_id" value:"cznmv2rhfs19x1fggm1rao0g1" > label:<name:"com_docker_swarm_task_name" value:"wolphin_telegraf.qo1fnic35zxwgi3z2jucwvkag.cznmv2rhfs19x1fggm1rao0g1" > label:<name:"container_image" value:"mytelegraf" > label:<name:"container_name" value:"wolphin_telegraf.qo1fnic35zxwgi3z2jucwvkag.cznmv2rhfs19x1fggm1rao0g1" > label:<name:"container_version" value:"latest" > label:<name:"engine_host" value:"m1" > label:<name:"host" value:"62f360fd1abf" > label:<name:"license" value:"" > label:<name:"maintainer" value:"" > label:<name:"name" value:"" > label:<name:"vendor" value:"" > untyped:<value:45056 >  has label dimensions inconsistent with previously collected metrics in the same metric family
[...]

@danielnelson
Contributor Author

@mmariani The new version of the prometheus library we are using enforces that all metrics in a family have the same label names. We expire old metrics after the expiration_interval, and they must not change during that period (60s default). With our docker input, each container could have a different set of labels, which I assume is what is causing your problem.
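
For reference, the settings involved live on the prometheus_client output; a sketch using the port from this thread, with expiration_interval shown at its default:

[[outputs.prometheus_client]]
  listen = ":9126"
  # Metrics not updated within this window are dropped from the registry.
  expiration_interval = "60s"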

You might be able to work around this by ensuring that the labels never change over time; perhaps you can whitelist a smaller set of labels using taginclude/tagexclude. Just to test, you could do:

[[inputs.docker]]
  # ... snip ...
  taginclude = ["host"]

@e271828-

@danielnelson Confirmed, this works for me. Please merge.

@danielnelson
Contributor Author

I've had some reports of this not being sufficient, so I'm working on a better fix.

@danielnelson force-pushed the fix-prometheus-output-labels branch from 26055d5 to a2c7cdd on June 13, 2017
@danielnelson
Contributor Author

@e271828- @mmariani @sbadia @freesearcher

I rewrote the fix to handle additional edge cases. If you have time, it would be great if everyone could test again; I'm hoping to get this into a 1.3.2 release this week. Here is the amd64 Linux build.

@mmariani

My use case is metric collection across a swarm cluster. Some containers belong to services (and have labels like service_name), some don't. For now I could only make it work with docker_label_exclude = ["*"], which I suppose has the same effect as setting taginclude.
Isn't there a way to set default empty values for a given list of tags?
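
For clarity, this is the docker input config that worked for me (a sketch, mirroring the taginclude example above):

[[inputs.docker]]
  # ... snip ...
  # Drop all Docker labels so every container exposes the same tag set.
  docker_label_exclude = ["*"]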

I'm going to test this version now, thanks.

@mmariani

The latest version seems to work as I expected. Thanks a lot!

@mmariani

Although the data is now exposed correctly on :9126, the label names contain dots and dashes (e.g. com.docker.swarm.node.id, com.docker.swarm.task.name, build-date), which Prometheus rejects. Is this a regression with respect to https://github.com/influxdata/telegraf/pull/909/files, or am I missing something?

Thanks

@danielnelson
Contributor Author

Thanks for catching that! I updated the branch and here is the build.
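
The gist of the change is sanitizing label names before registering them: Prometheus only allows label names matching [a-zA-Z_][a-zA-Z0-9_]*. A rough sketch of that kind of sanitization (illustrative, not necessarily the exact committed code):

package main

import (
	"fmt"
	"regexp"
)

// Replace every character Prometheus does not allow in label names
// with an underscore.
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

func sanitizeLabelName(name string) string {
	return invalidLabelChars.ReplaceAllString(name, "_")
}

func main() {
	fmt.Println(sanitizeLabelName("com.docker.swarm.node.id")) // com_docker_swarm_node_id
	fmt.Println(sanitizeLabelName("build-date"))               // build_date
}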

@danielnelson merged commit 949072e into master on Jun 14, 2017
@danielnelson deleted the fix-prometheus-output-labels branch on June 14, 2017
danielnelson added a commit that referenced this pull request Jun 14, 2017
@mmariani

That's perfect, thanks!

@sbadia

sbadia commented Jun 15, 2017

@danielnelson thanks!

jeichorn pushed a commit to jeichorn/telegraf that referenced this pull request Jul 24, 2017
nevins-b pushed a commit to nevins-b/telegraf that referenced this pull request Aug 23, 2017