Version 1.13+ Prometheus module does not export all Consul plug-in metrics. #7387

Open
ezombie opened this issue Apr 22, 2020 · 16 comments
Labels
area/consul area/prometheus bug unexpected problem or unintended behavior

Comments


ezombie commented Apr 22, 2020

After upgrading from 1.12.6 to 1.13+ (1.14.1 is affected too), I see this picture:

Screenshot_2020-04-22_21-26-19

Grafana query:
sum by (service_name)(consul_health_checks_passing{service_name!=""})

telegraf --config telegraf.conf --config ./telegraf.d/basic_inputs.conf --config ./telegraf.d/consul.conf --test | grep a-B

> consul_health_checks,24fcfa57-97a5-48b4-870a-f3eac365c00d=24fcfa57-97a5-48b4-870a-f3eac365c00d,check_id=service:24fcfa57-97a5-48b4-870a-f3eac365c00d,host=db,xxx=xxx,node=wrk,service_name=a-B check_name="Service 'a-B' check",critical=0i,passing=1i,service_id="24fcfa57-97a5-48b4-870a-f3eac365c00d",status="passing",warning=0i 1587404134000000000
> consul_health_checks,3ac69a89-d337-41c1-8576-fbfae965ce5d=3ac69a89-d337-41c1-8576-fbfae965ce5d,check_id=service:3ac69a89-d337-41c1-8576-fbfae965ce5d,host=db,xxx=xxx,node=wrk,service_name=a-B check_name="Service 'a-B' check",critical=0i,passing=1i,service_id="3ac69a89-d337-41c1-8576-fbfae965ce5d",status="passing",warning=0i 1587404134000000000
> consul_health_checks,5aa02f28-abc4-4e4f-b9cd-5671407b0fb4=5aa02f28-abc4-4e4f-b9cd-5671407b0fb4,check_id=service:5aa02f28-abc4-4e4f-b9cd-5671407b0fb4,host=db,xxx=xxx,node=wrk,service_name=a-B check_name="Service 'a-B' check",critical=0i,passing=1i,service_id="5aa02f28-abc4-4e4f-b9cd-5671407b0fb4",status="passing",warning=0i 1587404134000000000
> consul_health_checks,92ba66ab-686a-433b-9e68-dcd9bd4beec9=92ba66ab-686a-433b-9e68-dcd9bd4beec9,check_id=service:92ba66ab-686a-433b-9e68-dcd9bd4beec9,host=db,xxx=xxx,node=wrk,service_name=a-B check_name="Service 'a-B' check",critical=0i,passing=1i,service_id="92ba66ab-686a-433b-9e68-dcd9bd4beec9",status="passing",warning=0i 1587404134000000000
> consul_health_checks,b2181ba8-6dd2-4b10-a3a2-41cd72f42379=b2181ba8-6dd2-4b10-a3a2-41cd72f42379,check_id=service:b2181ba8-6dd2-4b10-a3a2-41cd72f42379,host=db,xxx=xxx,node=wrk,service_name=a-B check_name="Service 'a-B' check",critical=0i,passing=1i,service_id="b2181ba8-6dd2-4b10-a3a2-41cd72f42379",status="passing",warning=0i 1587404134000000000
> consul_health_checks,be6e626f-bbd2-475b-b3fe-ad1550590eba=be6e626f-bbd2-475b-b3fe-ad1550590eba,check_id=service:be6e626f-bbd2-475b-b3fe-ad1550590eba,host=db,xxx=xxx,node=wrk,service_name=a-B check_name="Service 'a-B' check",critical=0i,passing=1i,service_id="be6e626f-bbd2-475b-b3fe-ad1550590eba",status="passing",warning=0i 1587404134000000000
> consul_health_checks,check_id=service:deeac742-113d-41f8-b899-97608b9550a4,deeac742-113d-41f8-b899-97608b9550a4=deeac742-113d-41f8-b899-97608b9550a4,host=db,xxx=xxx,node=wrk,service_name=a-B check_name="Service 'a-B' check",critical=0i,passing=1i,service_id="deeac742-113d-41f8-b899-97608b9550a4",status="passing",warning=0i 1587404134000000000
> consul_health_checks,check_id=service:ff26db26-a4ae-49d2-90cb-ce961aa3adc0,ff26db26-a4ae-49d2-90cb-ce961aa3adc0=ff26db26-a4ae-49d2-90cb-ce961aa3adc0,host=db,xxx=xxx,node=wrk,service_name=a-B check_name="Service 'a-B' check",critical=0i,passing=1i,service_id="ff26db26-a4ae-49d2-90cb-ce961aa3adc0",status="passing",warning=0i 1587404134000000000

curl http://127.0.0.1:9273/metrics | grep -i a-B

consul_health_checks_critical{ac69a89_d337_41c1_8576_fbfae965ce5d="3ac69a89-d337-41c1-8576-fbfae965ce5d",check_id="service:3ac69a89-d337-41c1-8576-fbfae965ce5d",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0
consul_health_checks_critical{aa02f28_abc4_4e4f_b9cd_5671407b0fb4="5aa02f28-abc4-4e4f-b9cd-5671407b0fb4",check_id="service:5aa02f28-abc4-4e4f-b9cd-5671407b0fb4",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0
consul_health_checks_critical{b2181ba8_6dd2_4b10_a3a2_41cd72f42379="b2181ba8-6dd2-4b10-a3a2-41cd72f42379",check_id="service:b2181ba8-6dd2-4b10-a3a2-41cd72f42379",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0
consul_health_checks_critical{be6e626f_bbd2_475b_b3fe_ad1550590eba="be6e626f-bbd2-475b-b3fe-ad1550590eba",check_id="service:be6e626f-bbd2-475b-b3fe-ad1550590eba",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0
consul_health_checks_critical{check_id="service:deeac742-113d-41f8-b899-97608b9550a4",dc="DC",deeac742_113d_41f8_b899_97608b9550a4="deeac742-113d-41f8-b899-97608b9550a4",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0
consul_health_checks_critical{check_id="service:ff26db26-a4ae-49d2-90cb-ce961aa3adc0",dc="DC",env="prod",ff26db26_a4ae_49d2_90cb_ce961aa3adc0="ff26db26-a4ae-49d2-90cb-ce961aa3adc0",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0

consul_health_checks_passing{ac69a89_d337_41c1_8576_fbfae965ce5d="3ac69a89-d337-41c1-8576-fbfae965ce5d",check_id="service:3ac69a89-d337-41c1-8576-fbfae965ce5d",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 1
consul_health_checks_passing{aa02f28_abc4_4e4f_b9cd_5671407b0fb4="5aa02f28-abc4-4e4f-b9cd-5671407b0fb4",check_id="service:5aa02f28-abc4-4e4f-b9cd-5671407b0fb4",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 1
consul_health_checks_passing{b2181ba8_6dd2_4b10_a3a2_41cd72f42379="b2181ba8-6dd2-4b10-a3a2-41cd72f42379",check_id="service:b2181ba8-6dd2-4b10-a3a2-41cd72f42379",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 1
consul_health_checks_passing{be6e626f_bbd2_475b_b3fe_ad1550590eba="be6e626f-bbd2-475b-b3fe-ad1550590eba",check_id="service:be6e626f-bbd2-475b-b3fe-ad1550590eba",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 1
consul_health_checks_passing{check_id="service:deeac742-113d-41f8-b899-97608b9550a4",dc="DC",deeac742_113d_41f8_b899_97608b9550a4="deeac742-113d-41f8-b899-97608b9550a4",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 1
consul_health_checks_passing{check_id="service:ff26db26-a4ae-49d2-90cb-ce961aa3adc0",dc="DC",env="prod",ff26db26_a4ae_49d2_90cb_ce961aa3adc0="ff26db26-a4ae-49d2-90cb-ce961aa3adc0",host="db",xxx="xxx",node="wrk",service_name="a-B"} 1

consul_health_checks_warning{ac69a89_d337_41c1_8576_fbfae965ce5d="3ac69a89-d337-41c1-8576-fbfae965ce5d",check_id="service:3ac69a89-d337-41c1-8576-fbfae965ce5d",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0
consul_health_checks_warning{aa02f28_abc4_4e4f_b9cd_5671407b0fb4="5aa02f28-abc4-4e4f-b9cd-5671407b0fb4",check_id="service:5aa02f28-abc4-4e4f-b9cd-5671407b0fb4",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0
consul_health_checks_warning{b2181ba8_6dd2_4b10_a3a2_41cd72f42379="b2181ba8-6dd2-4b10-a3a2-41cd72f42379",check_id="service:b2181ba8-6dd2-4b10-a3a2-41cd72f42379",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0
consul_health_checks_warning{be6e626f_bbd2_475b_b3fe_ad1550590eba="be6e626f-bbd2-475b-b3fe-ad1550590eba",check_id="service:be6e626f-bbd2-475b-b3fe-ad1550590eba",dc="DC",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0
consul_health_checks_warning{check_id="service:deeac742-113d-41f8-b899-97608b9550a4",dc="DC",deeac742_113d_41f8_b899_97608b9550a4="deeac742-113d-41f8-b899-97608b9550a4",env="prod",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0
consul_health_checks_warning{check_id="service:ff26db26-a4ae-49d2-90cb-ce961aa3adc0",dc="DC",env="prod",ff26db26_a4ae_49d2_90cb_ce961aa3adc0="ff26db26-a4ae-49d2-90cb-ce961aa3adc0",host="db",xxx="xxx",node="wrk",service_name="a-B"} 0

System info:

Telegraf 1.12.6 and 1.13.0
Consul v1.6.2
Prometheus 2.15.2
Centos 7.8

Steps to reproduce:

upgrade 1.12.6 to 1.13+

Expected behavior:

Actual behavior:

Additional info:

@danielnelson
Contributor

Can you add your prometheus_client output plugin configuration?

@danielnelson danielnelson added area/prometheus bug unexpected problem or unintended behavior labels Apr 22, 2020
@ezombie
Author

ezombie commented Apr 23, 2020

cat /etc/telegraf/telegraf.d/prometheus.conf 
# Configuration for the Prometheus client to spawn
[[outputs.prometheus_client]]
  ## Address to listen on
  listen = "0.0.0.0:9273"
  expiration_interval = "10s"
  string_as_label = false
#  metric_version = 2

@danielnelson danielnelson added this to the 1.14.2 milestone Apr 23, 2020
@danielnelson danielnelson self-assigned this Apr 23, 2020
@danielnelson
Contributor

Looking into this a bit closer, the issue appears to be that label names starting with a digit (0-9) are illegal in the Prometheus format and are rejected by the official library. In Telegraf 1.13 we updated the library, and it has become stricter about rejecting them.

If you switch to metric_version = 2, it should output the metrics that don't have any labels starting with a number, but it will still drop those that do.

I think the best way forward is to adjust the consul input to avoid these types of tags. What if you disable the tag_delimiter option in the consul input?
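To illustrate the rule described above, here is a minimal Python sketch (not Telegraf's actual code) that checks tag keys against the label-name pattern from the Prometheus data model, `[a-zA-Z_][a-zA-Z0-9_]*`. The helper name `is_valid_label_name` is hypothetical; the sample keys are copied from the `--test` output earlier in this issue.

```python
import re

# Label-name pattern from the Prometheus data model documentation.
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_label_name(name: str) -> bool:
    """Return True if `name` is a legal Prometheus label name (hypothetical helper)."""
    return LABEL_NAME_RE.fullmatch(name) is not None


# Tag keys copied from the telegraf --test output in this issue.
tag_keys = [
    "check_id",
    "service_name",
    "24fcfa57-97a5-48b4-870a-f3eac365c00d",  # starts with a digit, contains hyphens
    "be6e626f-bbd2-475b-b3fe-ad1550590eba",  # starts with a letter, contains hyphens
]

for key in tag_keys:
    status = "valid" if is_valid_label_name(key) else "invalid"
    print(f"{key}: {status}")
```

Both UUID keys fail the raw check because of the hyphens. A key that starts with a letter can become valid once the hyphens are replaced, while a key that starts with a digit cannot.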

@danielnelson danielnelson removed this from the 1.14.2 milestone Apr 23, 2020
@ezombie
Author

ezombie commented Apr 30, 2020

On version 1.14.2 with the configuration file below (consul.conf), the problem persists.

A possible solution would be to add an option to the consul input that renames the metrics according to a template that the Prometheus library accepts.

[[inputs.consul]]
    interval = "10s"
    datacentre = "dc"
    address = "consul:8500"

@danielnelson
Contributor

danielnelson commented Apr 30, 2020

Having UUIDs as the tagkey is not an ideal setup for any output, so I think we can come up with a better strategy for creating metrics. Can you show the output of telegraf --input-filter consul --test | grep a-B using the configuration without tag_delimiter?

@ezombie
Author

ezombie commented Apr 30, 2020

2020-04-30T18:18:01Z I! Starting Telegraf 1.14.2
> consul_health_checks,2947540c-a0bb-4549-b76d-0b6188036b8a=2947540c-a0bb-4549-b76d-0b6188036b8a,check_id=service:2947540c-a0bb-4549-b76d-0b6188036b8a,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="2947540c-a0bb-4549-b76d-0b6188036b8a",status="passing",warning=0i 1588270681000000000
> consul_health_checks,89427f45-2034-4c08-a12a-bb17baf0fb8d=89427f45-2034-4c08-a12a-bb17baf0fb8d,check_id=service:89427f45-2034-4c08-a12a-bb17baf0fb8d,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="89427f45-2034-4c08-a12a-bb17baf0fb8d",status="passing",warning=0i 1588270681000000000
> consul_health_checks,9ea180bd-9bc1-4739-b3b0-7c9d479124b6=9ea180bd-9bc1-4739-b3b0-7c9d479124b6,check_id=service:9ea180bd-9bc1-4739-b3b0-7c9d479124b6,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="9ea180bd-9bc1-4739-b3b0-7c9d479124b6",status="passing",warning=0i 1588270681000000000
> consul_health_checks,af96bbf7-eb1f-4282-8acb-dd3890e40d20=af96bbf7-eb1f-4282-8acb-dd3890e40d20,check_id=service:af96bbf7-eb1f-4282-8acb-dd3890e40d20,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="af96bbf7-eb1f-4282-8acb-dd3890e40d20",status="passing",warning=0i 1588270681000000000
> consul_health_checks,bf0d639f-b667-43ba-8d55-11c444229b80=bf0d639f-b667-43ba-8d55-11c444229b80,check_id=service:bf0d639f-b667-43ba-8d55-11c444229b80,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="bf0d639f-b667-43ba-8d55-11c444229b80",status="passing",warning=0i 1588270681000000000
> consul_health_checks,check_id=service:d1359ddb-462d-4efe-969f-4bd0032b0d31,d1359ddb-462d-4efe-969f-4bd0032b0d31=d1359ddb-462d-4efe-969f-4bd0032b0d31,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="d1359ddb-462d-4efe-969f-4bd0032b0d31",status="passing",warning=0i 1588270681000000000
> consul_health_checks,check_id=service:e630c30b-2b28-49e5-895c-dcc1d3ac971e,e630c30b-2b28-49e5-895c-dcc1d3ac971e=e630c30b-2b28-49e5-895c-dcc1d3ac971e,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="e630c30b-2b28-49e5-895c-dcc1d3ac971e",status="passing",warning=0i 1588270681000000000
> consul_health_checks,check_id=service:f32802d2-9c2b-4b7e-b3c3-067f3efc4dc6,f32802d2-9c2b-4b7e-b3c3-067f3efc4dc6=f32802d2-9c2b-4b7e-b3c3-067f3efc4dc6,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="f32802d2-9c2b-4b7e-b3c3-067f3efc4dc6",status="passing",warning=0i 1588270681000000000
> consul_health_checks,check_id=service:f7e94e9c-2054-45c4-a362-46b78fafedd1,f7e94e9c-2054-45c4-a362-46b78fafedd1=f7e94e9c-2054-45c4-a362-46b78fafedd1,host=XXX,n=n,node=XXX,service_name=YYY check_name="Service 'YYY' check",critical=0i,passing=1i,service_id="f7e94e9c-2054-45c4-a362-46b78fafedd1",status="passing",warning=0i 1588270681000000000

@danielnelson
Contributor

Can you run this query against the Consul HTTP API to get the raw JSON for one of the check_id values that produces a UUID tag key?

curl -G http://consul:8500/v1/health/state/any --data-urlencode 'filter=CheckID == "service:2947540c-a0bb-4549-b76d-0b6188036b8a"'

@danielnelson
Contributor

Quick follow-up, what I'm expecting to see is that you have ServiceTags like:

"ServiceTags": [
    "2947540c-a0bb-4549-b76d-0b6188036b8a"
],

I'm far from a Consul expert, so to me tags like this seem a bit odd. Can you tell me a bit about how you use this type of tag?

@ezombie
Author

ezombie commented May 1, 2020

curl -G http://consul:8500/v1/health/state/any --data-urlencode 'filter=CheckID == "service:2947540c-a0bb-4549-b76d-0b6188036b8a"'
[{"Node":"XXX","CheckID":"service:2947540c-a0bb-4549-b76d-0b6188036b8a","Name":"Service 'YYY' check","Status":"passing","Notes":"","Output":"HTTP GET http://127.0.0.1:41615/health/?service=2947540c-a0bb-4549-b76d-0b6188036b8a: 200 OK Output: ","ServiceID":"2947540c-a0bb-4549-b76d-0b6188036b8a","ServiceName":"YYY","ServiceTags":["n","2947540c-a0bb-4549-b76d-0b6188036b8a"],"Type":"http","Definition":{},"CreateIndex":452885815,"ModifyIndex":452885837}]
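As a rough way to inspect a response like the one above, the following Python sketch lists the ServiceTags that begin with a digit for each check. The function name `tags_starting_with_digit` is hypothetical, and the JSON is a trimmed copy of the posted response that keeps only the fields used here.

```python
import json

# Trimmed copy of the response posted above; only the fields used here are kept.
RESPONSE = """
[{"Node": "XXX",
  "CheckID": "service:2947540c-a0bb-4549-b76d-0b6188036b8a",
  "ServiceName": "YYY",
  "ServiceTags": ["n", "2947540c-a0bb-4549-b76d-0b6188036b8a"]}]
"""


def tags_starting_with_digit(checks):
    """Map each CheckID to the ServiceTags whose first character is a digit."""
    return {
        check["CheckID"]: [t for t in check.get("ServiceTags", []) if t[:1].isdigit()]
        for check in checks
    }


checks = json.loads(RESPONSE)
for check_id, tags in tags_starting_with_digit(checks).items():
    print(check_id, "->", tags)
```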

@danielnelson
Contributor

I think what will be best in your case is to exclude these tags. The information is contained in the check_id tag so adding the UUID is superfluous:

[[inputs.consul]]
  tagexclude = ["[0-9]*"]

As a more general fix, perhaps we should add a new option that matches only ServiceTags, similar to how the docker plugin is structured:

[[inputs.consul]]
  service_tag_include = []
  service_tag_exclude = ["[0-9]*"]
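As a sketch of how such include/exclude options might behave, the Python example below applies glob patterns to the ServiceTags posted earlier in this issue. The options `service_tag_include` and `service_tag_exclude` are only proposed here, so this is not Telegraf's implementation. The function name `filter_service_tags` is hypothetical, and Python's `fnmatch` globbing is assumed to be close enough to Telegraf's glob matching for these simple patterns.

```python
from fnmatch import fnmatchcase


def filter_service_tags(tags, include=(), exclude=()):
    """Keep tags matching `include` (all tags if empty), then drop any matching `exclude`."""
    kept = []
    for tag in tags:
        if include and not any(fnmatchcase(tag, pattern) for pattern in include):
            continue
        if any(fnmatchcase(tag, pattern) for pattern in exclude):
            continue
        kept.append(tag)
    return kept


# ServiceTags taken from the Consul API response posted in this issue.
service_tags = ["n", "2947540c-a0bb-4549-b76d-0b6188036b8a"]

print(filter_service_tags(service_tags, exclude=["[0-9]*"]))
print(filter_service_tags(service_tags, exclude=["*"]))
```

Note that a pattern such as `[0-9]*` only removes tags that start with a digit; a UUID that starts with a letter (a-f) would still pass through.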

@ekbfh

ekbfh commented May 12, 2020

Hello!
I also have this problem.
My setup: I have Consul, and I put a unique UUID in the tags/meta for each service.
Consul allows this, with limits: Key can contain only ASCII chars and no special characters (A-Z a-z 0-9 _ and -). https://www.consul.io/docs/agent/services.html

But Prometheus does not accept label names that start with a digit: Label names may contain ASCII letters, numbers, as well as underscores. They must match the regex [a-zA-Z_][a-zA-Z0-9_]* https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

Maybe there should be an option that controls whether tags are exported as labels at all? Or a regex for including only some of these tags rather than all of them.

At the moment, I see that a valid Consul meta configuration can make whole metrics disappear (!!), not just labels.

Same theme was mentioned in several topics:
my issue with tags: #5522
PR where tags as labels was introduced: #4155

@danielnelson
Contributor

@ekbfh What do you think about adding the service_tag_include and service_tag_exclude options above?

@ekbfh

ekbfh commented May 13, 2020

@danielnelson It might work if you plan to enable them by default. As I said, I may have this naming in Consul but cannot have it in Prometheus.

Could you also add an option to choose which tags I want to gather?
For example: gather_all_tags = true/false, because without it I end up with higher cardinality.

@danielnelson
Contributor

You would be able to exclude all service tags with service_tag_exclude = ["*"].

We should also make sure that the prometheus output only removes the tags it cannot encode as labels, rather than dropping the whole metric.
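A minimal Python sketch of that idea, under stated assumptions: invalid characters in a tag key are replaced with `_`, and a tag is dropped only if its key still fails the Prometheus label-name pattern, so the metric itself is always kept. The function name `encodable_labels` and the sanitization step are illustrative assumptions, not a description of what the prometheus output does.

```python
import re

# Label-name pattern from the Prometheus data model documentation.
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
# Assumption for this sketch: any other character is replaced with "_".
INVALID_CHARS_RE = re.compile(r"[^a-zA-Z0-9_]")


def encodable_labels(tags):
    """Return the tags that can be encoded as labels; unencodable tags are skipped."""
    labels = {}
    for key, value in tags.items():
        name = INVALID_CHARS_RE.sub("_", key)
        if LABEL_NAME_RE.fullmatch(name):
            labels[name] = value
    return labels


# Tags modelled on one of the consul_health_checks lines in this issue.
tags = {
    "check_id": "service:2947540c-a0bb-4549-b76d-0b6188036b8a",
    "2947540c-a0bb-4549-b76d-0b6188036b8a": "2947540c-a0bb-4549-b76d-0b6188036b8a",
    "service_name": "YYY",
}

print(sorted(encodable_labels(tags)))
```

With this approach the UUID tag is lost, but `check_id` still carries the same identifier, so the series remains distinguishable.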

@ekbfh

ekbfh commented May 13, 2020

Yes, just removing tags is a good idea

@ekbfh

ekbfh commented Jun 4, 2020

Any update?
