Add prometheus serializer #5427

Closed
wants to merge 7 commits into from


@aixeshunter aixeshunter commented Feb 14, 2019

related #4414

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

This PR addresses issue #5350 (Add prometheus serializer) by adding a Prometheus serializer. With it, the prometheus data format can be used by the file output:
[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout", "/tmp/metrics.out"]

  ## Data format to output.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  data_format = "prometheus"
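
For illustration, here is a minimal Python sketch of the kind of text this data format produces, assuming each numeric field of a Telegraf metric becomes one `measurement_field` sample with the tags as labels. The function name `to_prometheus_text` and its arguments are hypothetical, and this is not the serializer code from this PR; the `# HELP` text and the `gauge` type simply mirror the sample output posted further down in this thread.

```python
def escape_label_value(value):
    # The Prometheus text format escapes backslash, double quote and newline
    # inside label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus_text(measurement, tags, fields):
    """Render one metric as Prometheus text lines (hypothetical helper)."""
    lines = []
    labels = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(tags.items())
    )
    for field, value in sorted(fields.items()):
        # Skip non-numeric fields; bool is excluded because it subclasses int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        name = f"{measurement}_{field}"
        lines.append(f"# HELP {name} Telegraf collected metric")
        lines.append(f"# TYPE {name} gauge")
        sample = f"{name}{{{labels}}}" if labels else name
        lines.append(f"{sample} {value}")
    return "\n".join(lines) + "\n" if lines else ""


print(
    to_prometheus_text(
        "cpu", {"cpu": "cpu3", "host": "example"}, {"usage_idle": 92.6}
    ),
    end="",
)
```

Sorting the tags and fields is a choice made here only to keep the output deterministic.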

@danielnelson
Contributor

@aixeshunter Is it possible to have the prometheus output use this code as well?

@tomwilkie I know you are working around the code in the output, can you give an update on that? Hopefully we can avoid too many conflicts.

@aixeshunter
Author

aixeshunter commented Feb 15, 2019

@danielnelson Hi, I think the prometheus output plugin is designed only for the HTTP method, but I need the Prometheus text data format in the file output, just like the influxdb output and the influx data format in the file output.

@danielnelson
Contributor

In the influxdb output we use the influx serializer. Wouldn't something similar be possible with the prometheus_client output?

@pytimer
Contributor

pytimer commented Feb 16, 2019

@danielnelson I think this feature is meant to add a new data format, one that can be used by all output plugins.

@aixeshunter
Author

aixeshunter commented Feb 18, 2019

I think this feature is meant to add a new data format, one that can be used by all output plugins.

That is pretty much what I mean :)

@brightzheng100

This is a really useful feature that I'm looking forward to.
Can I assume this will work with Pushgateway, or with other use cases when integrating with Prometheus?

It's been half a year since this PR was submitted. Is there any timeline to merge and release it?

@aixeshunter
Author

This is a really useful feature that I'm looking forward to.
Can I assume this will work with Pushgateway, or with other use cases when integrating with Prometheus?

It's been half a year since this PR was submitted. Is there any timeline to merge and release it?

@brightzheng100 Hi, I wrote a new agent to receive data from telegraf and push it to prometheus-pushgateway.

@brightzheng100

Interesting. Is it open sourced? Where can I find it? @aixeshunter

@brightzheng100

Anyway, while trying @aixeshunter's PR, it works perfectly fine for [[outputs.file]].

However, while trying [[outputs.http]], I kept getting errors: E! [agent] Error writing to output [http]: when writing to [http://10.197.81.43:9091/metrics/job/some_job/instance/cluster-01] received status code: 400

The same data, when exported to the file system via [[outputs.file]], was successfully sent to Pushgateway with curl:

$ cat test.data
# HELP cpu_usage_idle Telegraf collected metric
# TYPE cpu_usage_idle gauge
cpu_usage_idle{host="Brights-MacBook-Pro.local",cpu="cpu-total"} 76.55586103474131
cpu_usage_idle{cpu="cpu3",host="Brights-MacBook-Pro.local"} 92.6
cpu_usage_idle{cpu="cpu1",host="Brights-MacBook-Pro.local"} 91.4
cpu_usage_idle{cpu="cpu4",host="Brights-MacBook-Pro.local"} 61.83816183816184
cpu_usage_idle{cpu="cpu2",host="Brights-MacBook-Pro.local"} 59.74025974025974
cpu_usage_idle{cpu="cpu0",host="Brights-MacBook-Pro.local"} 53.7
cpu_usage_idle{cpu="cpu7",host="Brights-MacBook-Pro.local"} 93.7
cpu_usage_idle{host="Brights-MacBook-Pro.local",cpu="cpu6"} 66.53346653346654

$ curl -v -X POST http://10.197.81.43:9091/metrics/job/some_job/instance/cluster-01 --data-binary "@test.data"
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 10.197.81.43...
* TCP_NODELAY set
* Connected to 10.197.81.43 (10.197.81.43) port 9091 (#0)
> POST /metrics/job/some_job/cluster/cluster-01 HTTP/1.1
> Host: 10.197.81.43:9091
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Length: 653
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 653 out of 653 bytes
< HTTP/1.1 202 Accepted
< Date: Fri, 26 Jul 2019 07:53:07 GMT
< Content-Length: 0
<
* Connection #0 to host 10.197.81.43 left intact
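
The same push can be sketched in Python using only the standard library. The helper name `build_push_request` is hypothetical; the URL and the `application/x-www-form-urlencoded` content type are copied from the curl transcript above, and nothing is sent unless the last lines are uncommented.

```python
import urllib.request


def build_push_request(url, body):
    """Build a POST request carrying Prometheus text data (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


sample = b'cpu_usage_idle{cpu="cpu3",host="Brights-MacBook-Pro.local"} 92.6\n'
req = build_push_request(
    "http://10.197.81.43:9091/metrics/job/some_job/instance/cluster-01", sample
)
print(req.get_method(), req.full_url, len(req.data))

# Uncomment to actually send; urlopen raises urllib.error.HTTPError on a
# 4xx/5xx reply such as the 400 reported above.
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(resp.status)
```

Building the request separately from sending it makes it easy to print and compare the headers with what Telegraf sends.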

And this is the http output config:

[[outputs.http]]
  ## URL is the address to send metrics to
  url = "http://10.197.81.43:9091/metrics/job/some_job/instance/cluster-01"

  ## Timeout for HTTP message
  # timeout = "5s"

  ## HTTP method, one of: "POST" or "PUT"
  method = "POST"

  ## HTTP Basic Auth credentials
  # username = "username"
  # password = "pa$$word"

  ## OAuth2 Client Credentials Grant
  # client_id = "clientid"
  # client_secret = "secret"
  # token_url = "https://indentityprovider/oauth2/v1/token"
  # scopes = ["urn:opc:idm:__myscopes__"]

  ## Optional TLS Config
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  # insecure_skip_verify = false

  ## Data format to output.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
  # data_format = "influx"
  data_format = "prometheus"

  ## Additional HTTP headers
  [outputs.http.headers]
  #   # Should be set manually to "application/json" for json data_format
    Content-Type = "application/x-www-form-urlencoded"

  ## HTTP Content-Encoding for write request body, can be set to "gzip" to
  ## compress body or "identity" to apply no encoding.
  # content_encoding = "identity"

Anything wrong?

@glinton
Contributor

glinton commented Jul 26, 2019

@brightzheng100 Have you checked the server's logs for any reason why it might be a 400? Perhaps it doesn't like a header telegraf adds or something?

@brightzheng100

I did. Unfortunately, there was nothing in the logs related to this issue.

Since you mentioned it might be caused by a header Telegraf adds, I also ran a tcpdump, shown below.
But frankly, I didn't find anything special.

What do you think, @glinton @aixeshunter ?

$ sudo tcpdump -nnSX port 9091
...
12:19:50.348757 IP 10.255.210.21.50264 > 10.197.81.43.9091: Flags [.], seq 2053461952:2053463300, ack 2034768895, win 2064, options [nop,nop,TS val 36353426 ecr 1731987651], length 1348
	0x0000:  0250 4100 0101 0250 4100 0101 0800 4500  .PA....PA.....E.
	0x0010:  0578 0000 4000 4006 fc7b 0aff d215 0ac5  .x..@.@..{......
	0x0020:  512b c458 2383 7a65 57c0 7948 1bff 8010  Q+.X#.zeW.yH....
	0x0030:  0810 2c4a 0000 0101 080a 022a b592 673c  ..,J.......*..g<
	0x0040:  08c3 504f 5354 202f 6d65 7472 6963 732f  ..POST./metrics/
	0x0050:  6a6f 622f 736f 6d65 5f6a 6f62 2f69 6e73  job/some_job/ins
	0x0060:  7461 6e63 652f 636c 7573 7465 722d 3031  tance/cluster-01
	0x0070:  2048 5454 502f 312e 310d 0a48 6f73 743a  .HTTP/1.1..Host:
	0x0080:  2031 302e 3139 372e 3831 2e34 333a 3930  .10.197.81.43:90
	0x0090:  3931 0d0a 5573 6572 2d41 6765 6e74 3a20  91..User-Agent:.
	0x00a0:  5465 6c65 6772 6166 2f75 6e6b 6e6f 776e  Telegraf/unknown
	0x00b0:  0d0a 436f 6e74 656e 742d 4c65 6e67 7468  ..Content-Length
	0x00c0:  3a20 3338 3230 350d 0a43 6f6e 7465 6e74  :.38205..Content
	0x00d0:  2d54 7970 653a 2061 7070 6c69 6361 7469  -Type:.applicati
	0x00e0:  6f6e 2f78 2d77 7777 2d66 6f72 6d2d 7572  on/x-www-form-ur
	0x00f0:  6c65 6e63 6f64 6564 0d0a 4163 6365 7074  lencoded..Accept
	0x0100:  2d45 6e63 6f64 696e 673a 2067 7a69 700d  -Encoding:.gzip.
	0x0110:  0a0d 0a23 2048 454c 5020 6370 755f 7573  ...#.HELP.cpu_us
	0x0120:  6167 655f 6e69 6365 2054 656c 6567 7261  age_nice.Telegra
	0x0130:  6620 636f 6c6c 6563 7465 6420 6d65 7472  f.collected.metr
	0x0140:  6963 0a23 2054 5950 4520 6370 755f 7573  ic.#.TYPE.cpu_us
...

@brightzheng100

Interestingly, I tried many times and even hacked the source code to print more info, such as the headers and the body.
Eventually I reduced the number of metrics, in my test case by changing percpu from true to false, and boom, it started working.
So it looks like there is a size constraint within the http output plugin. I will dig further into this.

# Read metrics about cpu usage
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = false
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states.
  report_active = false
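
One way to dig further, sketched here in Python: split a payload written via [[outputs.file]] into one chunk per metric family and post the chunks separately, to see whether the payload size or one particular family triggers the 400. The function name `split_by_family` is hypothetical, and the sketch assumes every family starts with a `# HELP` line, as in the sample data earlier in this thread.

```python
def split_by_family(text):
    """Split Prometheus text into chunks, one per `# HELP` block (hypothetical helper)."""
    chunks = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Start a new chunk at each HELP line, or if no chunk exists yet.
        if line.startswith("# HELP ") or not chunks:
            chunks.append([])
        chunks[-1].append(line)
    return ["\n".join(chunk) + "\n" for chunk in chunks]


payload = (
    "# HELP cpu_usage_idle Telegraf collected metric\n"
    "# TYPE cpu_usage_idle gauge\n"
    'cpu_usage_idle{cpu="cpu0"} 53.7\n'
    "# HELP cpu_usage_nice Telegraf collected metric\n"
    "# TYPE cpu_usage_nice gauge\n"
    'cpu_usage_nice{cpu="cpu0"} 0\n'
)
for chunk in split_by_family(payload):
    # Print the family header line and the chunk size in bytes.
    print(chunk.splitlines()[0], len(chunk.encode()))
```

If every chunk is accepted on its own but the combined body is rejected, that would point at size or at an interaction between families rather than at any single metric.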

@danielnelson
Contributor

I opened a new pull request (#6703) with a prometheus serializer that shares the transformation code with the prometheus output. I'd like to use this when we develop other prometheus output protocols, such as push gateway and (hopefully) the remote write protocol.
