UDP Listener - Systemd Service Restart Failure #1643

Closed
goakley opened this issue Aug 17, 2016 · 2 comments

Comments

goakley commented Aug 17, 2016

Bug report

Relevant telegraf.conf:

[agent]
  interval = "10s"
  round_interval = true
  metric_buffer_limit = 1000
  flush_buffer_when_full = true
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  debug = false
  quiet = false
  hostname = ""
  omit_hostname = true

[[outputs.influxdb]]
  urls = ["https://REDACTED:8086"]
  database = "REDACTED"
  retention_policy = "default"
  precision = "us"
  timeout = "8s"
  username = "REDACTED"
  password = "REDACTED"
  user_agent = "telegraf"

[[inputs.udp_listener]]
  service_address = ":8092"
  allowed_pending_messages = 10000
  udp_packet_size = 1500
  data_format = "influx"

System info:

Telegraf - version 0.13.1
Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-1 (2016-03-06) x86_64 GNU/Linux

Steps to reproduce:

  1. systemctl start telegraf
  2. Wait until telegraf has fully started (is listening for UDP traffic, is flushing data)
  3. systemctl restart telegraf

Expected behavior:

Telegraf should release the UDP port for re-use before exiting. When the next instance starts, it should bind the UDP port and start listening for traffic.

Actual behavior:

The new telegraf instance cannot bind the UDP port, so it dies and restarts repeatedly until systemd gives up trying to start it.
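
The bind failure itself is easy to reproduce outside of Telegraf. The following is a minimal Python sketch (not Telegraf's Go code) of the same situation: a second bind to a UDP port that is still held fails with "address already in use", and the bind succeeds again once the first socket is closed. The helper names `bind_udp` and `rebind_after_close` are made up for illustration, and the sketch assumes Linux-style socket behavior without `SO_REUSEADDR`.

```python
import errno
import socket


def bind_udp(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Bind a UDP socket to host:port, closing it again if the bind fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError:
        sock.close()
        raise
    return sock


def rebind_after_close(host: str = "127.0.0.1"):
    """Return (errno of the conflicting bind, whether the rebind after close worked)."""
    # Port 0 lets the OS pick a free port, so the sketch does not depend on 8092.
    first = bind_udp(0, host)
    port = first.getsockname()[1]

    # Second bind while the first socket is still open: expected to fail.
    second_error = None
    try:
        bind_udp(port, host).close()
    except OSError as exc:
        second_error = exc.errno

    # Release the port, then bind again: this is the "expected behavior" case.
    first.close()
    third = bind_udp(port, host)
    rebound = third.getsockname()[1] == port
    third.close()
    return second_error, rebound


if __name__ == "__main__":
    err, rebound = rebind_after_close()
    print("conflicting bind errno:", errno.errorcode.get(err, err))
    print("rebind after close succeeded:", rebound)
```

In other words, the error in the log means that something still holds the socket when the new instance starts. A closed UDP socket does not linger the way a TCP socket in TIME_WAIT can.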

$ systemctl status telegraf
Aug 17 17:40:48 development-monitoring-influxdb-01234567 systemd[1]: Stopping The plugin-driven server agent for reporting metrics into InfluxDB [website]...
Aug 17 17:40:48 development-monitoring-influxdb-01234567 systemd[1]: Starting The plugin-driven server agent for reporting metrics into InfluxDB [website]...
Aug 17 17:40:48 development-monitoring-influxdb-01234567 systemd[1]: telegraf-website.service start request repeated too quickly, refusing to start.
Aug 17 17:40:48 development-monitoring-influxdb-01234567 systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB [website].
Aug 17 17:40:48 development-monitoring-influxdb-01234567 systemd[1]: Unit telegraf-website.service entered failed state.
$ tail telegraf.log
2016/08/17 17:40:47 Starting Telegraf (version 0.13.1)
2016/08/17 17:40:47 Loaded outputs: influxdb
2016/08/17 17:40:47 Loaded inputs: udp_listener
2016/08/17 17:40:47 Tags enabled: 
2016/08/17 17:40:47 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"", Flush Interval:10s 
2016/08/17 17:40:47 Started UDP listener service on :8092
2016/08/17 17:40:47 ERROR: ListenUDP - listen udp :8092: bind: address already in use
2016/08/17 17:40:47 Starting Telegraf (version 0.13.1)
2016/08/17 17:40:47 Loaded outputs: influxdb
2016/08/17 17:40:47 Loaded inputs: udp_listener
2016/08/17 17:40:47 Tags enabled: 
2016/08/17 17:40:47 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"", Flush Interval:10s 
2016/08/17 17:40:47 Started UDP listener service on :8092
2016/08/17 17:40:47 ERROR: ListenUDP - listen udp :8092: bind: address already in use
2016/08/17 17:40:47 Starting Telegraf (version 0.13.1)
2016/08/17 17:40:47 Loaded outputs: influxdb
2016/08/17 17:40:47 Loaded inputs: udp_listener
2016/08/17 17:40:47 Tags enabled: 
2016/08/17 17:40:47 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"", Flush Interval:10s 
2016/08/17 17:40:47 Started UDP listener service on :8092
2016/08/17 17:40:47 ERROR: ListenUDP - listen udp :8092: bind: address already in use
2016/08/17 17:40:47 Starting Telegraf (version 0.13.1)
2016/08/17 17:40:47 Loaded outputs: influxdb
2016/08/17 17:40:47 Loaded inputs: udp_listener
2016/08/17 17:40:47 Tags enabled: 
2016/08/17 17:40:47 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"", Flush Interval:10s 
2016/08/17 17:40:47 Started UDP listener service on :8092
2016/08/17 17:40:47 ERROR: ListenUDP - listen udp :8092: bind: address already in use
2016/08/17 17:40:48 Starting Telegraf (version 0.13.1)
2016/08/17 17:40:48 Loaded outputs: influxdb
2016/08/17 17:40:48 Loaded inputs: udp_listener
2016/08/17 17:40:48 Tags enabled: 
2016/08/17 17:40:48 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"", Flush Interval:10s 
2016/08/17 17:40:48 Started UDP listener service on :8092
2016/08/17 17:40:48 ERROR: ListenUDP - listen udp :8092: bind: address already in use
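
The log above shows five start attempts within about one second, after which systemd reports "start request repeated too quickly". That matches systemd's start rate limiting. The sketch below is a simplified Python model of that limit, assuming the usual systemd defaults of 5 starts per 10 seconds; the issue does not show the unit file, so the actual values for this service are an assumption. The function name `start_decisions` is hypothetical.

```python
from typing import Iterable, List


def start_decisions(
    start_times: Iterable[float],
    interval: float = 10.0,  # assumed default start-limit interval, in seconds
    burst: int = 5,          # assumed default start-limit burst
) -> List[bool]:
    """Return True for each start attempt that is allowed, False if refused.

    Simplified model: a window opens at the first attempt, at most `burst`
    attempts are allowed inside it, and a new window opens once `interval`
    seconds have passed since the window began.
    """
    decisions = []
    window_begin = None
    count = 0
    for now in start_times:
        if window_begin is None or now - window_begin > interval:
            window_begin = now
            count = 0
        if count < burst:
            count += 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


if __name__ == "__main__":
    # Offsets in seconds taken from the log: four starts at :47, one at :48,
    # then one more request at :48 that systemd refuses.
    print(start_decisions([0, 0, 0, 0, 1, 1]))
    # [True, True, True, True, True, False]
```

Under these assumptions the sixth request is refused. So the "failed" state is a consequence of the repeated bind errors, not a separate problem.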

Additional Info:

On top of this, a telegraf process is sometimes STILL running even though systemd reports that the service failed to start. It appears to be detached from systemd, yet the logs and netstat show it running normally and holding the UDP port.

$ ps aux | grep telegraf
telegraf 26003  0.0  0.2 154812 24432 ?        Sl   17:38   0:00 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
glen     28791  0.0  0.0  15028  1984 pts/2    S+   17:53   0:00 grep telegraf
$ sudo netstat -tulpn | grep 8092
udp6       0      0 :::8092                 :::*                                26003/telegraf
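
To check for such a leftover process in a script, the PID can be extracted from the `netstat -tulpn` output shown above. This is a rough Python sketch that only assumes the column layout seen in this report; `udp_port_owners` is a hypothetical helper, and it does not handle program names that contain spaces.

```python
from typing import List, Tuple


def udp_port_owners(netstat_output: str, port: int) -> List[Tuple[int, str]]:
    """Return (pid, program) pairs for UDP sockets bound to `port`.

    Expects `netstat -tulpn` style lines where the fourth column is the local
    address and the last column is "PID/Program name".
    """
    owners = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # UDP rows have no State column, so they have six fields.
        if len(fields) < 6 or not fields[0].startswith("udp"):
            continue
        local_port = fields[3].rsplit(":", 1)[-1]
        pid_program = fields[-1]
        # Without root, netstat prints "-" instead of "PID/Program name".
        if local_port != str(port) or "/" not in pid_program:
            continue
        pid, program = pid_program.split("/", 1)
        if pid.isdigit():
            owners.append((int(pid), program))
    return owners


if __name__ == "__main__":
    sample = (
        "udp6       0      0 :::8092                 :::*"
        "                                26003/telegraf"
    )
    print(udp_port_owners(sample, 8092))  # [(26003, 'telegraf')]
```

If this returns a PID while systemd reports the unit as failed, that process is the one still holding the port.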

sparrc (Contributor) commented Aug 17, 2016

This is a known systemd issue; it is fixed in 1.0.0-beta3.

sparrc closed this as completed Aug 17, 2016

heathermary89 commented

Check /var/log/messages for detailed info on the error.
